Computer-vision QA for field-installation operations.

On-device CV models that run in the technician's hand, paired with large-scale registration databases, integrated into the existing field workflow rather than bolted on top of it. Anonymized engagement pattern.

by Bogdan
#computer-vision #field-operations #production #case-study

The brief, in shape: a field-operations business — utilities, telecom, energy, last-mile networks — runs thousands of installations or repairs per week. Technicians do the work in trucks, on poles, in basements, in attics. Each job has a spec: parts to install, tolerances to meet, photos to capture as evidence. Quality assurance is the bottleneck.

Without computer vision, the QA loop is one of two things. Either nobody reviews most photos, and bad installs slip into the field — with the cost paid later in callbacks, leaks, outages, refunds — or a back-office team reviews photos hours or days later, by which point the technician is two jobs away and the rework is expensive.

The intervention is a CV layer that runs in the technician's hand, gives feedback before they leave the site, and registers every photo against the right install in a database that survives the audit a year later.

This is anonymized (the underlying engagements are under NDA), but the pattern is consistent.

What CV in the field actually means

The system has four jobs, in order of operational importance:

  1. Catch the obvious failures before the technician leaves. "This photo is too dark to verify the connection." "The serial-number label is not in frame." "This appears to be a 12mm fitting; the spec calls for 15mm."
  2. Reduce the back-office QA load. Anything the model is confident is correct doesn't need a human reviewing it. Anything ambiguous gets routed to a human with the model's reasoning attached.
  3. Register every photo with its full operational context — install ID, customer, step in the workflow, GPS, timestamp, model version that judged it — so the audit trail is real.
  4. Surface patterns. Which technicians are getting flagged the most? Which install steps fail most often? Which equipment lots are showing anomalies?

Each of these has a different engineering shape. The first two are model-and-inference. The third is data plumbing. The fourth is analytics on top of the photo log.
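
The first two jobs come down to a routing rule over the model's output. A minimal sketch, assuming the model emits a pass/fail verdict with a confidence score and a short reason; `Verdict`, `Route`, `route`, and both thresholds are hypothetical names and values for illustration, not figures from any engagement:

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AUTO_ACCEPT = "auto_accept"    # confident pass: no human review needed
    FLAG_ON_SITE = "flag_on_site"  # confident fail: tell the technician now
    HUMAN_REVIEW = "human_review"  # ambiguous: send to the back office


@dataclass(frozen=True)
class Verdict:
    passed: bool       # does the photo appear to meet the spec?
    confidence: float  # 0.0 .. 1.0
    reason: str        # short explanation attached for the reviewer


# Hypothetical thresholds; in practice these would be tuned per check.
ACCEPT_THRESHOLD = 0.95
FLAG_THRESHOLD = 0.90


def route(verdict: Verdict) -> Route:
    """Decide who, if anyone, needs to look at this photo."""
    if verdict.passed and verdict.confidence >= ACCEPT_THRESHOLD:
        return Route.AUTO_ACCEPT
    if not verdict.passed and verdict.confidence >= FLAG_THRESHOLD:
        return Route.FLAG_ON_SITE
    return Route.HUMAN_REVIEW
```

Everything that falls between the two confident outcomes lands in the review queue with `reason` attached, which is what keeps the back-office load proportional to the model's uncertainty rather than to photo volume.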

Training data, the long pole

The single largest cost in the project is labeled training data. Field photos are noisy: bad lighting, motion blur, weird angles, partially obscured parts, snow, dust, technician fingers in frame. "This object is a coupling" is easy in a clean studio shot and brutal in a real one.

What we build:

  • A labeling pipeline that prioritizes edge cases — the ambiguous photos that would be hardest for the model — so labeler time goes where the model is weakest.
  • A workflow that lets the field-ops business's own subject-matter experts label, with calibration tasks so we know which labelers agree with the spec and which don't.
  • Active learning: as the model improves, low-confidence photos get re-routed back into the labeling queue, so the dataset converges on the model's actual failure modes rather than on whatever the first batch happened to contain.
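
One common way to implement the prioritization and calibration steps above is margin-based uncertainty sampling plus a per-labeler agreement score. The sketch below assumes the model exposes per-class probabilities (at least two classes) for each photo and that calibration tasks have an agreed answer; the function names and data shapes are illustrative, not the pipeline's actual interface:

```python
import heapq
from typing import Dict, List


def margin(probs: List[float]) -> float:
    """Gap between the top two class probabilities; a small gap means ambiguous."""
    top_two = heapq.nlargest(2, probs)
    return top_two[0] - top_two[1]


def labeling_queue(predictions: Dict[str, List[float]], budget: int) -> List[str]:
    """Return up to `budget` photo IDs, most ambiguous first."""
    ranked = sorted(predictions, key=lambda pid: margin(predictions[pid]))
    return ranked[:budget]


def calibration_score(answers: Dict[str, str], gold: Dict[str, str]) -> float:
    """Fraction of calibration tasks where the labeler matches the agreed answer."""
    scored = [task for task in gold if task in answers]
    if not scored:
        return 0.0
    return sum(answers[t] == gold[t] for t in scored) / len(scored)


predictions = {
    "photo-a": [0.98, 0.02],  # confident
    "photo-b": [0.55, 0.45],  # nearly a coin flip
    "photo-c": [0.70, 0.30],
}
print(labeling_queue(predictions, budget=2))  # ['photo-b', 'photo-c']
```

Re-running `labeling_queue` after each retraining round is what makes the loop "active": the queue follows the current model's weak spots instead of the original sampling order.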

Versioned datasets, versioned label schemas, versioned model artifacts. Every prediction in production traces back to the dataset version it was trained on.

Model architecture, on-device

The model has to run on the technicians' hardware — a mix of mid-tier Android phones, ruggedized handhelds, and the occasional ageing iPad — without a network connection. Cloud inference is not an option for the live path; the work happens in basements and on rural roads.

We typically end up with a two-stage shape:

  • A small object-detection model (YOLO-family or comparable lightweight detector) that locates the relevant parts of the photo.
  • A classifier or regression head on each detection that answers the QA question — right part / wrong part, in spec / out of spec, label readable / not readable.
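
The two-stage shape can be sketched independently of any runtime. In this illustration the detector and the per-detection head are passed in as plain callables, so the same glue code would sit on top of ONNX Runtime, Core ML, or TFLite; all type and function names here (`Detection`, `Finding`, `inspect_photo`) and the `min_score` cutoff are assumptions for the example:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


@dataclass(frozen=True)
class Detection:
    label: str    # e.g. "fitting", "serial_label"
    box: Box
    score: float  # detector confidence


@dataclass(frozen=True)
class Finding:
    detection: Detection
    answer: str        # e.g. "in_spec" / "out_of_spec"
    confidence: float  # confidence of the second-stage head


# Stage 1 finds parts; stage 2 answers the QA question for one part.
Detector = Callable[[object], List[Detection]]
Classifier = Callable[[object, Detection], Tuple[str, float]]


def inspect_photo(photo: object, detect: Detector, classify: Classifier,
                  min_score: float = 0.5) -> List[Finding]:
    """Run the detector, then the QA head on each sufficiently strong detection."""
    findings = []
    for det in detect(photo):
        if det.score < min_score:
            continue  # ignore weak detections rather than judging them
        answer, confidence = classify(photo, det)
        findings.append(Finding(det, answer, confidence))
    return findings
```

Keeping the stages behind callables also makes the pipeline testable with stubs, without a model file or a device in the loop.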

Quantized to int8 and packaged through ONNX Runtime or the platform-native equivalent (Core ML on iOS, TFLite on Android), the model fits an inference budget small enough to give feedback in a second or two on a five-year-old phone. The model itself is a few tens of megabytes; updates ship through the field-ops app's existing release pipeline, with version tracking so we know exactly which model rendered each verdict.
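
For a sense of what int8 quantization does to the weights, here is the arithmetic of symmetric per-tensor quantization in plain Python. This is an illustration of the idea only: real toolchains apply their own calibration and per-channel schemes, and nothing here reflects the settings used in any particular engagement.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the int8 representation."""
    return [q * scale for q in quantized]


weights = [-1.0, 0.0, 0.31, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each restored weight is within half a quantization step of the original.
worst_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight is stored in one byte instead of the four a float32 needs, which is where the roughly fourfold reduction in weight storage comes from; the price is a rounding error of at most half a step (`scale / 2`) per weight.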

Registration against a large database

Every photo gets associated with the install record, the customer, the workflow step, GPS, timestamp, technician ID, equipment serial numbers where readable, and the model version and confidence that produced its verdict. This metadata is the difference between "the system caught a defect" and "we can prove, six months later, exactly which install, which technician, and which part was involved."
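
A registration record with those fields might look like the following. SQLite (in memory) stands in here purely for illustration; the table name, column names, and index are assumptions, and the production database is not specified in this write-up:

```python
import sqlite3

SCHEMA = """
CREATE TABLE photo_registration (
    photo_id         TEXT PRIMARY KEY,
    install_id       TEXT NOT NULL,
    customer_id      TEXT NOT NULL,
    workflow_step    TEXT NOT NULL,
    technician_id    TEXT NOT NULL,
    latitude         REAL,
    longitude        REAL,
    captured_at      TEXT NOT NULL,  -- ISO 8601 UTC timestamp
    equipment_serial TEXT,           -- NULL when the label was unreadable
    model_version    TEXT NOT NULL,
    verdict          TEXT NOT NULL,
    confidence       REAL NOT NULL
);
-- The live app asks "show me this install's photos in order", so index for that.
CREATE INDEX idx_photo_install ON photo_registration (install_id, captured_at);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def register(conn: sqlite3.Connection, row: dict) -> None:
    """Insert one photo record; `row` must supply every column by name."""
    conn.execute(
        "INSERT INTO photo_registration VALUES "
        "(:photo_id, :install_id, :customer_id, :workflow_step, :technician_id, "
        ":latitude, :longitude, :captured_at, :equipment_serial, "
        ":model_version, :verdict, :confidence)",
        row,
    )


def photo_history(conn: sqlite3.Connection, install_id: str) -> list:
    """Photos for one install, oldest first, with the model version that judged each."""
    cur = conn.execute(
        "SELECT photo_id, workflow_step, verdict, model_version "
        "FROM photo_registration WHERE install_id = ? ORDER BY captured_at",
        (install_id,),
    )
    return cur.fetchall()
```

Storing `model_version` on every row is what lets an auditor tell a verdict from last month's model apart from one issued by the current model.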

The database is operational, not analytical. It serves the live workflow first — the technician's app needs the install's photo history fast — and the analytics queries run off a derived warehouse, so heavy reporting doesn't impact live response times.
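
The pattern-surfacing questions (which technicians, which steps, which equipment lots) are all the same aggregation over a different key. A plain-Python stand-in for what would be a warehouse query, with `flag_rates` and the `(key, was_flagged)` row shape as illustrative assumptions:

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def flag_rates(rows: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Share of photos flagged, grouped by whatever key the rows carry.

    The key can be a technician ID, a workflow step, or an equipment lot.
    """
    totals: Counter = Counter()
    flagged: Counter = Counter()
    for key, was_flagged in rows:
        totals[key] += 1
        if was_flagged:
            flagged[key] += 1
    return {key: flagged[key] / totals[key] for key in totals}


rows = [("tech-1", True), ("tech-1", False), ("tech-2", False)]
print(flag_rates(rows))  # {'tech-1': 0.5, 'tech-2': 0.0}
```

Running this kind of scan against the derived warehouse rather than the operational store is the point of the split: a full-table aggregation never competes with a technician waiting on a photo history.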

Field-tech workflow integration

This is the part most CV-in-the-field projects get wrong. Technicians have thirty seconds per photo, not ten minutes. The CV layer cannot be slow, condescending, or wrong-feeling. If it cries wolf on legitimate work, technicians stop trusting it within a shift, and within a week they're tapping past every prompt.

What we do instead:

  • Default to silence. The CV layer only intervenes when it has high confidence something is wrong.
  • When it does intervene, it shows the technician exactly what it saw — the bounding box, the inference, a brief phrase — so the technician can either correct it (retake the photo) or override it, with the override reason captured for back-office review.
  • Overrides are first-class citizens. If a technician overrides the model and is right, that's a labeled training example. If those correct overrides pile up, meaning the model keeps flagging legitimate work, the model gets retrained.
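
The override loop can be sketched as a small record plus two functions. This assumes a back-office reviewer eventually confirms or rejects each override; the `Override` fields, the function names, and the 20% retraining trigger are hypothetical choices for the example:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Override:
    photo_id: str
    check: str             # which QA check fired, e.g. "fitting_size"
    model_verdict: str     # what the model claimed, e.g. "out_of_spec"
    technician_label: str  # what the technician says it actually is
    reason: str            # captured for back-office review
    reviewer_agrees: Optional[bool] = None  # None until a reviewer has looked


def training_examples(overrides: List[Override]) -> List[Tuple[str, str]]:
    """(photo_id, label) pairs for overrides a reviewer confirmed as correct."""
    return [(o.photo_id, o.technician_label)
            for o in overrides if o.reviewer_agrees is True]


def needs_retraining(overrides: List[Override], interventions: int,
                     max_wrong_rate: float = 0.2) -> bool:
    """True when confirmed-correct overrides exceed a share of all interventions."""
    if interventions == 0:
        return False
    model_was_wrong = sum(1 for o in overrides if o.reviewer_agrees is True)
    return model_was_wrong / interventions > max_wrong_rate
```

Only reviewed overrides count in either function, so an unreviewed or rejected override neither enters the training set nor pushes the model toward retraining.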

The field-ops business's existing app and existing workflow are the host. We integrate; we don't replace.

The pattern

Every field-CV engagement has the same shape: an existing operation that runs at scale, a QA bottleneck that hurts both customer experience and unit economics, and a CV layer that intervenes in the technician's hand rather than in some back-office screen. The model itself is one of the cheaper components. The expensive parts are the training-data pipeline, the database registration, and the workflow integration.

The technicians keep doing the work. The work just gets a real-time second pair of eyes that doesn't sleep.