June 2026

I Had 1,357 Vanity Plate Photos. So I Built an AI Data Pipeline.

Most organizations don't have a data problem. They have an unstructured data problem.

AI PipelineClaude VisionBatch APIPortfolio

A folder of photos is not a dataset

I have a habit. When I see an interesting license plate, I take a photo of it. Over a few years, that habit produced 1,412photos in a “Vanity Plates” iPhone album. Totally unstructured. No metadata beyond EXIF timestamps. I knew there was something interesting in there — I just couldn't query it.

That's a pretty typical unstructured data problem. Photos, scanned PDFs, handwritten forms, camera-roll receipts. The information is real. The signal is there. But it's trapped in a format no spreadsheet can touch.

This is what I built to get it out — and it's the same architecture I use for client work, just applied to something I could share publicly without confidentiality concerns.

The architecture: four steps

Triage with osxphotos

osxphotos queries the Photos.app library directly — no export, no duplication. It pulls every photo from the "Vanity Plates" album plus anything Apple Vision labeled as a license plate, dedupes by UUID, and writes a candidate list with capture timestamps. No GPS ever leaves the laptop.

Sonnet vision via Batch API

Each photo goes to Claude Sonnet 4.6 with a single structured prompt: read the plate, interpret it, classify the category (from a 15-category taxonomy), identify the vehicle make/model and state. The Anthropic Batch API gives a 50% discount over synchronous calls — the whole library cost $4.50.

Static JSON committed to the repo

Results land in a single JSON file checked into the website repo. No database, no cron job, no moving parts. The dataset is static — every plate I've photographed, interpreted once at build time. Re-running the backfill script adds new plates to the file.

Next.js Server Components read the JSON at build

The dashboard imports the JSON directly. Next.js reads it once at build time, renders the stats and article, and ships fully static HTML. The photo wall uses native lazy-loading — no virtualization library needed at this size.

Total cost

$4.50 for the full backfill. ~$1.35/mo ongoing (Vercel Blob storage for the images). Zero recurring compute costs — the JSON is static.

What Sonnet sees

The model handles OCR, interpretation, and categorization in a single pass. Here are six representative plates from the dataset — the range runs from genuinely clever to unintentionally philosophical.

VLOOKUP

References the VLOOKUP function in Microsoft Excel, strongly suggesting the o…

URENUFF

You are enough — an affirmation of self-worth and self-acceptance.

ALFALUV

A play on 'Alfa Love' — the owner expresses their love for their Alfa Romeo v…

TRUCE

The word 'TRUCE' suggesting a call for peace, ceasefire, or an end to conflic…

DNA DR

DNA Doctor — likely a physician specializing in genetics, genomics, or molecu…

UNO MAS

Spanish for 'one more' — likely expressing a desire for one more adventure, o…

Sonnet got the OCR right on 82% of plates at ≥85% confidence. The 75 plates it flagged as is_vanity_plate: false were standard-issue plates Apple's Vision label had incorrectly included. That self-sorting capability — the model knows when it's looking at something that doesn't belong — is the part that generalizes well to real messy photo archives.

Same architecture, different photo class

The vanity plate case is deliberately simple — it's a constrained domain with a predictable schema. But the pipeline shape is the same for more valuable photo archives:

Ranch records

Ear tags, brand photos, weigh slips — read by Sonnet into structured livestock records without manual transcription.

Sale catalogs

Photos of cattle or equipment at auction, interpreted into a queryable catalog with condition grading and provenance notes.

Receipts & invoices

Scanned documents classified by vendor, amount, and category — feeding directly into bookkeeping without a human in the loop.

Business cards

Contact photos enriched with role, company, and follow-up context — structured CRM data from a camera roll.

The interesting part isn't the technology — it's knowing which photo classes in a specific business contain enough signal to justify the extraction. That's a judgment call that comes before any code gets written.

Work with Nektar

If your business is sitting on photos, scanned PDFs, or receipts with signal trapped inside them — that's exactly the kind of unstructured data problem I unwind. A $1,500 diagnostic tells you whether extraction is worth it before any real build work starts.

Talk through the problem →