A document pipeline built as a set of Claude tools — reads a doc, masks PII, validates the extraction with structured JSON outputs, escalates anything unusual to a human reviewer, tracks token cost. Built for regulated environments: PII masking, HITL review, and validation as the default, not the add-on.
Document processing pipelines look simple on a slide. Read it. Check it. Hide the personal info. Send anything wrong to a human. Ship. In practice they're a black box with Claude in the middle, doing everything from text extraction (OCR) to business rules to deciding what counts as personal info. Hard to debug, hard to know the cost, hard to swap one piece out.
This pipeline pulls those concerns apart. Five custom Claude tools, one per stage. Claude is the conductor, not the pipeline itself. Each tool owns its piece of the work — doc_extract returns a schema and the rules to follow, pii_mask hides personal info by field path and by pattern matching, doc_validate runs schema + business-rule checks and produces a list of exceptions, hitl_route turns those exceptions into a prioritized queue, cost_report totals the tokens.
The interesting design choice: doc_extract doesn't call Claude. It returns a work order — task, schema, rules — for Claude to carry out. No Anthropic API key needed for the demo. Production swaps the work order for a real API call without changing any other stage. The pattern survives the boundary.
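A minimal sketch of that plug-in point, assuming hypothetical names (`buildWorkOrder`, the inline schema) — in demo mode the function returns the work order; a production build would call the API behind the same signature:

```javascript
// Illustrative sketch of the doc_extract plug-in point (names are not the repo's).
// Demo mode: return a work order for Claude to execute.
// Production mode: the same function would make a real API call instead.
const claimSchema = {
  type: "object",
  required: ["claim_id", "policyholder"],
  properties: {
    claim_id: { type: "string", pattern: "^NSM-\\d{4}-\\d{7}$" },
    policyholder: { type: "object" },
  },
};

function buildWorkOrder(docText, docType) {
  return {
    task: `Extract structured data from this ${docType} document. Conform exactly to the schema. Return JSON only.`,
    doc_type: docType,
    schema: claimSchema,
    extraction_rules: [
      "Output JSON only, conforming exactly to the provided schema.",
      "Missing fields -> null. Do not invent values.",
    ],
    on_completion: `Pass the extracted JSON to pii_mask with doc_type='${docType}'.`,
    // chars / 4 is the demo's rough token estimate
    estimate: { stage: "extract", input_tokens: Math.ceil(docText.length / 4) },
  };
}

const order = buildWorkOrder("NORTHSTAR MUTUAL AUTO CLAIM SUBMISSION ...", "claim");
```

Swapping in production means replacing the body of `buildWorkOrder` with an API call; every downstream tool keeps consuming the same shape.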
Each stage is a separate custom Claude tool, swappable on its own: replace doc_extract without touching the four downstream tools. The conductor is Claude. Three of the five tools change something (extract, mask, route); one is a check (validate); one totals things up (cost). The pipeline is composition.
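That composition can be sketched end-to-end. This is not the repo's runner — the tools below are stubbed local functions with illustrative shapes, standing in for the MCP tools that Claude actually conducts:

```javascript
// Illustrative data flow only: in the real system Claude calls each MCP tool
// in turn; here the five stages are stubbed so the composition is visible.
const tools = {
  doc_extract: (text, docType) => ({ task: "...", schema: {}, doc_type: docType }),
  pii_mask: (data) => ({ ...data, masked: true }),
  doc_validate: (data) => ({ data, exceptions: [] }),
  hitl_route: (exceptions) => exceptions.map((e, i) => ({ queue_id: i, ...e })),
  cost_report: (stages) => stages.reduce((sum, s) => sum + s.input_tokens, 0),
};

function runPipeline(text, docType) {
  const order = tools.doc_extract(text, docType);
  const extracted = { doc_type: order.doc_type }; // Claude executes the work order here
  const masked = tools.pii_mask(extracted);
  const { exceptions } = tools.doc_validate(masked);
  const queue = tools.hitl_route(exceptions);
  const tokens = tools.cost_report([order.estimate ?? { input_tokens: 1955 }]);
  return { masked, queue, tokens };
}

const result = runPipeline("sample claim text", "claim");
```

Each stub can be replaced independently, which is the point: the contract between stages is plain JSON, not shared code.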
Four runs: three clean documents, plus one claim with deliberately broken amounts and a missing required field. The traces below are real outputs from the pipeline runner — same JSON the MCP tools produce when run end-to-end.
The interesting MCP design choice: doc_extract doesn't call Claude — it's a tool that returns a structured work order for Claude to carry out. Production with direct API access plugs in here without touching any downstream stage.
request

```json
{
  "doc_text": "NORTHSTAR MUTUAL — AUTO CLAIM SUBMISSION\nClaim ID: NSM-2026-0451829\n...",
  "doc_type": "claim"
}
```

response

```json
{
  "task": "Extract structured data from this claim document. Conform exactly to the schema. Return JSON only.",
  "doc_type": "claim",
  "schema": { /* full JSON Schema for AutoClaim — types, requireds, patterns, formats */ },
  "extraction_rules": [
    "Output JSON only, conforming exactly to the provided schema. No commentary, no markdown fences.",
    "Use ISO 8601 for all dates (YYYY-MM-DD) and date-times (YYYY-MM-DDTHH:mm:ssZ or with offset).",
    "Currency values: strip $ and commas, return as numbers.",
    "Missing fields → null. Do not invent values.",
    "policyholder.address.state should be the two-letter USPS code.",
    "vehicle.year is an integer, not a string."
    /* 7 more rules */
  ],
  "output_format": "Plain JSON, valid against the schema. No prose, no fences, no comments.",
  "on_completion": "Pass the extracted JSON to pii_mask with doc_type='claim'.",
  "estimate": { "stage": "extract", "input_tokens": 1955 }
}
```
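The next stage, pii_mask, hides personal info two ways — by field path and by pattern matching. A minimal sketch of those two strategies, with hypothetical path and pattern tables (not the repo's actual config):

```javascript
// Illustrative masking sketch. PII_PATHS and PII_PATTERNS are invented examples;
// the real tool's tables live with the doc-type schemas.
const PII_PATHS = { "policyholder.ssn": "SSN", "policyholder.dob": "DOB" };
const PII_PATTERNS = [{ type: "SSN", re: /\b\d{3}-\d{2}-\d{4}\b/g }];

// Strategy 1: known PII locations in the extracted JSON, replaced by field path.
function maskByPath(doc, paths) {
  const out = structuredClone(doc);
  for (const [path, type] of Object.entries(paths)) {
    const keys = path.split(".");
    let node = out;
    for (const k of keys.slice(0, -1)) node = node?.[k];
    if (node && keys.at(-1) in node) node[keys.at(-1)] = `[${type}-REDACTED]`;
  }
  return out;
}

// Strategy 2: regexes catch PII that leaks into free-text fields.
function maskByPattern(text, patterns) {
  return patterns.reduce((t, p) => t.replace(p.re, `[${p.type}-REDACTED]`), text);
}

const doc = {
  policyholder: { ssn: "123-45-6789", dob: "1990-01-01" },
  notes: "SSN 123-45-6789 mentioned on call",
};
const masked = maskByPath(doc, PII_PATHS);
masked.notes = maskByPattern(masked.notes, PII_PATTERNS);
```

The `[TYPE-REDACTED]` placeholder shape matters downstream: the validator uses it to tell a genuinely malformed value from a masked one.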
Design notes:

- doc_extract / pii_mask / doc_validate / hitl_route / cost_report. Each owns one contract; the pipeline is composition.
- doc_extract returns a schema + rules + task instead of calling Claude. Claude executes. No API key for the demo; the same plug-in point takes a real API call in production.
- doc_validate is redaction-aware. Format and pattern errors on values that look like [TYPE-REDACTED] are filtered out automatically. Rules that need the original PII (the age check against a redacted DOB, for example) skip with a note that they should run before masking in production.
- Business rules are declarative (a businessRules array in the JSON Schema file). One file per doc type. The functions that evaluate them live in the validator file. Adding a rule is two changes: schema entry plus evaluator function.
- The pipeline runner (scripts/run-pipeline.mjs) composes all five tools end-to-end and emits a single trace per fixture. Those traces, checked into the repo, are the source of truth for the playback on this page.
- Exceptions are prioritized by hitl_route based on rule type. Missing required fields are high; business rule failures with monetary impact are high; other schema issues are medium; everything else is low.

Limitations and next steps:

- No OCR yet: the demo feeds plain text to doc_extract. The OCR tool is the obvious next addition, and a clean MCP shape: take a PDF, return the text plus a confidence score per page.
- The HITL queue's shape (queue_id, priority, suggested_resolution) is portable. The persistence layer isn't.
- A free-text PII scan in pii_mask is documented but not auto-run. The tool returns a list of free-text fields to scan; Claude can scan those. A future tool, pii_review, would close that loop.
- Demo token counts are estimated as chars / 4. Real API runs return measured token counts; cost_report already accepts a method: 'measured' flag and reports mode accordingly. Demo runs in estimated; production flips it.
- The work-order pattern behind doc_extract is the same kind of system Project 010 (voice_check v2) is going to need.
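The redaction-aware filtering in doc_validate can be sketched in a few lines. This is an illustrative reconstruction, not the validator's code — the error shape (`rule`, `value`) is assumed:

```javascript
// Illustrative: drop format/pattern errors whose offending value is a
// redaction placeholder, so masking doesn't generate false exceptions.
const REDACTED = /^\[[A-Z_]+-REDACTED\]$/;

function filterRedactionErrors(errors) {
  return errors.filter((e) => {
    const formatError = e.rule === "format" || e.rule === "pattern";
    return !(formatError && REDACTED.test(String(e.value)));
  });
}

const errors = [
  { rule: "pattern", path: "policyholder.ssn", value: "[SSN-REDACTED]" }, // masked, not broken
  { rule: "required", path: "vehicle.vin", value: null },                 // real exception
];
const kept = filterRedactionErrors(errors);
```

Only the genuinely missing field survives the filter; the masked SSN never reaches the human queue.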
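The hitl_route priority mapping described above is a small decision table. A sketch, assuming a hypothetical exception shape (`rule`, `source`, `monetaryImpact`):

```javascript
// Illustrative priority mapping: missing required fields and monetary
// business-rule failures are high; other schema issues medium; the rest low.
function priorityFor(ex) {
  if (ex.rule === "required") return "high";
  if (ex.source === "business" && ex.monetaryImpact) return "high";
  if (ex.source === "schema") return "medium";
  return "low";
}

const queue = [
  { rule: "required", source: "schema", path: "vehicle.vin" },
  { rule: "amount_consistency", source: "business", monetaryImpact: true },
  { rule: "format", source: "schema", path: "incident.date" },
].map((ex) => ({ ...ex, priority: priorityFor(ex) }));
```

Keeping the mapping in one pure function makes the routing policy auditable, which matters in the regulated settings this pipeline targets.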