Dev.to
6/18/2026

Shipping 100,000 construction PDFs a month: what actually breaks
Short summary
Document pipelines fail in orchestration, not in PDF parsing. Instead of batching, isolate each document and fan out with fire-and-forget dispatch, which decouples receipt, processing, and commit. Separate permanent errors from transient ones, and handle large pages with geometry detection and tiling.
- Orchestration and error taxonomy matter more than PDF parsing tech
- Per-document isolation decouples failure domains and simplifies retries
- Grounding vision LLMs with extracted text prevents hallucination better than model improvements
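
To make the permanent-versus-transient distinction and per-document isolation concrete, here is a minimal sketch in Python. It is not the article's implementation: the names `PermanentError`, `TransientError`, `Outcome`, `process_one`, and `process_all` are hypothetical, and the loop runs sequentially rather than as a fire-and-forget fan-out. It only illustrates that each document gets its own outcome, that permanent errors are not retried, and that transient errors are retried up to a limit.

```python
from dataclasses import dataclass


class PermanentError(Exception):
    """Hypothetical: retrying cannot help (e.g. a corrupt file)."""


class TransientError(Exception):
    """Hypothetical: retrying may help (e.g. a timeout)."""


@dataclass
class Outcome:
    doc_id: str
    status: str    # "done", "failed_permanent", or "failed_transient"
    attempts: int


def process_one(doc_id, handler, max_attempts=3):
    """Run handler for one document, isolated from every other document."""
    attempts = 0
    while attempts < max_attempts:
        attempts += 1
        try:
            handler(doc_id)
            return Outcome(doc_id, "done", attempts)
        except PermanentError:
            # No retry: the same input would fail the same way again.
            return Outcome(doc_id, "failed_permanent", attempts)
        except TransientError:
            # Retry; a real pipeline would back off before the next attempt.
            continue
    return Outcome(doc_id, "failed_transient", attempts)


def process_all(doc_ids, handler):
    # One outcome per document: a failure in one never aborts the others.
    return [process_one(doc_id, handler) for doc_id in doc_ids]
```

Because every document produces its own `Outcome`, a retry only ever touches the document that failed, which is the property the summary attributes to per-document isolation.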
Generated with AI, which can make mistakes.