Back to feed
Dev.to
Dev.to
6/2/2026
How a Scanned PDF Broke My Invoice Agent in Production

How a Scanned PDF Broke My Invoice Agent in Production

Short summary

A production invoice extraction agent silently failed on scanned PDFs with rotated text and poor OCR, shifting decimal places while returning confident structured output. The fix wasn't a better model—it was a deterministic validation layer checking field presence, amount ranges, date windows, and line-item consistency before downstream system acceptance. This pattern, applicable across all agent deployments, caught 23 failed documents in the first week that would otherwise have contaminated the system.

  • Scanned PDFs with poor OCR caused silent extraction failures despite model confidence
  • Added synchronous validation layer with 4 deterministic checks before downstream acceptance
  • Pattern caught 23 bad documents in first week; applicable across all agent deployments

Generated with AI, which can make mistakes.

Is this a good recommendation for you?

Explore more