LlamaIndex
4/28/2026

Deep Dive into Semantic Formatting Score: A New Metric for Meaningful Document Formatting
Short summary
LlamaIndex introduces ParseBench, a benchmarking metric that evaluates document OCR accuracy while preserving the semantic context carried by formatting, such as strikethrough prices, superscript footnotes, and bold aggregates, which traditional parsers strip as noise. The metric checks that parsing engines capture both content and formatting semantics.
- ParseBench measures OCR accuracy while preserving the semantic meaning of document formatting
- Traditional benchmarks treat formatting as cosmetic noise and ignore it, losing critical information
- The metric validates that parsers capture both text content and formatting semantics
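
To make the contrast in the points above concrete, here is a minimal sketch of why a text-only comparison can miss lost formatting. It is not ParseBench's actual scoring method, which this summary does not describe; the span representation, the style names, and the functions `text_only_match` and `formatting_recall` are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: a span is (text, style).
# The style names are illustrative, not ParseBench's schema.
Span = Tuple[str, str]


def text_only_match(expected: List[Span], parsed: List[Span]) -> bool:
    """Compare only the concatenated text, ignoring every style."""
    return " ".join(t for t, _ in expected) == " ".join(t for t, _ in parsed)


def formatting_recall(expected: List[Span], parsed: List[Span]) -> float:
    """Fraction of expected styled spans that the parser kept with the same style."""
    styled = [span for span in expected if span[1] != "plain"]
    if not styled:
        return 1.0  # nothing to preserve, so nothing was lost
    parsed_set = set(parsed)
    hits = sum(1 for span in styled if span in parsed_set)
    return hits / len(styled)


# Ground truth: a struck-through old price and a bold aggregate label.
expected = [
    ("Price:", "plain"),
    ("$49", "strikethrough"),
    ("$29", "plain"),
    ("Total", "bold"),
]

# A parser that recovers every character but flattens all formatting.
flattened = [(text, "plain") for text, _ in expected]

print(text_only_match(expected, flattened))    # True: the text looks perfect
print(formatting_recall(expected, flattened))  # 0.0: both styled spans were lost
```

In this toy case the flattened output reports both $49 and $29 as if they were equally current prices, which is the kind of information loss a text-only check cannot see.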



