arXiv cs.CL
5/12/2026

Sanity Checks for Long-Form Hallucination Detection
Short summary
The authors introduce controlled-invariance tests (FORCE/REMOVE) that reveal whether a hallucination detector evaluates reasoning quality or merely exploits artifacts in the final answer. TRACT, a lightweight lexical scorer, achieves competitive performance using hedging trends and step-length dynamics. The core finding is that effective detection depends on isolating reasoning signal from endpoint cues, not on building more complex models.
- Two oracle tests (FORCE/REMOVE) expose whether detectors rely on reasoning or on surface patterns in final answers
- TRACT achieves strong results with simple lexical features, without complex learned representations
- The challenge is isolating real reasoning signal from answer-level artifacts, not an absence of detection signal
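
To make "hedging trends and step-length dynamics" concrete, here is a minimal sketch of how such lexical features could be computed over a list of reasoning steps. It is not the TRACT implementation: the hedge lexicon `HEDGES`, the helper names `slope` and `lexical_features`, and the choice of a least-squares slope as the "trend" are all assumptions made for illustration, since the summary does not specify them.

```python
import re

# Hypothetical hedge lexicon; the word list actually used by TRACT is not given here.
HEDGES = {"maybe", "perhaps", "possibly", "might", "probably", "likely", "unsure"}


def slope(ys):
    """Least-squares slope of ys against step index 0..n-1 (0.0 for fewer than 2 points)."""
    n = len(ys)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(ys))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def lexical_features(steps):
    """Return (hedging trend, step-length trend) for a list of reasoning-step strings."""
    hedge_rates, lengths = [], []
    for step in steps:
        tokens = re.findall(r"[a-z']+", step.lower())
        lengths.append(len(tokens))
        # Fraction of tokens in the step that are hedge words (assumed definition).
        hedge_rates.append(sum(t in HEDGES for t in tokens) / max(len(tokens), 1))
    return slope(hedge_rates), slope(lengths)


steps = [
    "The capital is Paris.",
    "It might possibly be Lyon.",
    "Perhaps maybe probably unsure.",
]
hedge_trend, length_trend = lexical_features(steps)
```

A positive `hedge_trend` means hedging increases as the response progresses. Because both features are computed over the whole sequence of steps rather than the final answer alone, a scorer of this kind can in principle be checked against answer-level manipulations.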
Generated with AI, which can make mistakes.