arXiv cs.LG
5/11/2026

Breaking the Illusion: When Positive Meets Negative in Multimodal Decoding
Short summary
Vision-Language Models (VLMs) frequently hallucinate objects because they over-rely on linguistic priors rather than visual evidence. Researchers introduce Positive-and-Negative Decoding (PND), a training-free inference framework that uses dual-path contrast to amplify visual features while penalizing outputs dominated by language priors. PND achieves state-of-the-art performance on the POPE, MME, and CHAIR benchmarks without retraining the model.
- VLMs suffer from attention imbalance: they under-weight visual features relative to language priors, which causes hallucination
- PND uses dual-path contrast during inference: the positive path amplifies visual evidence, while the negative path constructs counterfactuals
- The training-free approach reaches SOTA on standard benchmarks without model retraining
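The summary does not state PND's actual scoring rule, so the following is only a hedged illustration of what a generic dual-path contrastive decoding step can look like, in the style of visual contrastive decoding. Each token's score is (1 + alpha) times its positive-path logit minus alpha times its negative-path logit, restricted to tokens that are plausible under the positive path. The function names, the parameters `alpha`, `beta`, and `gamma`, the attention-scaling step, and the toy logits are all hypothetical assumptions, not PND's published method.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def amplify_visual_attention(attn, is_visual, gamma=1.5):
    # Hypothetical positive-path tweak (assumption): scale attention weights
    # on image tokens by gamma, then renormalize so they still sum to 1.
    scaled = [a * gamma if v else a for a, v in zip(attn, is_visual)]
    total = sum(scaled)
    return [s / total for s in scaled]


def contrastive_logits(pos_logits, neg_logits, alpha=1.0):
    # Generic contrastive combination (assumption, not PND's stated rule):
    # boost tokens the visually grounded path favors and penalize tokens the
    # counterfactual, language-prior-dominant path also favors.
    return [(1 + alpha) * p - alpha * n for p, n in zip(pos_logits, neg_logits)]


def plausibility_mask(pos_logits, beta=0.1):
    # Keep only tokens whose positive-path probability is at least beta times
    # the highest positive-path probability.
    probs = softmax(pos_logits)
    cutoff = beta * max(probs)
    return [p >= cutoff for p in probs]


def dual_path_next_token(pos_logits, neg_logits, alpha=1.0, beta=0.1):
    # Greedy choice over contrastive scores, with implausible tokens masked out.
    scores = contrastive_logits(pos_logits, neg_logits, alpha)
    keep = plausibility_mask(pos_logits, beta)
    masked = [s if k else float("-inf") for s, k in zip(scores, keep)]
    return max(range(len(masked)), key=lambda i: masked[i])


if __name__ == "__main__":
    vocab = ["cat", "dog", "fork"]
    pos = [1.8, 2.0, 0.0]  # toy logits with visual evidence amplified
    neg = [0.5, 2.5, 0.0]  # toy counterfactual logits: "dog" is a language prior
    greedy = max(range(len(pos)), key=lambda i: pos[i])
    print("positive path alone:", vocab[greedy])  # dog
    print("dual-path contrast:", vocab[dual_path_next_token(pos, neg)])  # cat
```

In this toy case the positive path alone would emit "dog", but because the counterfactual path also strongly prefers "dog", the contrast shifts the choice to "cat". In a real VLM, `pos_logits` would come from a forward pass with visual attention boosted (for example by something like `amplify_visual_attention`), and `neg_logits` from a counterfactual pass such as one with the image removed or distorted; both choices are assumptions here.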
Generated with AI, which can make mistakes.