arXiv cs.LG
5/11/2026
Breaking the Illusion: When Positive Meets Negative in Multimodal Decoding

Short summary

Vision-Language Models (VLMs) frequently hallucinate objects because they rely on linguistic priors more than on visual evidence. The authors introduce Positive-and-Negative Decoding (PND), a training-free inference framework that uses dual-path contrast to amplify visual features while penalizing outputs dominated by language priors. They report state-of-the-art results on the POPE, MME, and CHAIR benchmarks.

  • VLMs suffer from an attention imbalance: visual features are under-weighted relative to language priors, which causes hallucination
  • PND applies dual-path contrast during inference: the positive path amplifies visual evidence, and the negative path constructs counterfactuals that expose language-prior-dominant outputs
  • The approach is training-free and reaches SOTA on standard benchmarks without retraining the model
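
The dual-path idea in the points above can be illustrated with a small sketch. The summary does not give PND's actual scoring rule, so the combination below is an assumption in the style of generic contrastive decoding: the function names, the weights `alpha` and `beta`, and the toy logits are all hypothetical, and the three logit vectors stand in for a base pass, a visually amplified pass, and a counterfactual pass.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def dual_path_contrast(base, positive, negative, alpha=1.0, beta=1.0):
    """Combine three next-token logit vectors (assumed rule, not PND's own).

    base:     logits from the unmodified model
    positive: logits from a pass with visual evidence amplified
    negative: logits from a counterfactual pass dominated by language priors
    Tokens the positive path prefers are boosted; tokens the negative
    path prefers are penalized.
    """
    lb, lp, ln = log_softmax(base), log_softmax(positive), log_softmax(negative)
    return [b + alpha * (p - b) - beta * (n - b) for b, p, n in zip(lb, lp, ln)]


def greedy_pick(scores):
    """Index of the highest-scoring token."""
    return max(range(len(scores)), key=scores.__getitem__)


# Toy vocabulary and made-up logits, for illustration only.
vocab = ["dog", "cat", "frisbee"]
base_logits = [1.0, 2.0, 2.5]      # language prior leans toward "frisbee"
positive_logits = [1.0, 2.6, 2.4]  # amplified visual evidence leans toward "cat"
negative_logits = [1.0, 1.2, 3.0]  # counterfactual path leans strongly toward "frisbee"

scores = dual_path_contrast(base_logits, positive_logits, negative_logits)
chosen = vocab[greedy_pick(scores)]
```

In this toy setup the token favored mainly by the negative path loses score, so the pick can shift away from the prior-driven candidate. Setting `alpha` and `beta` to zero recovers ordinary greedy decoding on the base logits.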

Generated with AI, which can make mistakes.
