arXiv cs.LG
6/17/2026
MODE: Modality-Decomposed Expert-Level Mixed-Precision Quantization for MoE Multimodal LLMs

Short summary

MODE is a quantization framework for mixture-of-experts (MoE) multimodal LLMs. It decomposes expert selection by modality to correct biases in how expert importance is estimated. It also filters out redundant vision tokens and evaluates quantization sensitivity separately for each modality. With 3-bit weights (W3A16), it loses less than 2.9% performance. The approach is especially effective under extreme compression and also improves results at 2-bit quantization.
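To make the idea of a modality bias concrete, here is a minimal sketch under stated assumptions. It is not the authors' implementation. It assumes that expert importance is the gate weight each expert accumulates over calibration tokens, and that a vision token is "redundant" when its hidden state has cosine similarity above a threshold to an already kept vision token. The function names, the 0.95 threshold, and the equal weighting of modalities are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical sketch; not MODE's actual algorithm.
# Each token is (modality, hidden_state, routes); routes = [(expert_id, gate_weight), ...]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def filter_redundant_vision(tokens, threshold=0.95):
    """Drop vision tokens nearly identical to an already kept vision token.
    Text tokens are always kept."""
    kept, kept_vision = [], []
    for modality, hidden, routes in tokens:
        if modality == "vision":
            if any(cosine(hidden, h) > threshold for h in kept_vision):
                continue  # treated as redundant for importance estimation
            kept_vision.append(hidden)
        kept.append((modality, hidden, routes))
    return kept


def pooled_importance(tokens, num_experts):
    """Baseline: accumulate gate weight over all tokens, ignoring modality."""
    scores = [0.0] * num_experts
    for _, _, routes in tokens:
        for expert_id, gate in routes:
            scores[expert_id] += gate
    total = sum(scores)
    return [s / total if total > 0 else 0.0 for s in scores]


def modality_decomposed_importance(tokens, num_experts):
    """Normalize importance within each modality, then average across
    modalities, so the modality with more tokens cannot dominate."""
    per_modality = defaultdict(lambda: [0.0] * num_experts)
    for modality, _, routes in tokens:
        for expert_id, gate in routes:
            per_modality[modality][expert_id] += gate
    normalized = []
    for vec in per_modality.values():
        total = sum(vec)
        normalized.append([v / total if total > 0 else 0.0 for v in vec])
    return [sum(col) / len(normalized) for col in zip(*normalized)]


# Toy calibration set: 8 identical vision tokens all routed to expert 0,
# and 2 text tokens routed to expert 1.
tokens = [("vision", [1.0, 0.0], [(0, 1.0)])] * 8 + [("text", [0.0, 1.0], [(1, 1.0)])] * 2

print(pooled_importance(tokens, 3))               # [0.8, 0.2, 0.0]
print(len(filter_redundant_vision(tokens)))       # 3
print(modality_decomposed_importance(tokens, 3))  # [0.5, 0.5, 0.0]
```

In this toy case the pooled score ranks the vision expert four times higher than the text expert only because there are more vision tokens; scoring each modality separately removes that effect.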

  • Addresses bias in expert importance estimation for multimodal models
  • Achieves under 2.9% performance loss at W3A16 (3-bit weights, 16-bit activations) and improves results at 2-bit quantization
  • Decomposes importance estimation by modality to separate the influence of vision and text tokens
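Expert-level mixed precision means that different experts can receive different weight bit widths while activations stay in 16-bit. The sketch below shows one simple way to spend a fixed average bit budget. It is a hypothetical greedy rule, not MODE's allocation method. It assumes per-expert sensitivity scores (which could come from a modality-decomposed estimate), candidate widths of 2, 3, and 4 bits, equal parameter counts for all experts, and plain min-max uniform quantization per weight group. All names are illustrative.

```python
# Hypothetical sketch; not MODE's actual bit-allocation rule.


def allocate_bits(sensitivity, budget, choices=(2, 3, 4)):
    """Start every expert at the lowest width, then repeatedly upgrade the most
    sensitive expert that can still move up one level without pushing the mean
    bit width above `budget`. Assumes all experts have equal parameter counts."""
    levels = sorted(choices)
    n = len(sensitivity)
    idx = [0] * n               # index into `levels` for each expert
    used = levels[0] * n        # total bits per "weight slot" across experts
    limit = budget * n
    order = sorted(range(n), key=lambda e: sensitivity[e], reverse=True)
    upgraded = True
    while upgraded:
        upgraded = False
        for e in order:
            if idx[e] + 1 < len(levels):
                step = levels[idx[e] + 1] - levels[idx[e]]
                if used + step <= limit:
                    idx[e] += 1
                    used += step
                    upgraded = True
                    break       # restart from the most sensitive expert
    return [levels[i] for i in idx]


def quantize_dequantize(weights, bits):
    """Asymmetric min-max uniform quantization of one weight group to `bits`,
    returned in dequantized (float) form. Activations are not touched."""
    lo, hi = min(weights), max(weights)
    qmax = 2 ** bits - 1
    scale = (hi - lo) / qmax if hi > lo else 1.0
    q = [round((w - lo) / scale) for w in weights]  # integer codes in [0, qmax]
    return [lo + qi * scale for qi in q]


sensitivity = [0.9, 0.1, 0.5, 0.3]  # hypothetical per-expert scores
print(allocate_bits(sensitivity, budget=3.0))  # [4, 2, 4, 2]
```

With a 3.0-bit average budget, the two most sensitive experts get 4 bits and the other two drop to 2 bits. A real method would also weight the budget by each expert's parameter count and measure sensitivity from the quantization error itself, which `quantize_dequantize` could supply.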

Generated with AI, which can make mistakes.