arXiv cs.LG
6/17/2026

MODE: Modality-Decomposed Expert-Level Mixed-Precision Quantization for MoE Multimodal LLMs
Short summary
MODE is a quantization framework for mixture-of-experts (MoE) multimodal LLMs that decomposes expert selection by modality to correct bias in expert importance estimation. It filters redundant vision tokens and evaluates quantization sensitivity per modality, keeping performance loss below 2.9% at 3-bit weights with 16-bit activations (W3A16). It also improves results at 2-bit quantization.
- Addresses bias in expert importance estimation for multimodal models
- Achieves <2.9% performance loss at W3A16 and improves 2-bit quantization
- Decomposes by modality to separate vision and text token influences
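
To illustrate why decomposing by modality can matter, here is a minimal toy sketch. It is not the paper's algorithm. It assumes top-1 routing, uses routing frequency as a stand-in for expert importance, weights the two modalities equally, and picks 4-bit and 2-bit widths arbitrarily. All function names are hypothetical, and the vision-token filtering step is not modeled.

```python
from collections import Counter


def expert_frequencies(routes, num_experts):
    """Fraction of the given tokens routed to each expert (top-1 routing assumed)."""
    counts = Counter(routes)
    total = len(routes)
    return [counts.get(e, 0) / total if total else 0.0 for e in range(num_experts)]


def pooled_importance(vision_routes, text_routes, num_experts):
    """Importance from all tokens pooled together; the larger modality dominates."""
    return expert_frequencies(vision_routes + text_routes, num_experts)


def decomposed_importance(vision_routes, text_routes, num_experts):
    """Importance computed per modality, then averaged with equal weight (an assumption)."""
    vision = expert_frequencies(vision_routes, num_experts)
    text = expert_frequencies(text_routes, num_experts)
    return [(v + t) / 2 for v, t in zip(vision, text)]


def assign_bits(importance, high_bits=4, low_bits=2, num_high=1):
    """Give the top `num_high` experts more bits; the bit widths are illustrative."""
    ranked = sorted(range(len(importance)), key=lambda e: importance[e], reverse=True)
    high = set(ranked[:num_high])
    return [high_bits if e in high else low_bits for e in range(len(importance))]


# Toy input: 90 vision tokens spread over experts 0 and 1, 10 text tokens on expert 2.
vision_routes = [0] * 50 + [1] * 40
text_routes = [2] * 10

pooled = pooled_importance(vision_routes, text_routes, num_experts=3)
decomposed = decomposed_importance(vision_routes, text_routes, num_experts=3)

print(assign_bits(pooled))      # expert 0 keeps the high precision
print(assign_bits(decomposed))  # expert 2, the text-only expert, keeps it instead
```

In this toy setup, pooling lets the far more numerous vision tokens decide which expert looks important, so the expert that handles every text token is pushed to the lowest precision. Scoring each modality separately reverses that choice.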