MODE: Modality-Decomposed Expert-Level Mixed-Precision Quantization for MoE Multimodal LLMs

Short summary

MODE is a quantization framework for mixture-of-experts (MoE) multimodal LLMs. It decomposes expert selection by modality to correct biases in expert importance estimation, filters redundant vision tokens, and evaluates quantization sensitivity separately for each modality. At W3A16 (3-bit weights, 16-bit activations) it reports less than 2.9% performance loss, and it also improves results at 2-bit weights.

• Addresses bias in expert importance estimation for multimodal models
• Achieves <2.9% performance loss at W3A16 and improves 2-bit quantization
• Decomposes by modality to separate vision and text token influences
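
To make the idea of modality-decomposed importance concrete, here is a minimal Python sketch. It is not the paper's algorithm: it assumes that routing frequency is an acceptable stand-in for expert importance, that the two modalities are weighted equally, and that experts receive either a low or a high bit width. The function names (`expert_usage_by_modality`, `normalized_importance`, `assign_bits`) and all parameters are hypothetical, and the vision-token filtering step mentioned in the summary is left out.

```python
import numpy as np


def expert_usage_by_modality(expert_ids, is_vision, num_experts):
    """Count how often each expert is selected, separately for vision and text tokens."""
    expert_ids = np.asarray(expert_ids)
    is_vision = np.asarray(is_vision, dtype=bool)
    vision = np.bincount(expert_ids[is_vision], minlength=num_experts)
    text = np.bincount(expert_ids[~is_vision], minlength=num_experts)
    return vision, text


def normalized_importance(vision_counts, text_counts):
    """Normalize within each modality before averaging (assumed equal weighting).

    Normalizing first keeps the more numerous modality, typically vision
    tokens, from dominating the combined score purely by token count.
    """
    v = vision_counts / max(vision_counts.sum(), 1)
    t = text_counts / max(text_counts.sum(), 1)
    return 0.5 * (v + t)


def assign_bits(importance, avg_bits=3, low=2, high=4):
    """Give `high` bits to the top-scoring experts and `low` bits to the rest.

    The number of high-precision experts is chosen so that the mean bit
    width does not exceed `avg_bits`.
    """
    n = len(importance)
    k = int(n * (avg_bits - low) / (high - low))
    order = np.argsort(-np.asarray(importance), kind="stable")
    bits = np.full(n, low)
    bits[order[:k]] = high
    return bits


# Toy routing trace: six vision tokens all hit expert 0, four text tokens spread out.
expert_ids = [0, 0, 0, 0, 0, 0, 1, 2, 3, 3]
is_vision = [True] * 6 + [False] * 4
vision_counts, text_counts = expert_usage_by_modality(expert_ids, is_vision, num_experts=4)
bits = assign_bits(normalized_importance(vision_counts, text_counts))
```

In a real pipeline the routing trace would come from calibration data, and the importance score would presumably be replaced by a per-modality quantization sensitivity measure, as the summary describes.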

Generated with AI, which can make mistakes.

#research-breakthrough #ai-tools #market-trend

Read full article at arXiv cs.LG
