arXiv cs.LG
5/12/2026

Statistical Inference and Quality Measures of KV Cache Quantisations Inspired by TurboQuant
Short summary
An analysis of three KV cache quantization schemes finds that KQV outperforms the alternatives at a 4-bit budget on every metric. Amplification of quantization error through the softmax, via Jensen's inequality, likely explains how performance varies across bit budgets. The work also poses a rate-distortion optimization problem for asymmetric K-V quantization.
- KQV quantization wins at a 4-bit budget on every metric (KL divergence, geometric error, distance)
- K-V asymmetry is unconditional: QKQV is consistently worse than KQV in KL divergence across all budgets
- The Jensen mechanism through the softmax is likely the operative cause of the performance crossover at different bit budgets
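To make the KL-divergence metric and the Jensen effect concrete, here is a minimal sketch. It is not the paper's method and does not implement TurboQuant or the KQV and QKQV schemes: the per-row uniform quantizer, the random Gaussian queries and keys, and every function name below are assumptions chosen for illustration. It quantizes the keys only, measures the KL divergence between full-precision and quantized attention weights, and shows separately that zero-mean noise on a logit does not average out after exponentiation.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def uniform_quantize(x, bits):
    # Hypothetical stand-in quantizer: per-row uniform grid between the
    # row minimum and maximum. NOT the scheme studied in the paper.
    levels = 2 ** bits - 1
    lo = x.min(axis=-1, keepdims=True)
    hi = x.max(axis=-1, keepdims=True)
    scale = (hi - lo) / levels
    scale = np.where(scale == 0, 1.0, scale)  # guard against constant rows
    return np.round((x - lo) / scale) * scale + lo

def attention_kl(q, k, k_hat):
    # Mean KL(p || p_hat) over queries, where p uses exact keys and
    # p_hat uses quantized keys.
    d = q.shape[-1]
    p = softmax(q @ k.T / np.sqrt(d))
    p_hat = softmax(q @ k_hat.T / np.sqrt(d))
    return float(np.mean(np.sum(p * (np.log(p) - np.log(p_hat)), axis=-1)))

def kl_by_bits(bit_budgets=(2, 4, 8), n_tokens=64, dim=32, seed=0):
    # Synthetic Gaussian queries and keys (an assumption, not real model data).
    rng = np.random.default_rng(seed)
    q = rng.standard_normal((8, dim))
    k = rng.standard_normal((n_tokens, dim))
    return {b: attention_kl(q, k, uniform_quantize(k, b)) for b in bit_budgets}

def jensen_gap(sigma, n=100_000, seed=0):
    # E[exp(eps)] - exp(E[eps]) for zero-mean Gaussian eps: positive by
    # Jensen's inequality, so logit noise inflates exponentiated weights.
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, size=n)
    return float(np.mean(np.exp(eps)) - np.exp(np.mean(eps)))

if __name__ == "__main__":
    for bits, kl in kl_by_bits().items():
        print(f"{bits}-bit keys: mean KL = {kl:.3e}")
    print(f"Jensen gap at sigma=0.5: {jensen_gap(0.5):.4f}")
```

In this toy setup the KL divergence shrinks as the bit budget grows, and the Jensen gap is positive for any nonzero noise level. Comparing quantizing keys against quantizing values, as the summary's asymmetry claim does, would need the value path and the paper's actual schemes, which this sketch leaves out.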
Generated with AI, which can make mistakes.