arXiv cs.LG
5/11/2026
LKV: End-to-End Learning of Head-wise Budgets and Token Selection for LLM KV Cache Eviction

Short summary

LKV replaces heuristic KV cache compression with learned, end-to-end optimization of head-wise budgets and token selection. It retains only 15% of the KV cache while keeping performance near-lossless on long-context benchmarks. The approach aligns compression directly with task objectives instead of relying on static allocation rules. Teams optimizing LLM inference efficiency may find the methodology worth reviewing.

  • Learned budgeting outperforms hand-crafted heuristics for KV cache allocation
  • Retains only 15% of the KV cache with minimal performance degradation on LongBench and RULER
  • Differentiable optimization framework enables task-aware compression strategies
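
To make the terms above concrete, here is a minimal sketch of what a head-wise budget and per-head token selection could look like at a 15% retention ratio. It is not LKV's method: in the paper the budgets and the selection are learned end to end, whereas the `head_weights` and `scores` below are hypothetical placeholder inputs, and the proportional split and top-k rule are simple stand-ins chosen only for illustration. The function names are likewise assumptions.

```python
from typing import List


def allocate_head_budgets(head_weights: List[float], seq_len: int,
                          retention: float) -> List[int]:
    """Split a global KV budget across heads in proportion to head_weights.

    head_weights are hypothetical per-head importance values (assumed > 0).
    The global budget is retention * seq_len * num_heads cache entries.
    """
    num_heads = len(head_weights)
    total = int(round(retention * seq_len * num_heads))
    weight_sum = sum(head_weights)
    # Floor of each head's proportional share, capped at the sequence length.
    budgets = [min(seq_len, int(total * w / weight_sum)) for w in head_weights]
    # Hand leftover slots to the highest-weight heads that still have room.
    leftover = total - sum(budgets)
    order = sorted(range(num_heads), key=lambda h: head_weights[h], reverse=True)
    i = 0
    while leftover > 0 and any(b < seq_len for b in budgets):
        h = order[i % num_heads]
        if budgets[h] < seq_len:
            budgets[h] += 1
            leftover -= 1
        i += 1
    return budgets


def select_tokens(scores: List[float], budget: int) -> List[int]:
    """Return positions of the `budget` highest-scoring tokens, in sequence order."""
    top = sorted(range(len(scores)), key=lambda t: scores[t], reverse=True)[:budget]
    return sorted(top)


if __name__ == "__main__":
    seq_len, retention = 20, 0.15
    head_weights = [2.0, 1.0]  # hypothetical; LKV would learn these
    # Hypothetical, deterministic token scores for each head.
    scores = [[((7 * t + 3 * h) % seq_len) / seq_len for t in range(seq_len)]
              for h in range(len(head_weights))]
    budgets = allocate_head_budgets(head_weights, seq_len, retention)
    for h, budget in enumerate(budgets):
        print(f"head {h}: keep {budget} tokens at {select_tokens(scores[h], budget)}")
```

With two heads and 20 tokens, a 15% retention ratio leaves 6 cache entries in total, and the weights above split them 4 and 2. A fixed rule like this is the kind of static allocation the summary contrasts with learned budgeting.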

