arXiv cs.LG
6/17/2026

Models Take Notes at Prefill: KV Cache Can Be Editable and Composable
Short summary
Researchers demonstrate that KV caches in transformers are 'editable' (field-level changes preserve downstream computation) and 'composable' (precompiled patterns splice into any context with O(L) overhead). The approach maintains 0.90-0.999 logit similarity to full recompute while achieving a 14.9x latency reduction and a 53-398x improvement in time to first token. In vLLM benchmarks, it integrates with prefix caching at a 98.5% cache-hit rate.
- KV caches can be edited at field level while preserving model behavior through downstream memoized conclusions
- Precompiled patterns are composable across contexts with O(L) rather than O(L²) complexity, achieving a 14.9x latency improvement
- Validated across model families and quantization variants; integrates with production prefix caching
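
The O(L) versus O(L²) contrast in the bullets above can be illustrated with a toy example. This is a minimal sketch, not the paper's method: it assumes a single attention layer with no learned projections and no positional encoding, so each token vector serves as its own query, key, and value. The function names `attend`, `full_recompute`, and `with_cache` are hypothetical. It shows only the general reason cached keys and values save work: a new token needs one pass over L cached rows, while recomputing from scratch repeats attention at every position.

```python
import math

def attend(query, keys, values):
    """Softmax attention of one query over L key/value rows: O(L) dot products."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]

def full_recompute(tokens):
    """Causal attention at every position: about L^2 / 2 dot products in total."""
    return [attend(tokens[i], tokens[: i + 1], tokens[: i + 1]) for i in range(len(tokens))]

def with_cache(cached_keys, cached_values, new_token):
    """Reuse cached rows and attend only for the new token: O(L) dot products."""
    keys = cached_keys + [new_token]      # toy assumption: a token is its own key
    values = cached_values + [new_token]  # toy assumption: a token is its own value
    return attend(new_token, keys, values)

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
full = full_recompute(tokens)
cached = with_cache(tokens[:-1], tokens[:-1], tokens[-1])

# Largest elementwise gap between the two paths for the final token.
print(max(abs(a - b) for a, b in zip(full[-1], cached)))
```

In a real multi-layer model, keys and values at deeper layers depend on the surrounding context, which is why splicing a precompiled cache into a new context is not trivially exact and why the summary reports similarity to full recompute rather than equality.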
Generated with AI, which can make mistakes.
