arXiv cs.LG
6/17/2026
Models Take Notes at Prefill: KV Cache Can Be Editable and Composable

Short summary

Researchers demonstrate that transformer KV caches are 'editable' (field-level changes preserve downstream computation) and 'composable' (precompiled patterns splice into any context with O(L) overhead). The approach maintains a logit similarity of 0.90 to 0.999 relative to full recompute while reducing latency by 14.9x and improving time-to-first-token by 53x to 398x. In vLLM benchmarks, it integrates with prefix caching at a 98.5% cache-hit rate.
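The summary does not say how logit similarity is computed. As a minimal sketch, assuming it is the cosine similarity between the next-token logits from a full recompute and those from an edited cache, the comparison could look like the following. The function name and the example logit values are hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length logit vectors."""
    if len(a) != len(b):
        raise ValueError("logit vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical next-token logits over a 4-token vocabulary (not real data).
full_recompute = [2.0, -1.0, 0.5, 3.0]  # logits after recomputing the whole prefill
edited_cache = [1.9, -0.8, 0.6, 3.1]    # logits after editing the cache in place

similarity = cosine_similarity(full_recompute, edited_cache)
print(f"logit similarity: {similarity:.4f}")
```

A value near 1.0 means the edited cache yields almost the same output distribution as recomputing from scratch. Other measures, such as top-k agreement or KL divergence, would be equally plausible readings of "logit similarity".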

  • KV caches can be edited at the field level while preserving model behavior, because conclusions memoized downstream of the edit remain usable
  • Precompiled patterns compose across contexts at O(L) rather than O(L²) cost, yielding a 14.9x latency reduction
  • The results are validated across model families and quantization variants, and the approach integrates with production prefix caching
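The O(L) versus O(L²) contrast can be illustrated with a toy cache. This is a sketch under stated assumptions, not the paper's method: `KVEntry`, `splice`, and `full_prefill_pairs` are hypothetical names, the keys and values are placeholder tuples, and the splice only renumbers positions (a real system would also have to reconcile positional encodings baked into the keys).

```python
from dataclasses import dataclass


@dataclass
class KVEntry:
    position: int  # token position in the context
    key: tuple     # placeholder for a key vector
    value: tuple   # placeholder for a value vector


def splice(cache, pattern, at):
    """Insert precompiled `pattern` entries into `cache` at index `at`.

    One pass over the merged list renumbers positions, so the work is
    linear in the total number of cached entries.
    """
    if not 0 <= at <= len(cache):
        raise IndexError("splice point is outside the cache")
    merged = cache[:at] + pattern + cache[at:]
    return [KVEntry(i, e.key, e.value) for i, e in enumerate(merged)]


def full_prefill_pairs(context_len):
    """Query-key pairs scored by causal attention when prefilling from scratch."""
    return context_len * (context_len + 1) // 2


# Toy context of three cached tokens and a two-token precompiled pattern.
cache = [KVEntry(i, (f"k{i}",), (f"v{i}",)) for i in range(3)]
pattern = [KVEntry(i, (f"pk{i}",), (f"pv{i}",)) for i in range(2)]

spliced = splice(cache, pattern, at=1)
print([e.key[0] for e in spliced])

# Compare the linear splice work against the quadratic pair count of a recompute.
for length in (1_000, 10_000):
    print(length, "entries touched vs", full_prefill_pairs(length), "attention pairs")
```

The pair count grows quadratically with context length while the splice touches each entry once, which is the intuition behind the linear-overhead claim. The toy does not model the paper's measured speedups.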

Generated with AI, which can make mistakes.
