arXiv cs.CL
6/19/2026
DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence

Short summary

DeepSeek releases the V4 series: two mixture-of-experts (MoE) models, one with 1.6T total parameters (49B active per token) and one with 284B (13B active), both supporting 1M-token context windows. Architectural improvements, namely compressed sparse attention (CSA), heavily compressed attention (HCA), and manifold-constrained hyper-connections, reduce inference FLOPs by 73% and KV-cache size by 90% relative to DeepSeek-V3.2. The models are pre-trained on 32T diverse tokens with the Muon optimizer, chosen for faster convergence and greater training stability.
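As a quick check on what these figures imply, the sketch below computes the fraction of parameters active per token in each variant and the share of V3.2's cost that remains after the stated reductions. It treats "1.6T" as exactly 1,600B (a rounding assumption), and the helper names are hypothetical.

```python
# Parameter counts as stated in the summary, in billions (1.6T assumed to be exactly 1,600B).
MODELS = {
    "1.6T variant": (1600, 49),
    "284B variant": (284, 13),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Fraction of all parameters used for each token in a sparse MoE forward pass."""
    return active_b / total_b


def remaining_after_reduction(reduction_pct: float) -> float:
    """Fraction of the baseline cost left after an X% reduction."""
    return 1.0 - reduction_pct / 100.0


for name, (total, active) in MODELS.items():
    print(f"{name}: {active_fraction(total, active):.1%} of parameters active per token")

# Stated savings relative to V3.2: 73% fewer inference FLOPs, 90% smaller KV cache.
print(f"FLOPs remaining: {remaining_after_reduction(73):.0%} of V3.2")
print(f"KV cache remaining: {remaining_after_reduction(90):.0%} of V3.2")
```

With these inputs, about 3.1% (1.6T) and 4.6% (284B) of the weights participate in each token's forward pass, and the stated savings leave roughly 27% of V3.2's inference FLOPs and 10% of its KV cache.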

  • MoE architecture in 1.6T- and 284B-parameter variants, both supporting a 1M-token context
  • 73% fewer inference FLOPs and a 90% smaller KV cache than V3.2, via hybrid attention (CSA/HCA)
  • Pre-trained on 32T diverse tokens; the Muon optimizer enables faster, more stable training
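The summary credits the Muon optimizer but does not give DeepSeek-V4's exact variant or hyperparameters. As a minimal sketch, assuming the publicly described Muon recipe (Nesterov momentum, then approximate orthogonalization of each 2-D update with a quintic Newton-Schulz iteration), one update step might look like this in NumPy. The coefficients (3.4445, -4.7750, 2.0315), the five iterations, lr=0.02, momentum=0.95, and the shape-based scale factor are taken from the widely circulated reference implementation and are assumptions here, not DeepSeek's settings; the function names are hypothetical.

```python
import numpy as np


def newton_schulz_orthogonalize(g: np.ndarray, steps: int = 5, eps: float = 1e-7) -> np.ndarray:
    """Approximately replace the singular values of g with values near 1
    (i.e. approximate U @ V.T from its SVD) via a quintic Newton-Schulz iteration."""
    a, b, c = 3.4445, -4.7750, 2.0315  # assumed coefficients from the public reference code
    # Dividing by the Frobenius norm guarantees every singular value is <= 1.
    x = g / (np.linalg.norm(g) + eps)
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T  # work in the wide orientation so the Gram matrix x @ x.T is the smaller one
    for _ in range(steps):
        s = x @ x.T
        # Applies p(sigma) = a*sigma + b*sigma^3 + c*sigma^5 to each singular value.
        x = a * x + (b * s + c * s @ s) @ x
    return x.T if transposed else x


def muon_step(weight: np.ndarray, grad: np.ndarray, momentum_buf: np.ndarray,
              lr: float = 0.02, momentum: float = 0.95):
    """One hypothetical Muon update for a 2-D weight matrix; returns (new_weight, new_buf)."""
    momentum_buf = momentum * momentum_buf + grad
    update = grad + momentum * momentum_buf  # Nesterov-style look-ahead
    ortho = newton_schulz_orthogonalize(update)
    # Shape-dependent scaling used in one version of the reference code (assumption).
    scale = max(1.0, weight.shape[0] / weight.shape[1]) ** 0.5
    return weight - lr * scale * ortho, momentum_buf


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((64, 32))
    buf = np.zeros_like(w)
    g = rng.standard_normal((64, 32))
    w, buf = muon_step(w, g, buf)
    # Singular values of the orthogonalized gradient cluster around 1.
    print(np.linalg.svd(newton_schulz_orthogonalize(g), compute_uv=False).round(2))
```

Orthogonalizing the update gives every direction in the weight matrix a similar step size, regardless of how unevenly the raw gradient is spread across singular directions. The quintic coefficients trade exact convergence for speed, so the resulting singular values land roughly between 0.5 and 1.5 rather than exactly at 1.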

Generated with AI, which can make mistakes.
