arXiv cs.LG
6/18/2026
Gaussian Mixture Attention: Linear-Time Sequence Mixing via Probabilistic Latent Routing

Short summary

Researchers introduce Gaussian Mixture Attention (GMA), a transformer attention mechanism that routes queries and keys through K learned Gaussian components, reducing memory complexity from O(N²) to O(NK) for a sequence of length N. GMA remains competitive on long-context tasks while its memory grows linearly with sequence length. The approach offers a probabilistic, interpretable alternative to standard softmax attention, though optimized implementations of standard attention still match or exceed it on some benchmarks.

  • New attention mechanism (GMA) reduces memory from O(N²) to O(NK)
  • Maintains competitive performance on long-context classification
  • Probabilistic alternative to standard softmax attention
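
The summary does not spell out how the routing works. As a rough illustration only, the sketch below shows one way attention could be factored through K Gaussian components so that no N×N matrix is ever formed. Everything in it is an assumption and not the paper's method: the function names `gaussian_responsibilities` and `gma_sketch`, the shared isotropic variance `sigma2`, and the per-component normalization of key responsibilities are hypothetical choices made for the example.

```python
import numpy as np

def gaussian_responsibilities(x, means, sigma2=1.0):
    """Soft-assign each row of x (N, d) to K isotropic Gaussians (hypothetical helper).

    Returns an (N, K) matrix whose rows sum to 1.
    """
    # Squared distances via ||x||^2 - 2 x.mu + ||mu||^2, which avoids an (N, K, d) buffer.
    sq_dist = (
        (x ** 2).sum(axis=1, keepdims=True)
        - 2.0 * x @ means.T
        + (means ** 2).sum(axis=1)[None, :]
    )
    logits = -sq_dist / (2.0 * sigma2)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

def gma_sketch(q, k, v, means, sigma2=1.0, eps=1e-12):
    """Illustrative mixture-routed attention (an assumption, not the paper's algorithm).

    q, k: (N, d); v: (N, d_v); means: (K, d). Largest intermediates are (N, K).
    """
    rq = gaussian_responsibilities(q, means, sigma2)  # (N, K) query routing
    rk = gaussian_responsibilities(k, means, sigma2)  # (N, K) key routing
    # Normalize over tokens so each component holds a weighted average of values.
    rk_norm = rk / (rk.sum(axis=0, keepdims=True) + eps)
    summaries = rk_norm.T @ v                         # (K, d_v) per-component value summary
    return rq @ summaries                             # (N, d_v)

# Example with small random inputs.
rng = np.random.default_rng(0)
N, d, d_v, K = 64, 8, 4, 5
q, k = rng.normal(size=(N, d)), rng.normal(size=(N, d))
v = rng.normal(size=(N, d_v))
means = rng.normal(size=(K, d))
out = gma_sketch(q, k, v, means)
print(out.shape)
```

Under these assumptions the implied attention matrix is the product of two N×K factors, so each of its rows still sums to 1, but only the factors are stored. That is where the O(NK) memory figure would come from in a scheme of this kind.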

Generated with AI, which can make mistakes.
