arXiv cs.CL
6/16/2026
Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Short summary

NVIDIA released Nemotron 3 Ultra, an open-source 550B-parameter Mixture-of-Experts Mamba-Transformer hybrid with a 1M-token context window and 6x higher inference throughput than state-of-the-art open models. The model is designed for long-running agentic tasks and uses techniques such as LatentMoE and multi-teacher distillation. Base, post-trained, and quantized checkpoints are available on HuggingFace.

  • 550B total parameters (55B active) with 1M token context window
  • 6x higher inference throughput vs state-of-the-art open models
  • Fully open-sourced on HuggingFace with training data and recipes

Generated with AI, which can make mistakes.
