Dev.to
6/18/2026

Nemotron 3 Ultra went live June 4. Here's the call that works.
Short summary
NVIDIA released Nemotron 3 Ultra on June 4, a 550B-parameter open-weights hybrid Mamba-Transformer model scoring 48 on the Artificial Analysis Intelligence Index with exceptional speed (300+ tokens/sec). Available via build.nvidia.com, OpenRouter, Hugging Face, and self-hosted NIM containers. The guide provides three implementation patterns using the OpenAI-compatible Chat Completions API, hardware requirements, and critical pitfalls—use the post-trained instruct checkpoint, not Base.
- •Nemotron 3 Ultra: 550B parameters, 55B active per token, scores 48 on Artificial Analysis Intelligence Index
- •Available via build.nvidia.com (NIM), OpenRouter, Hugging Face, and self-hosted Docker containers
- •Three implementation paths with Python code examples; requires data-center hardware (8×H100-80GB minimum for comparable model)
Generated with AI, which can make mistakes.
Is this a good recommendation for you?



