Prompt Engineering
6/11/2026
Diffusion Gemma: Google's First Open Diffusion Model

Short summary

Google released Diffusion Gemma, an open-weight 26B mixture-of-experts (MoE) model that decodes by diffusion rather than standard autoregressive token generation. Instead of emitting tokens one at a time, it generates tokens in parallel within 256-token patches and can revise them, using entropy/uncertainty budgeting. The reported result is faster inference at comparable quality. Weights are available in BF16, FP8, and NVFP4 formats, with immediate support in Transformers, vLLM, MLX, and llama.cpp for local deployment.
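
The summary above does not spell out the decoding algorithm, so the following is only a toy sketch of what entropy-budgeted parallel decoding with revision could look like in general. It is not Diffusion Gemma's implementation. The names `toy_denoiser`, `decode_patch`, `entropy_budget`, and `revise_below` are hypothetical, and the "model" is a random stand-in so that the control flow can be run end to end.

```python
import math
import random

MASK = -1  # sentinel for a position that has not been committed yet


def entropy(probs):
    """Shannon entropy (in nats) of one categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def toy_denoiser(tokens, vocab_size, rng):
    """Hypothetical stand-in for the model: one distribution per position.

    A real denoiser would condition on `tokens`; this one returns random,
    peaked distributions purely to exercise the decoding loop.
    """
    dists = []
    for _ in tokens:
        weights = [rng.random() ** 4 + 1e-9 for _ in range(vocab_size)]
        total = sum(weights)
        dists.append([w / total for w in weights])
    return dists


def decode_patch(patch_len, vocab_size, entropy_budget, revise_below=0.05, seed=0):
    """Fill one patch in parallel steps; returns (tokens, number_of_steps)."""
    rng = random.Random(seed)
    tokens = [MASK] * patch_len
    steps = 0
    while MASK in tokens:
        steps += 1
        dists = toy_denoiser(tokens, vocab_size, rng)

        # Revision: overwrite committed tokens that this pass finds unlikely.
        for i, tok in enumerate(tokens):
            if tok != MASK and dists[i][tok] < revise_below:
                tokens[i] = max(range(vocab_size), key=dists[i].__getitem__)

        # Commit masked positions, most confident first, until the
        # per-step entropy budget is spent. At least one position is
        # always committed, so the loop ends within patch_len steps.
        masked = sorted(
            (i for i, tok in enumerate(tokens) if tok == MASK),
            key=lambda i: entropy(dists[i]),
        )
        spent = 0.0
        for rank, i in enumerate(masked):
            cost = entropy(dists[i])
            if rank > 0 and spent + cost > entropy_budget:
                break
            tokens[i] = max(range(vocab_size), key=dists[i].__getitem__)
            spent += cost
    return tokens, steps


if __name__ == "__main__":
    tokens, steps = decode_patch(patch_len=256, vocab_size=32, entropy_budget=8.0)
    print(f"filled {len(tokens)} positions in {steps} parallel steps")
```

In this sketch a larger budget commits more positions per step (fewer steps, less caution), while a budget of zero degenerates to one position per step, which is the sequential extreme.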

  • Diffusion-based decoding enables parallel token generation with revision
  • 26B MoE architecture (4B active) with up to 256K context window
  • Open-weight model with immediate framework support and local deployment options
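
For local deployment, a rough sense of weight size per format can be worked out from the figures in the list above. This is a back-of-the-envelope sketch under stated assumptions: 2 bytes per parameter for BF16, 1 for FP8, and 0.5 for NVFP4 (ignoring block scale factors and any tensors kept at higher precision), with 1 GB taken as 10^9 bytes. It covers weights only, not the KV cache or activations. The helper name `weight_gb` is hypothetical.

```python
# Assumed storage cost per parameter, in bytes (scale-factor overhead ignored).
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "NVFP4": 0.5}


def weight_gb(params_billion, fmt):
    """Approximate weight size in GB (10^9 bytes) for a given format."""
    return params_billion * 1e9 * BYTES_PER_PARAM[fmt] / 1e9


if __name__ == "__main__":
    for fmt in BYTES_PER_PARAM:
        total = weight_gb(26, fmt)   # all experts
        active = weight_gb(4, fmt)   # parameters used per token
        print(f"{fmt:>5}: ~{total:.0f} GB total, ~{active:.0f} GB active per token")
```

Under these assumptions the full 26B weights come to roughly 52 GB, 26 GB, and 13 GB respectively. In a typical MoE setup all experts need to be resident (or offloaded and paged in), so the total figure, not the 4B active figure, is usually what determines whether the model fits in memory; the active count mainly affects compute per token.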

Generated with AI, which can make mistakes.
