Dev.to
5/13/2026
I Put Gemma 4 Behind My Homelab AI Gateway. This Is the Beginning.

Short summary

The author migrated Gemma 4 into production on a homelab AI gateway (Forge), replacing Qwen as the default model. The first failure was in the serving layer: the binary was outdated and did not recognize the gemma4 architecture, which was fixed by rebuilding llama.cpp with ROCm. The production issue was in the output: the model's uncontrolled reasoning blocks broke structured extraction, which was resolved with a gateway-level policy that disables thinking mode for programmatic callers.

  • Deployed Gemma 4 to the production homelab gateway as a real migration, not a side experiment
  • Infrastructure bottleneck: the serving binary was 466 commits behind and needed a rebuild with ROCm support
  • Output issue: reasoning blocks broke agent and benchmark tasks; fixed via gateway policy, not the model prompt
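
The gateway-level policy described above might look roughly like the sketch below. This is an illustration, not Forge's actual code: the caller labels, the `apply_thinking_policy` helper, and the `chat_template_kwargs` / `enable_thinking` request fields are all assumptions, and the real knob depends on the serving stack and the model's chat template.

```python
from copy import deepcopy

# Hypothetical caller labels; the post only distinguishes "programmatic callers".
PROGRAMMATIC_CALLERS = {"agent", "benchmark", "extractor"}


def apply_thinking_policy(request: dict, caller: str) -> dict:
    """Return a copy of the request with thinking disabled for programmatic callers.

    Interactive callers are passed through unchanged.
    """
    out = deepcopy(request)  # never mutate the caller's payload
    if caller in PROGRAMMATIC_CALLERS:
        # Assumed field names: the actual switch depends on the backend
        # and on what the model's chat template understands.
        template_kwargs = out.setdefault("chat_template_kwargs", {})
        template_kwargs["enable_thinking"] = False
    return out
```

Putting the switch in the gateway rather than in each prompt means structured-output callers get the same behavior regardless of which client sent the request, while interactive users can still see reasoning.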

Generated with AI, which can make mistakes.
