Dev.to
5/13/2026

I Put Gemma 4 Behind My Homelab AI Gateway. This Is the Beginning.
Short summary
The author migrated Gemma 4 into production on a homelab gateway (Forge), replacing Qwen as the default model. The first failure was in the serving layer: the binary was outdated and did not recognize the gemma4 architecture, which was fixed by rebuilding llama.cpp with ROCm support. The production issue was in the output: the model's uncontrolled reasoning blocks broke structured extraction, which was resolved with a gateway-level policy that disables thinking mode for programmatic callers.
- Gemma 4 was deployed to the production homelab gateway as a real migration, not a side experiment.
- Infrastructure bottleneck: the serving binary was 466 commits behind and had to be rebuilt with ROCm support.
- Output issue: reasoning blocks broke agent and benchmark tasks; the fix was a gateway policy, not a model prompt.
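The gateway-level fix can be pictured roughly as follows. This is a minimal sketch, not the author's Forge code: the caller labels, the `enable_thinking` flag name, the `chat_template_kwargs` request field, and the `<think>` delimiters are all assumptions chosen for illustration, and the real gateway and model may use different names.

```python
import copy
import re

# Hypothetical caller classes; the summary only says "programmatic callers".
PROGRAMMATIC_CALLERS = {"agent", "benchmark", "extraction"}

# Assumed delimiters for reasoning blocks; the real model's markers may differ.
REASONING_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)


def apply_thinking_policy(request: dict, caller: str) -> dict:
    """Return a copy of the request with thinking disabled for programmatic callers."""
    shaped = copy.deepcopy(request)
    if caller in PROGRAMMATIC_CALLERS:
        # "enable_thinking" is a placeholder flag name, not a confirmed API field.
        shaped.setdefault("chat_template_kwargs", {})["enable_thinking"] = False
    return shaped


def strip_reasoning(text: str) -> str:
    """Defensive fallback: drop any reasoning block that still reaches the output."""
    return REASONING_BLOCK.sub("", text).strip()
```

Putting the decision in the gateway rather than in each prompt means interactive callers keep the model's default behavior, while every programmatic caller gets the same policy without having to remember to ask for it.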