Alignment Forum
6/13/2026
SFT Drives Gemini’s Safety Properties

Short summary

Google DeepMind researchers found that most safety properties in Gemini models result from supervised fine-tuning (SFT) combined with pretraining, not from reinforcement learning stages. Experiments on Gemini 3.1 Pro and Flash showed SFT-only models performed nearly identically to production versions across multiple safety benchmarks. The finding identifies SFT as a high-leverage intervention point for future AI safety research.

  • SFT + pretraining, not RL, drives Gemini's safety properties
  • Testing shows SFT-only models match production safety performance
  • SFT identified as key intervention point for future safety work

Generated with AI, which can make mistakes.
