Stanford Online
5/11/2026

Stanford CS25: Transformers United V6 I From Next-Token Prediction to Next-Generation Intelligence
Short summary
Stanford's CS25 seminar covers recent advances in LLM pretraining, showing that front-loading reasoning-rich data yields persistent reasoning gains that post-training alone cannot replicate. The seminar formalizes a two-phase pretraining framework for data selection, blending, and sequencing. It is led by Shrimai Prabhumoye (Mistral AI), with Stanford faculty including Christopher Manning and Michael C. Frank.
- Recent advances show data ordering and reasoning-centric integration are critical in LLM pretraining
- Front-loading reasoning-rich data yields persistent reasoning gains that post-training cannot replicate
- Formalizes two-phase pretraining framework for data selection, blending, and sequencing