Stanford Online
5/11/2026
Stanford CME296 Diffusion & Large Vision Models | Spring 2026 | Lecture 5 - Architectures

Short summary

Stanford graduate lecture on architectures for diffusion models and large vision models, covering the evolution of the U-Net, Diffusion Transformers (DiT), multimodal variants, and positional encoding techniques. It assumes a background in deep neural networks and is aimed at AI product builders and researchers.

  • Covers U-Net and Diffusion Transformer architectures and how they evolved over time
  • Explores multimodal DiT models and state-of-the-art implementations (FLUX.1, Qwen-Image)
  • Takes a technical deep dive into rotary position embeddings (RoPE) and adaptive layer normalization

Generated with AI, which can make mistakes.
