DeepLearningAI
5/12/2026

The Ultimate Transformer Course for Working Engineers
Short summary
DeepLearningAI launches Transformers in Practice, a course taught by Sharon Zhou, VP of Engineering at AMD. The course teaches working engineers how transformer models generate text, why hallucinations occur, and how inference optimizations such as quantization, KV caching, and flash attention improve GPU performance. Interactive visualizations build intuition for concepts that are often difficult to grasp.
- Learn how transformers generate text token-by-token and how sampling affects output
- Understand attention, positional encoding, and why hallucinations happen; learn RAG and constrained generation solutions
- Master inference optimizations (quantization, KV caching, flash attention, speculative decoding) for GPU efficiency
Generated with AI, which can make mistakes.



