arXiv cs.LG
5/12/2026

Distributional Reinforcement Learning via the Cramér Distance
Short summary
Researchers introduce C-DSAC, a Distributional Reinforcement Learning algorithm that extends Soft Actor-Critic (SAC) by minimizing the Cramér distance. On robotic benchmarks it outperforms baseline SAC and other distributional methods, with larger gains on high-complexity tasks. Key insight: confidence-driven Q-value updates treat high-variance distributions as low-confidence targets, which reduces overestimation errors.
- Novel C-DSAC algorithm combines distributional RL with SAC using Cramér distance minimization
- Empirically outperforms baseline SAC and contemporary distributional methods on robotic control tasks
- Theoretical mechanism: confidence-driven updates reduce Q-value overestimation and improve convergence
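
For readers unfamiliar with the Cramér distance, here is a minimal sketch of the standard squared form, the integral of the squared difference between two cumulative distribution functions, for categorical distributions on a shared support. The summary does not state the loss C-DSAC actually uses, so the function name `squared_cramer_distance` and the categorical parameterization are assumptions made for illustration only.

```python
import numpy as np

def squared_cramer_distance(p, q, atoms):
    """Squared Cramér distance between two categorical distributions.

    p, q  : probabilities over the same sorted support points (assumed layout)
    atoms : the support points, in increasing order
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    atoms = np.asarray(atoms, dtype=float)

    # Difference between the two CDFs, evaluated at each atom.
    cdf_gap = np.cumsum(p) - np.cumsum(q)

    # Both CDFs are step functions that are constant between atoms, so the
    # integral of the squared gap is a sum over the intervals between atoms.
    widths = np.diff(atoms)
    return float(np.sum(cdf_gap[:-1] ** 2 * widths))


atoms = [0.0, 1.0, 2.0]
same = squared_cramer_distance([0.2, 0.5, 0.3], [0.2, 0.5, 0.3], atoms)
far = squared_cramer_distance([1.0, 0.0, 0.0], [0.0, 0.0, 1.0], atoms)
print(same, far)  # 0.0 2.0
```

Identical distributions give zero, and the value grows as probability mass sits farther apart on the support, which is the property that makes it usable as a training loss between a predicted and a target distribution.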
Generated with AI, which can make mistakes.