The500Feed.Live

Everything going on in AI - updated daily from 500+ sources

📄 Research · May 20, 2026

Stochastic MeanFlow Policies: One-Step Generative Control with Entropic Mirror Descent

Online off-policy reinforcement learning (RL) is shaped by two coupled choices: the policy class and the update rule. Gaussian policies are fast and have tractable entropy, but struggle with multimodal action distributions. Generative policies are more expressive, but often require iterative samplin...
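The abstract makes two claims about Gaussian policies: their entropy is tractable, and they struggle with multimodal action distributions. The sketch below illustrates both. It is a generic example and not the paper's method. The helper names `diag_gaussian_entropy` and `moment_matched_gaussian` and the bimodal toy data are hypothetical. The entropy uses the standard closed form for a diagonal Gaussian. The "fit" is simple moment matching, which stands in for how a single Gaussian must average over separate modes.

```python
import math

def diag_gaussian_entropy(stds):
    """Closed-form differential entropy (nats) of a diagonal Gaussian policy.

    H = sum_i 0.5 * log(2 * pi * e * sigma_i^2), so no sampling is needed.
    """
    return sum(0.5 * math.log(2 * math.pi * math.e * s * s) for s in stds)

def moment_matched_gaussian(actions):
    """Hypothetical helper: fit a 1-D Gaussian by matching mean and variance."""
    n = len(actions)
    mean = sum(actions) / n
    var = sum((a - mean) ** 2 for a in actions) / n
    return mean, math.sqrt(var)

# Hypothetical toy data: "good" actions split evenly between two modes at -1 and +1.
actions = [-1.0] * 50 + [1.0] * 50
mu, sigma = moment_matched_gaussian(actions)

# The single Gaussian centres its mean at 0.0, between the two modes,
# where the toy data has no good actions at all.
print(mu, sigma)
print(diag_gaussian_entropy([sigma]))
```

Here the entropy is a single cheap formula. The fitted mean, however, lands on an action that neither mode supports. That gap is the motivation the abstract gives for more expressive generative policies.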


Source

http://arxiv.org/abs/2605.21282v1