Drag reduction or reward hacking? Recurrent multi-agent reinforcement learning that earns its reward

A reinforcement-learning agent maximises its reward, which can diverge from the outcome its designer intended. In physical control the reward rarely closes that gap, and drag reduction in wall turbulence makes it concrete. A mass-conservation projection couples agents' outputs and erases the per-age...
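The abstract is cut off before it says how the projection is built. In blowing/suction control of wall turbulence, one common way to enforce zero net mass flux is to subtract the spatial mean of the actuations. The minimal sketch below assumes that choice, equal-area wall cells, and one scalar wall-normal velocity per agent. The function names `project_zero_net_flux` and `influence_of` are hypothetical and are not taken from the paper. The sketch shows the coupling the abstract mentions: after projection, each agent's action depends on every agent's raw output.

```python
from typing import List


def project_zero_net_flux(raw: List[float]) -> List[float]:
    """Hypothetical mass-conservation projection (assumption: mean subtraction).

    Subtracting the mean is the orthogonal projection onto the subspace of
    actuations that sum to zero, i.e. zero net mass flux through the wall,
    assuming every agent controls a wall cell of equal area.
    """
    mean = sum(raw) / len(raw)
    return [a - mean for a in raw]


def influence_of(agent: int, n: int) -> List[float]:
    """Column `agent` of the projection's Jacobian, I - (1/n) * ones.

    Entry i is how much agent i's projected action changes per unit change
    in the raw output of `agent`: 1 - 1/n for itself, -1/n for every other agent.
    """
    return [(1.0 if i == agent else 0.0) - 1.0 / n for i in range(n)]


if __name__ == "__main__":
    raw = [0.3, -0.1, 0.5, 0.1]            # illustrative raw outputs of four agents
    projected = project_zero_net_flux(raw)
    print(projected, sum(projected))        # sum is zero up to float rounding
    print(influence_of(0, 4))               # [0.75, -0.25, -0.25, -0.25]
```

Because the projection mixes all agents' outputs, a change in one agent's output moves every agent's realised actuation. That is the coupling the abstract attributes to the mass-conservation constraint. An area-weighted mean would be needed if the wall cells differed in size.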


Source

http://arxiv.org/abs/2606.06227v1