Automating Potential-based Reward Shaping with Vision Language Model Guidance

Sparse rewards are inherently challenging for reinforcement learning agents as they lack intermediate feedback to guide exploration and to correctly attribute the sparse success rewards to relevant parts of the trajectory. Naive reward shaping can induce reward hacking, yielding policies that exploi...

Source

http://arxiv.org/abs/2606.27180v1