Learning Process Rewards via Success Visitation Matching for Efficient RL

In many modern applications of reinforcement learning (RL), the natural reward for a task of interest is inherently sparse: the reward is 0 everywhere except at task completion, where it is +1. Training a policy to maximize such a sparse reward requires solving a challen...
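To make the sparse reward described above concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the 1-D chain environment, the random policy, and the names `sparse_reward` and `random_walk_return` are hypothetical and do not come from the paper.

```python
import random


def sparse_reward(state: int, goal: int) -> float:
    """Return +1 only when the task is completed (state == goal), else 0."""
    return 1.0 if state == goal else 0.0


def random_walk_return(goal: int = 10, horizon: int = 20, seed: int = 0) -> float:
    """Roll out a uniformly random policy on a 1-D chain and sum the sparse reward.

    Hypothetical toy environment: states are non-negative integers, the agent
    starts at 0, and each step moves it one cell left or right (clipped at 0).
    """
    rng = random.Random(seed)
    state, total = 0, 0.0
    for _ in range(horizon):
        state = max(0, state + rng.choice((-1, 1)))
        total += sparse_reward(state, goal)
        if state == goal:
            break  # the episode ends once the task is completed
    return total
```

In this toy setting, any episode that does not reach the goal returns exactly 0, so the policy gets no signal about which of its steps moved it closer to completing the task.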

Source

http://arxiv.org/abs/2606.23640v1