The500Feed.Live

Everything going on in AI - updated daily from 500+ sources

📄 Research | June 15, 2026

Video-Based Optimal Transport for Feedback-Efficient Offline Preference-Based Reinforcement Learning

Conveying complex objectives to reinforcement learning (RL) agents often requires meticulous reward engineering. Preference-based RL (PbRL) offers a promising alternative by learning reward functions from human feedback, but its scalability is hindered by high labeling costs. Inspired by advances in...

Source

http://arxiv.org/abs/2606.16856v1