The500Feed.Live
Everything going on in AI - updated daily from 500+ sources
📄 ResearchMay 12, 2026
Variance-aware Reward Modeling with Anchor Guidance
Standard Bradley--Terry (BT) reward models are limited when human preferences are pluralistic. Although soft preference labels preserve disagreement information, BT can only express it by shrinking reward margins. Gaussian reward models provide an alternative by jointly predicting a reward mean and ...
Read Original Article →