PREFINE: Preference-Based Implicit Reward and Cost Fine-Tuning for Safety Alignment

We address the problem of making a pre-trained reinforcement learning (RL) policy safety-aware by incorporating cost constraints without retraining it from scratch. Although costs could be encoded numerically, we consider the more general setting in which costs are provided as preferences. Given a reward-op...

Source

http://arxiv.org/abs/2605.21225v1