Pareto Q-Learning with Reward Machines

We present Pareto Q-Learning with Reward Machines (PQLRM), a multi-objective reinforcement learning algorithm for tasks whose reward structure is specified by a set of reward machines (RMs). PQLRM combines Pareto Q-Learning (PQL), which maintains sets of vector-valued Q-estimates to approximate the ...
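To make the two ingredients named above concrete, here is a minimal sketch, not the paper's implementation. It assumes every objective is maximized and models a reward machine as a plain lookup table from (RM state, event) to (next RM state, reward). The names `dominates`, `non_dominated`, and `rm_step`, and that table encoding, are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]
# Hypothetical encoding of a reward machine: (rm_state, event) -> (next_rm_state, reward)
RewardMachine = Dict[Tuple[int, str], Tuple[int, float]]


def dominates(a: Vector, b: Vector) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated(vectors: List[Vector]) -> List[Vector]:
    """Keep only the vectors that no other vector Pareto-dominates (duplicates dropped)."""
    unique = list(dict.fromkeys(vectors))  # de-duplicate while preserving order
    return [v for v in unique if not any(dominates(w, v) for w in unique)]


def rm_step(rm: RewardMachine, state: int, event: str) -> Tuple[int, float]:
    """Advance the reward machine on an event; unknown events leave the state unchanged with zero reward."""
    return rm.get((state, event), (state, 0.0))


# A set of vector-valued estimates for one state-action pair, pruned to its Pareto front.
candidates = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5), (0.4, 0.4), (1.0, 0.0)]
front = non_dominated(candidates)

# One reward machine per objective; here a single toy machine that rewards picking up a key.
key_rm: RewardMachine = {(0, "key"): (1, 1.0)}
next_state, reward = rm_step(key_rm, 0, "key")
```

In this sketch, `(0.4, 0.4)` is pruned because `(0.5, 0.5)` is better on both objectives, while the remaining three vectors are mutually incomparable and all stay in the set. How PQLRM actually couples such sets with the reward machine states is not described in the excerpt above, so the sketch leaves the two pieces separate.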

Source

http://arxiv.org/abs/2606.19134v1