Second-Order Actor-Critic Methods for Discounted MDPs via Policy Hessian Decomposition

We address the discounted reward setting in reinforcement learning (RL). To mitigate the value approximation challenges in policy gradient methods, actor-critic approaches have been developed and are known to converge to stationary points under suitable assumptions. However, these methods rely on fi...

Source

http://arxiv.org/abs/2605.14982v1