Towards Efficient Risk-Sensitive Policy Gradient: An Iteration Complexity Analysis

Reinforcement Learning (RL) has shown exceptional performance across various applications, enabling autonomous agents to learn optimal policies through interaction with their environments. However, traditional RL frameworks often face challenges in terms of iteration efficiency and robustness. Risk-sensitive policy gradient methods, which incorporate both expected return and risk measures, have been explored for their ability to yield more robust policies, yet their iteration complexity remains largely underexplored. In this work, we conduct a rigorous iteration complexity analysis for the risk-sensitive policy gradient method, focusing on the REINFORCE algorithm with an exponential utility function. We establish an iteration complexity of $\mathcal{O}(\epsilon^{-2})$ to reach an $\epsilon$-approximate first-order stationary point (FOSP). Furthermore, we investigate whether risk-sensitive algorithms can achieve better iteration complexity than their risk-neutral counterparts. Our analysis indicates that risk-sensitive REINFORCE can potentially converge faster. To validate our analysis, we empirically evaluate the learning performance and convergence efficiency of the risk-neutral and risk-sensitive REINFORCE algorithms in multiple environments: CartPole, MiniGrid, and Robot Navigation. Empirical results confirm that the risk-averse variants converge and stabilize faster than their risk-neutral counterparts. More details can be found on our website: this https URL.
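For context, the following is a minimal sketch of the standard exponential-utility (entropic risk) formulation that the abstract refers to; the exact estimator, normalization, and sign conventions used in the paper are not reproduced here and should be read as assumptions.

```latex
% Exponential-utility (entropic risk) objective with risk parameter \beta:
% \beta < 0 is risk-averse, \beta > 0 is risk-seeking, and \beta \to 0 recovers
% the risk-neutral expected return.
J_\beta(\theta) \;=\; \frac{1}{\beta}\,\log \mathbb{E}_{\tau \sim \pi_\theta}\!\left[ e^{\beta R(\tau)} \right],
\qquad R(\tau) = \text{return of trajectory } \tau .

% The log-derivative trick yields a REINFORCE-style policy gradient in which the
% score function is exponentially reweighted by the trajectory return:
\nabla_\theta J_\beta(\theta)
  \;=\;
  \frac{\mathbb{E}_{\tau \sim \pi_\theta}\!\left[ e^{\beta R(\tau)}\, \nabla_\theta \log \pi_\theta(\tau) \right]}
       {\beta\,\mathbb{E}_{\tau \sim \pi_\theta}\!\left[ e^{\beta R(\tau)} \right]},
\qquad
\hat{g}(\tau) \;=\; \frac{1}{\beta}\, e^{\beta R(\tau)}\, \nabla_\theta \log \pi_\theta(\tau)
\;\; \text{(single-sample estimate, up to normalization).}
```

Under this formulation, an $\epsilon$-approximate FOSP is a parameter $\theta$ with $\|\nabla_\theta J_\beta(\theta)\| \le \epsilon$, which is the convergence criterion the iteration complexity bound above refers to.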
@article{liu2025_2403.08955,
  title   = {Towards Efficient Risk-Sensitive Policy Gradient: An Iteration Complexity Analysis},
  author  = {Rui Liu and Anish Gupta and Erfaun Noorani and Pratap Tokekar},
  journal = {arXiv preprint arXiv:2403.08955},
  year    = {2025}
}