
On the Global Convergence of Risk-Averse Natural Policy Gradient Methods with Expected Conditional Risk Measures

International Conference on Machine Learning (ICML), 2023
Main: 17 pages · 2 figures · Bibliography: 2 pages · Appendix: 13 pages
Abstract

Risk-sensitive reinforcement learning (RL) has become a popular tool for controlling the risk of uncertain outcomes and ensuring reliable performance in highly stochastic sequential decision-making problems. While policy gradient methods have been shown to find globally optimal policies in the risk-neutral setting, it remains unclear whether their risk-averse variants enjoy the same global convergence guarantees. In this paper, we consider a class of dynamic time-consistent risk measures, named Expected Conditional Risk Measures (ECRMs), and derive natural policy gradient (NPG) updates for ECRM-based RL problems. We establish global optimality and iteration complexity guarantees for the proposed risk-averse NPG algorithm with softmax parameterization and entropy regularization, under both exact and inexact policy evaluation. Furthermore, we test our risk-averse NPG algorithm on a stochastic Cliffwalk environment to demonstrate the efficacy of our method.
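To make the setting concrete, the sketch below illustrates the standard risk-neutral, entropy-regularized NPG update under softmax parameterization on a toy tabular MDP, i.e., pi_{t+1}(a|s) ∝ pi_t(a|s)^{1 - eta*tau} * exp(eta * Q_t(s,a)). This is not the authors' code: the paper's risk-averse variant would replace the soft Q-values with risk-adjusted ones derived from the ECRM objective, and all names and constants here are illustrative assumptions.

import numpy as np

# Minimal sketch (risk-neutral stand-in, not the paper's algorithm):
# tabular entropy-regularized NPG with softmax parameterization.
n_states, n_actions = 4, 2
gamma, tau, eta = 0.9, 0.1, 0.5          # discount, entropy weight, step size
rng = np.random.default_rng(0)

P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.uniform(size=(n_states, n_actions))                       # rewards r(s, a)

pi = np.full((n_states, n_actions), 1.0 / n_actions)               # uniform init

def soft_q(pi, n_iters=200):
    """Evaluate the entropy-regularized Q-function of pi by fixed-point iteration."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        V = np.einsum("sa,sa->s", pi, Q - tau * np.log(pi + 1e-12))
        Q = R + gamma * P @ V
    return Q

for _ in range(100):
    Q = soft_q(pi)
    # Entropy-regularized NPG / softmax update:
    # pi_{t+1}(a|s) proportional to pi_t(a|s)^(1 - eta*tau) * exp(eta * Q_t(s, a))
    logits = (1.0 - eta * tau) * np.log(pi + 1e-12) + eta * Q
    pi = np.exp(logits - logits.max(axis=1, keepdims=True))
    pi /= pi.sum(axis=1, keepdims=True)

In this risk-neutral form the update contracts toward the entropy-regularized optimum at a rate governed by eta and tau; the paper studies the analogous guarantees when Q is replaced by an ECRM-based risk-adjusted value, under both exact and inexact policy evaluation.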
