Provably Efficient Iterated CVaR Reinforcement Learning with Function Approximation

International Conference on Learning Representations (ICLR), 2023
Abstract

Risk-sensitive reinforcement learning (RL) aims to optimize policies that balance the expected reward and risk. In this paper, we investigate a novel risk-sensitive RL formulation with an Iterated Conditional Value-at-Risk (CVaR) objective under linear and general function approximations. This new formulation, named ICVaR-RL with function approximation, provides a principled way to guarantee safety at each decision step. For ICVaR-RL with linear function approximation, we propose a computationally efficient algorithm ICVaR-L, which achieves an $\widetilde{O}(\sqrt{\alpha^{-(H+1)}(d^2H^4+dH^6)K})$ regret, where $\alpha$ is the risk level, $d$ is the dimension of state-action features, $H$ is the length of each episode, and $K$ is the number of episodes. We also establish a matching lower bound $\Omega(\sqrt{\alpha^{-(H-1)}d^2K})$ to validate the optimality of ICVaR-L with respect to $d$ and $K$. For ICVaR-RL with general function approximation, we propose algorithm ICVaR-G, which achieves an $\widetilde{O}(\sqrt{\alpha^{-(H+1)}DH^4K})$ regret, where $D$ is a dimensional parameter that depends on the eluder dimension and covering number. Furthermore, our analysis provides several novel techniques for risk-sensitive RL, including an efficient approximation of the CVaR operator, a new ridge regression with CVaR-adapted features, and a refined elliptical potential lemma.
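For readers unfamiliar with the CVaR objective underlying ICVaR-RL, the following is a minimal illustrative sketch (not code from the paper) of the empirical CVaR at risk level $\alpha$: the mean of the worst $\alpha$-fraction of sampled returns, so that small $\alpha$ focuses on tail (worst-case) outcomes.

```python
import numpy as np

def empirical_cvar(returns, alpha):
    """Empirical CVaR at risk level alpha for reward-like samples:
    the average of the worst ceil(alpha * n) outcomes.
    As alpha -> 1 this recovers the plain expectation; small alpha
    emphasizes the lower tail, which is what makes the objective
    risk-sensitive."""
    x = np.sort(np.asarray(returns, dtype=float))  # ascending: worst first
    k = max(1, int(np.ceil(alpha * len(x))))       # size of the worst alpha-fraction
    return x[:k].mean()

# Example: with alpha = 0.5, CVaR averages the worst half of the samples.
print(empirical_cvar([1.0, 2.0, 3.0, 4.0], alpha=0.5))  # -> 1.5
print(empirical_cvar([1.0, 2.0, 3.0, 4.0], alpha=1.0))  # -> 2.5
```

In the iterated formulation studied here, a CVaR operator of this kind is applied at every decision step rather than once to the total return, which is what yields per-step safety guarantees.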
