Near-Optimal Sample Complexity for Iterated CVaR Reinforcement Learning with a Generative Model

In this work, we study the sample complexity of risk-sensitive Reinforcement Learning (RL) with a generative model, where we aim to maximize the Conditional Value at Risk (CVaR) with risk tolerance level $\tau$ at each step, a criterion we refer to as Iterated CVaR. We first build a connection between Iterated CVaR RL and $(s,a)$-rectangular distributionally robust RL with a specific uncertainty set for CVaR. We then establish nearly matching upper and lower bounds on the sample complexity of this problem. Specifically, we first prove that a value iteration-based algorithm, ICVaR-VI, computes an $\epsilon$-optimal policy using a number of samples that is polynomial in $S$, $A$, $1/\epsilon$, $1/(1-\gamma)$, and $1/\tau$, where $\gamma$ is the discount factor and $S$ and $A$ are the sizes of the state and action spaces; for a suitable range of the risk level $\tau$, the sample complexity improves further. We further show a minimax lower bound. For a fixed risk level $\tau$, our upper and lower bounds match, demonstrating the tightness and optimality of our analysis. Finally, we investigate a limiting case with a small risk level $\tau$, called Worst-Path RL, where the objective is to maximize the minimum possible cumulative reward. For this setting, we develop matching upper and lower bounds in terms of $p_{\min}$, the minimum non-zero reaching probability of the transition kernel.
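To make the Iterated CVaR criterion and its robust-RL interpretation concrete, one standard way to write the step-wise CVaR backup uses the dual (uncertainty-set) representation of CVaR; the notation below is ours and is not taken from the paper:

$$(\mathcal{T}V)(s) \;=\; \max_{a}\Big\{ r(s,a) + \gamma\, \mathrm{CVaR}_{\tau}^{\,s'\sim P(\cdot\mid s,a)}\big[V(s')\big] \Big\}, \qquad \mathrm{CVaR}_{\tau}^{P}[X] \;=\; \min_{\substack{Q \ll P,\; dQ/dP \le 1/\tau}} \mathbb{E}_{Q}[X].$$

The right-hand identity is what links the per-step CVaR backup to a rectangular distributionally robust backup over a CVaR uncertainty set. Below is a minimal NumPy sketch of one possible value-iteration scheme built on this backup, applied to an empirical transition kernel estimated from generative-model samples. The function names (`cvar_lower_tail`, `icvar_value_iteration`) and the plain fixed-point loop are illustrative assumptions, not the paper's ICVaR-VI implementation.

```python
import numpy as np

def cvar_lower_tail(values, probs, tau):
    """CVaR at level tau of the lower tail: the probability-weighted average of
    the worst tau-fraction of outcomes of a discrete distribution."""
    order = np.argsort(values)          # worst outcomes first
    v, p = values[order], probs[order]
    remaining, acc = tau, 0.0
    for vi, pi in zip(v, p):
        take = min(pi, remaining)       # draw mass from the worst outcomes, up to tau in total
        acc += take * vi
        remaining -= take
        if remaining <= 1e-12:
            break
    return acc / tau

def icvar_value_iteration(P_hat, r, gamma, tau, num_iters=500):
    """Value iteration with a CVaR backup at every step (Iterated CVaR).
    P_hat: empirical kernel of shape (S, A, S), e.g. frequencies of N next-state
    samples per (s, a) drawn from the generative model; r: rewards of shape (S, A)."""
    S, A, _ = P_hat.shape
    V = np.zeros(S)
    for _ in range(num_iters):
        Q = np.empty((S, A))
        for s in range(S):
            for a in range(A):
                # Replace the usual expectation over next states with CVaR_tau of V.
                Q[s, a] = r[s, a] + gamma * cvar_lower_tail(V, P_hat[s, a], tau)
        V = Q.max(axis=1)
    return V, Q.argmax(axis=1)          # iterated-CVaR value estimate and greedy policy
```

Sorting the next-state values and averaging the worst $\tau$-fraction computes the discrete CVaR exactly; with $\tau = 1$ the backup reduces to the standard Bellman expectation, and as $\tau \to 0$ it approaches the worst-path backup (the minimum over reachable next states) discussed at the end of the abstract.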
@article{deng2025_2503.08934,
  title   = {Near-Optimal Sample Complexity for Iterated CVaR Reinforcement Learning with a Generative Model},
  author  = {Zilong Deng and Simon Khan and Shaofeng Zou},
  journal = {arXiv preprint arXiv:2503.08934},
  year    = {2025}
}