Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR

7 February 2023
Kaiwen Wang
Nathan Kallus
Wen Sun
Abstract

In this paper, we study risk-sensitive Reinforcement Learning (RL), focusing on the objective of Conditional Value at Risk (CVaR) with risk tolerance $\tau$. Starting with multi-arm bandits (MABs), we show the minimax CVaR regret rate is $\Omega(\sqrt{\tau^{-1}AK})$, where $A$ is the number of actions and $K$ is the number of episodes, and that it is achieved by an Upper Confidence Bound algorithm with a novel Bernstein bonus. For online RL in tabular Markov Decision Processes (MDPs), we show a minimax regret lower bound of $\Omega(\sqrt{\tau^{-1}SAK})$ (with normalized cumulative rewards), where $S$ is the number of states, and we propose a novel bonus-driven Value Iteration procedure. We show that our algorithm achieves the optimal regret of $\widetilde{O}(\sqrt{\tau^{-1}SAK})$ under a continuity assumption, and in general attains a near-optimal regret of $\widetilde{O}(\tau^{-1}\sqrt{SAK})$, which is minimax-optimal for constant $\tau$. This improves on the best available bounds. By discretizing rewards appropriately, our algorithms are computationally efficient.

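As an informal illustration of the CVaR bandit objective (not the paper's algorithm), the sketch below computes an empirical CVaR at level tau and runs a simple optimistic arm-selection rule. The exploration bonus here is a generic Hoeffding-style placeholder of order sqrt(log K / (tau * n)); the paper's algorithm uses a sharper Bernstein-type bonus. Function names and the toy arms are hypothetical.

```python
import numpy as np

def empirical_cvar(samples, tau):
    """Empirical CVaR at level tau: mean of the worst ceil(tau * n) samples
    (for rewards, 'worst' means the lowest values)."""
    s = np.sort(np.asarray(samples))          # ascending: worst outcomes first
    k = max(1, int(np.ceil(tau * len(s))))
    return s[:k].mean()

def cvar_ucb_bandit(arms, tau, K, rng=None):
    """Optimistic bandit for the CVaR objective: pull each arm once, then
    repeatedly pick the arm maximizing empirical CVaR plus an exploration bonus.
    NOTE: the bonus is a generic placeholder, not the paper's Bernstein bonus."""
    rng = rng or np.random.default_rng(0)
    A = len(arms)
    rewards = [[] for _ in range(A)]
    for a in range(A):                         # initialization: one pull per arm
        rewards[a].append(arms[a](rng))
    for _ in range(A, K):
        scores = [
            empirical_cvar(rewards[a], tau)
            + np.sqrt(np.log(K) / (tau * len(rewards[a])))
            for a in range(A)
        ]
        a = int(np.argmax(scores))
        rewards[a].append(arms[a](rng))
    return [empirical_cvar(r, tau) for r in rewards], [len(r) for r in rewards]

# Toy example: two arms with similar means but different lower tails.
# Arm 0: Bernoulli(0.5) -> CVaR_{0.1} near 0; Arm 1: Uniform(0.4, 0.6) -> CVaR_{0.1} near 0.41.
arms = [lambda rng: float(rng.binomial(1, 0.5)),
        lambda rng: 0.4 + 0.2 * rng.random()]
cvars, pull_counts = cvar_ucb_bandit(arms, tau=0.1, K=2000)
```

The toy arms show why the CVaR objective differs from expected reward: both arms have mean roughly 0.5, but the risk-averse (small tau) criterion strongly prefers the arm with the better lower tail, so the algorithm should concentrate its pulls on arm 1.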