Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form

International Conference on Learning Representations (ICLR), 2024
29 August 2024
Toshinori Kitamura
Tadashi Kozuno
Wataru Kumagai
Kenta Hoshino
Y. Hosoe
Kazumi Kasaura
Masashi Hamaya
Paavo Parmas
Yutaka Matsuo
arXiv:2408.16286 (abs | PDF | HTML)
Main: 10 pages · 5 figures · 1 table · Bibliography: 5 pages · Appendix: 22 pages
Abstract

Designing a safe policy for uncertain environments is crucial in real-world control systems. However, this challenge remains inadequately addressed within the Markov decision process (MDP) framework. This paper presents the first algorithm guaranteed to identify a near-optimal policy in a robust constrained MDP (RCMDP), where an optimal policy minimizes cumulative cost while satisfying constraints in the worst-case scenario across a set of environments. We first prove that the conventional policy gradient approach to the Lagrangian max-min formulation can become trapped in suboptimal solutions. This occurs when its inner minimization encounters a sum of conflicting gradients from the objective and constraint functions. To address this, we leverage the epigraph form of the RCMDP problem, which resolves the conflict by selecting a single gradient from either the objective or the constraints. Building on the epigraph form, we propose a bisection search algorithm with a policy gradient subroutine and prove that it identifies an $\varepsilon$-optimal policy in an RCMDP with $\tilde{\mathcal{O}}(\varepsilon^{-4})$ robust policy evaluations.
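
The sketch below is a minimal, illustrative rendering in Python of the two ideas the abstract summarizes: an inner subproblem that follows only the gradient of whichever term (objective gap or constraint violation) is currently active, and an outer bisection over the epigraph threshold. The function names (`eval_objective`, `eval_constraint`, `grad_objective`, `grad_constraint`) are hypothetical placeholders for worst-case (robust) policy evaluations and their gradients; this is not the authors' implementation and omits the robust evaluation machinery and the analysis that yields the stated guarantee.

```python
import numpy as np


def epigraph_subproblem(theta, b, eval_objective, eval_constraint,
                        grad_objective, grad_constraint,
                        lr=0.01, iters=500):
    """Gradient descent on max(objective(theta) - b, constraint(theta)).

    At every step only the gradient of the currently active (larger) term
    is followed, so conflicting objective/constraint gradients are never
    summed as they would be in a Lagrangian update.
    """
    for _ in range(iters):
        obj_gap = eval_objective(theta) - b      # how far the cost exceeds b
        violation = eval_constraint(theta)       # > 0 means constraint violated
        if obj_gap >= violation:
            g = grad_objective(theta)            # objective term is active
        else:
            g = grad_constraint(theta)           # constraint term is active
        theta = theta - lr * g
    residual = max(eval_objective(theta) - b, eval_constraint(theta))
    return theta, residual


def bisection_epigraph(theta0, b_low, b_high, tol=1e-3, **subproblem_kwargs):
    """Bisection over the epigraph variable b (target on the worst-case cost).

    If the subproblem can be driven (approximately) feasible for a given b,
    the target is tightened; otherwise it is relaxed.
    """
    theta = theta0
    while b_high - b_low > tol:
        b = 0.5 * (b_low + b_high)
        theta, residual = epigraph_subproblem(theta, b, **subproblem_kwargs)
        if residual <= 0.0:      # a small slack is often allowed in practice
            b_high = b           # feasible at this target: try a smaller b
        else:
            b_low = b            # infeasible: the target was too ambitious
    return theta, b_high


if __name__ == "__main__":
    # Toy quadratic stand-ins for worst-case cost and constraint values.
    theta, b = bisection_epigraph(
        np.zeros(2), b_low=0.0, b_high=10.0,
        eval_objective=lambda th: float(th @ th),
        eval_constraint=lambda th: float((th - 1.0) @ (th - 1.0)) - 1.0,
        grad_objective=lambda th: 2.0 * th,
        grad_constraint=lambda th: 2.0 * (th - 1.0),
    )
    print("approximate optimal target b:", round(b, 3))
```

Note that, unlike a Lagrangian update, no weighted sum of the objective and constraint gradients is ever formed; selecting a single active gradient is the conflict-resolution property the abstract attributes to the epigraph form.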
