arXiv:2009.11348
A Sample-Efficient Algorithm for Episodic Finite-Horizon MDP with Constraints

23 September 2020
K. C. Kalagarla
Rahul Jain
Pierluigi Nuzzo
Abstract

Constrained Markov Decision Processes (CMDPs) formalize sequential decision-making problems whose objective is to minimize a cost function while satisfying constraints on other cost functions. In this paper, we consider the setting of episodic fixed-horizon CMDPs. We propose an online algorithm which leverages the linear programming formulation of the finite-horizon CMDP for repeated optimistic planning to provide a probably approximately correct (PAC) guarantee on the number of episodes needed to ensure an $\epsilon$-optimal policy, i.e., a policy whose objective value is within $\epsilon$ of the optimal value and which satisfies the constraints within $\epsilon$-tolerance, with probability at least $1-\delta$. The number of episodes needed is shown to be of the order $\tilde{\mathcal{O}}\big(\frac{|S||A|C^{2}H^{2}}{\epsilon^{2}}\log\frac{1}{\delta}\big)$, where $C$ is an upper bound on the number of possible successor states for a state-action pair. Therefore, if $C \ll |S|$, the number of episodes needed has a linear dependence on the state- and action-space sizes $|S|$ and $|A|$, respectively, and a quadratic dependence on the time horizon $H$.
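The planning step the abstract refers to solves a linear program over occupancy measures: variables $q_h(s,a)$ give the probability of visiting state $s$ and taking action $a$ at step $h$, flow-conservation equalities make $q$ consistent with the dynamics, and the constraint costs become linear inequalities. Below is a minimal sketch of that LP, not the paper's code: it uses the true transition model rather than the optimistic empirical estimates the algorithm would plug in, and all model parameters (`P`, `c`, `d`, the budget) are made-up illustrative values.

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative finite-horizon CMDP (values are arbitrary, not from the paper).
S, A, H = 2, 2, 3                          # states, actions, horizon
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(S, A))  # P[s, a] = distribution over next states
c = rng.random((S, A))                     # objective cost c(s, a)
d = rng.random((S, A))                     # constraint cost d(s, a)
mu = np.full(S, 1.0 / S)                   # initial-state distribution
# Budget attainable by the d-greedy policy, so the LP is always feasible.
budget = H * d.min(axis=1).max()

n = H * S * A                              # one variable q_h(s, a) per (h, s, a)
idx = lambda h, s, a: (h * S + s) * A + a

# Objective: minimize expected cumulative cost  sum_{h,s,a} q_h(s,a) c(s,a).
obj = np.array([c[s, a] for h in range(H) for s in range(S) for a in range(A)])

# Flow-conservation equalities make q a valid occupancy measure.
A_eq, b_eq = [], []
for s in range(S):                         # layer h = 0 must match mu
    row = np.zeros(n)
    for a in range(A):
        row[idx(0, s, a)] = 1.0
    A_eq.append(row); b_eq.append(mu[s])
for h in range(1, H):                      # inflow = outflow between layers
    for s2 in range(S):
        row = np.zeros(n)
        for a in range(A):
            row[idx(h, s2, a)] = 1.0
        for s in range(S):
            for a in range(A):
                row[idx(h - 1, s, a)] -= P[s, a, s2]
        A_eq.append(row); b_eq.append(0.0)

# One inequality: expected cumulative constraint cost within the budget.
A_ub = [np.array([d[s, a] for h in range(H) for s in range(S) for a in range(A)])]

res = linprog(obj, A_ub=A_ub, b_ub=[budget], A_eq=A_eq, b_eq=b_eq,
              bounds=(0, None), method="highs")
q = res.x.reshape(H, S, A)
# Recover a (possibly stochastic) policy:  pi_h(a | s) ∝ q_h(s, a).
pi = q / np.maximum(q.sum(axis=2, keepdims=True), 1e-12)
print("optimal expected cost:", res.fun)
```

In the paper's algorithm this LP would be re-solved each episode with optimistic model estimates; here it is solved once with the ground-truth model purely to show the structure of the planning step.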

View on arXiv