
Ensuring Safety in an Uncertain Environment: Constrained MDPs via Stochastic Thresholds

7 April 2025
Qian Zuo
Fengxiang He
Abstract

This paper studies constrained Markov decision processes (CMDPs) with constraints against stochastic thresholds, aiming at the safety of reinforcement learning in unknown and uncertain environments. We leverage a Growing-Window estimator, sampling from interactions with the uncertain and dynamic environment, to estimate the thresholds, and based on these estimates we design Stochastic Pessimistic-Optimistic Thresholding (SPOT), a novel model-based primal-dual algorithm for multiple constraints against stochastic thresholds. SPOT enables reinforcement learning under both pessimistic and optimistic threshold settings. We prove that our algorithm achieves sublinear regret and constraint violation: a reward regret of $\tilde{\mathcal{O}}(\sqrt{T})$ while allowing an $\tilde{\mathcal{O}}(\sqrt{T})$ constraint violation over $T$ episodes. These theoretical guarantees show that our algorithm achieves performance comparable to that of an approach relying on fixed, known thresholds. To the best of our knowledge, SPOT is the first reinforcement learning algorithm that realises theoretically guaranteed performance in an uncertain environment where even the thresholds are unknown.
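To make the threshold-estimation idea concrete, here is a minimal sketch of a growing-window estimator. This is an illustration under assumptions, not the paper's implementation: we assume the estimator averages every noisy threshold observation collected so far (the window grows with each interaction), and that pessimistic/optimistic thresholds are obtained by subtracting or adding a confidence width; the class and method names are invented for this example.

```python
import random


class GrowingWindowEstimator:
    """Running estimate of an unknown constraint threshold.

    Averages all noisy observations seen so far -- the "window" of
    samples grows with every environment interaction.
    """

    def __init__(self) -> None:
        self.count = 0
        self.total = 0.0

    def update(self, sample: float) -> None:
        """Record one noisy observation of the threshold."""
        self.count += 1
        self.total += sample

    def estimate(self) -> float:
        """Sample mean over the full growing window."""
        if self.count == 0:
            raise ValueError("no samples observed yet")
        return self.total / self.count

    def pessimistic(self, width: float) -> float:
        """Conservative threshold: tighten the constraint by `width`."""
        return self.estimate() - width

    def optimistic(self, width: float) -> float:
        """Optimistic threshold: relax the constraint by `width`."""
        return self.estimate() + width


# Usage: noisy observations of a hypothetical true threshold 0.5.
random.seed(0)
est = GrowingWindowEstimator()
for _ in range(10_000):
    est.update(0.5 + random.gauss(0.0, 0.1))

print(abs(est.estimate() - 0.5) < 0.01)  # estimate concentrates near 0.5
```

In a primal-dual algorithm like SPOT, such an estimate (with a pessimistic or optimistic confidence width) would stand in for the unknown fixed threshold when checking constraint satisfaction each episode.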

@article{zuo2025_2504.04973,
  title={Ensuring Safety in an Uncertain Environment: Constrained MDPs via Stochastic Thresholds},
  author={Qian Zuo and Fengxiang He},
  journal={arXiv preprint arXiv:2504.04973},
  year={2025}
}