
Efficiently Solving Discounted MDPs with Predictions on Transition Matrices

24 February 2025
Lixing Lyu
Jiashuo Jiang
Wang Chi Cheung
Abstract

We study infinite-horizon Discounted Markov Decision Processes (DMDPs) under a generative model. Motivated by the Algorithm with Advice framework (Mitzenmacher and Vassilvitskii, 2022), we propose a novel framework to investigate how a prediction on the transition matrix can enhance the sample efficiency in solving DMDPs and improve sample complexity bounds. We focus on DMDPs with $N$ state-action pairs and discount factor $\gamma$. First, we provide an impossibility result: without prior knowledge of the prediction accuracy, no sampling policy can compute an $\epsilon$-optimal policy with a sample complexity bound better than $\tilde{O}((1-\gamma)^{-3} N \epsilon^{-2})$, which matches the state-of-the-art minimax sample complexity bound with no prediction. As a complement, we propose an algorithm based on minimax optimization techniques that leverages the prediction on the transition matrix. Our algorithm achieves a sample complexity bound depending on the prediction error, and the bound is uniformly better than $\tilde{O}((1-\gamma)^{-4} N \epsilon^{-2})$, the previous best result derived from convex optimization methods. These theoretical findings are further supported by our numerical experiments.
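To make the setting concrete, the following is a minimal sketch of what "solving a DMDP under a generative model" can look like: a sampler draws next states for any chosen state-action pair, an empirical transition matrix is built from those draws, and value iteration is run on the estimate. This is a generic plug-in (model-based) baseline and not the algorithm proposed in the paper, which uses minimax optimization together with a prediction on the transition matrix. The two-state example, the reward table, and all function names are assumptions made for illustration.

```python
import random

def sample_next_state(P, s, a, rng):
    """Generative model (assumed interface): one draw from the row P[s][a]."""
    return rng.choices(range(len(P[s][a])), weights=P[s][a], k=1)[0]

def estimate_transitions(P, n_samples, rng):
    """Empirical transition matrix from n_samples draws per state-action pair."""
    S, A = len(P), len(P[0])
    P_hat = [[[0.0] * S for _ in range(A)] for _ in range(S)]
    for s in range(S):
        for a in range(A):
            counts = [0] * S
            for _ in range(n_samples):
                counts[sample_next_state(P, s, a, rng)] += 1
            P_hat[s][a] = [c / n_samples for c in counts]
    return P_hat

def value_iteration(P, r, gamma, tol=1e-8):
    """Standard value iteration; returns (values, greedy policy)."""
    S, A = len(P), len(P[0])
    v = [0.0] * S
    while True:
        q = [[r[s][a] + gamma * sum(P[s][a][t] * v[t] for t in range(S))
              for a in range(A)] for s in range(S)]
        v_new = [max(q[s]) for s in range(S)]
        if max(abs(v_new[s] - v[s]) for s in range(S)) < tol:
            policy = [max(range(A), key=lambda a: q[s][a]) for s in range(S)]
            return v_new, policy
        v = v_new

# Hypothetical toy DMDP: 2 states x 2 actions, so N = 4 state-action pairs.
P_true = [
    [[0.9, 0.1], [0.2, 0.8]],   # transitions from state 0 under actions 0, 1
    [[0.5, 0.5], [0.1, 0.9]],   # transitions from state 1 under actions 0, 1
]
rewards = [[0.0, 0.0], [1.0, 1.0]]  # reward 1 whenever the current state is 1
gamma = 0.9

rng = random.Random(0)
P_hat = estimate_transitions(P_true, n_samples=2000, rng=rng)
v_true, policy_true = value_iteration(P_true, rewards, gamma)
v_hat, policy_hat = value_iteration(P_hat, rewards, gamma)
print("policy from true model:     ", policy_true)
print("policy from sampled model:  ", policy_hat)
```

The total number of generative-model calls here is `n_samples` times $N$; the sample complexity bounds in the abstract describe how large that total must be, as a function of $N$, $\gamma$, and $\epsilon$, to guarantee an $\epsilon$-optimal policy.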

@article{lyu2025_2502.15345,
  title={Efficiently Solving Discounted MDPs with Predictions on Transition Matrices},
  author={Lixing Lyu and Jiashuo Jiang and Wang Chi Cheung},
  journal={arXiv preprint arXiv:2502.15345},
  year={2025}
}