arXiv:1812.00885
AsyncQVI: Asynchronous-Parallel Q-Value Iteration for Discounted Markov Decision Processes with Near-Optimal Sample Complexity

3 December 2018
Yibo Zeng
Fei Feng
W. Yin
Abstract

In this paper, we propose AsyncQVI, an asynchronous-parallel Q-value iteration algorithm for discounted Markov decision processes whose transitions and rewards can only be sampled through a generative model. Given such a problem with $|\mathcal{S}|$ states, $|\mathcal{A}|$ actions, and a discount factor $\gamma\in(0,1)$, AsyncQVI uses memory of size $\mathcal{O}(|\mathcal{S}|)$ and returns an $\varepsilon$-optimal policy with probability at least $1-\delta$ using $\tilde{\mathcal{O}}\big(\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)^5\varepsilon^2}\log(\frac{1}{\delta})\big)$ samples. AsyncQVI is also the first asynchronous-parallel algorithm for discounted Markov decision processes whose sample complexity nearly matches the theoretical lower bound. Its relatively low memory footprint and parallelism make AsyncQVI suitable for large-scale applications. In numerical tests, we compare AsyncQVI with four sample-based value iteration methods. The results show that our algorithm is highly efficient and achieves linear parallel speedup.
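To make the setting concrete, here is a minimal sketch of *sample-based* Q-value iteration with a generative model: each Bellman backup's expectation over next states is replaced by an empirical average over generative-model draws. This is only an illustration of the problem setup, not the paper's AsyncQVI algorithm; it sweeps state-action pairs sequentially (AsyncQVI would update them asynchronously in parallel) and stores a full $|\mathcal{S}|\times|\mathcal{A}|$ table rather than the paper's $\mathcal{O}(|\mathcal{S}|)$-memory scheme. The `chain_model` toy MDP is invented for the demo.

```python
import numpy as np

def sampled_q_value_iteration(sample_model, n_states, n_actions, gamma,
                              n_samples=20, n_iters=50, seed=0):
    """Sample-based Q-value iteration against a generative model.

    sample_model(s, a, rng) -> (next_state, reward) draws one transition.
    Each update approximates the Bellman backup
        Q(s, a) = E[r + gamma * max_a' Q(s', a')]
    by an average over `n_samples` generative-model calls.
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        # Sequential sweep for clarity; an asynchronous-parallel variant
        # would let workers update (s, a) entries concurrently.
        for s in range(n_states):
            for a in range(n_actions):
                total = 0.0
                for _ in range(n_samples):
                    s2, r = sample_model(s, a, rng)
                    total += r + gamma * Q[s2].max()
                Q[s, a] = total / n_samples
    return Q.argmax(axis=1)  # greedy policy w.r.t. the estimated Q

# Toy two-state chain MDP (hypothetical example): action 1 moves to
# state 1, which pays reward 1; action 0 moves to state 0, reward 0.
def chain_model(s, a, rng):
    s2 = 1 if a == 1 else 0
    return s2, float(s2 == 1)

policy = sampled_q_value_iteration(chain_model, n_states=2, n_actions=2,
                                   gamma=0.9)
print(policy)  # the greedy policy picks action 1 in both states
```

The $\tilde{\mathcal{O}}\big(\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)^5\varepsilon^2}\big)$ bound in the abstract governs the total number of `sample_model` calls such a method needs to return an $\varepsilon$-optimal policy with high probability.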
