Q-learning with Logarithmic Regret

16 June 2020
Kunhe Yang
Lin F. Yang
S. Du
Abstract

This paper presents the first non-asymptotic result showing that a model-free algorithm can achieve logarithmic cumulative regret for episodic tabular reinforcement learning if there exists a strictly positive sub-optimality gap in the optimal $Q$-function. We prove that the optimistic $Q$-learning studied in [Jin et al. 2018] enjoys a $\mathcal{O}\left(\frac{SA\cdot \mathrm{poly}(H)}{\Delta_{\min}}\log(SAT)\right)$ cumulative regret bound, where $S$ is the number of states, $A$ is the number of actions, $H$ is the planning horizon, $T$ is the total number of steps, and $\Delta_{\min}$ is the minimum sub-optimality gap. This bound matches the information-theoretic lower bound in terms of $S, A, T$ up to a $\log(SA)$ factor. We further extend our analysis to the discounted setting and obtain a similar logarithmic cumulative regret bound.
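The algorithm analyzed here is the optimistic $Q$-learning of Jin et al. 2018: $Q$-estimates are optimistically initialized at $H$, updated with the learning rate $\alpha_t = (H+1)/(H+t)$, and inflated by a Hoeffding-style exploration bonus of order $\sqrt{H^3 \log(SAT/\delta)/t}$. The following is a minimal sketch of that update loop, assuming a hypothetical episodic environment interface (`env.reset()`, `env.step()`); the interface names and the bonus constant `c` are illustrative assumptions, not code from the paper.

```python
import numpy as np

def optimistic_q_learning(env, S, A, H, K, c=1.0, delta=0.01):
    """Sketch of optimistic Q-learning with UCB-Hoeffding bonuses,
    in the style of Jin et al. 2018. `env` is an assumed episodic MDP
    interface; constants and names are illustrative."""
    T = K * H
    # Optimistic initialization: Q = H upper-bounds any achievable return.
    Q = np.full((H, S, A), float(H))
    V = np.zeros((H + 1, S))            # V[H] = 0 terminates the backup
    N = np.zeros((H, S, A), dtype=int)  # visit counts per (step, state, action)

    for episode in range(K):
        s = env.reset()                 # assumed to return the initial state
        for h in range(H):
            a = int(np.argmax(Q[h, s]))        # act greedily w.r.t. optimistic Q
            s_next, r = env.step(a)            # assumed to return (next_state, reward)
            N[h, s, a] += 1
            t = N[h, s, a]
            alpha = (H + 1) / (H + t)          # learning rate from Jin et al. 2018
            bonus = c * np.sqrt(H**3 * np.log(S * A * T / delta) / t)
            target = r + V[h + 1, s_next] + bonus
            Q[h, s, a] = (1 - alpha) * Q[h, s, a] + alpha * target
            V[h, s] = min(H, Q[h, s].max())    # keep the value estimate bounded by H
            s = s_next
    return Q
```

Under the paper's gap assumption ($\Delta_{\min} > 0$), the regret of this scheme is shown to grow only logarithmically in $T$, rather than as the $\sqrt{T}$ rate of the gap-independent analysis.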
