arXiv:2108.02717

Beyond No Regret: Instance-Dependent PAC Reinforcement Learning

5 August 2021
Andrew Wagenmaker
Max Simchowitz
Kevin G. Jamieson
Abstract

The theory of reinforcement learning has focused on two fundamental problems: achieving low regret, and identifying ϵ-optimal policies. While a simple reduction allows one to apply a low-regret algorithm to obtain an ϵ-optimal policy and achieve the worst-case optimal rate, it is unknown whether low-regret algorithms can obtain the instance-optimal rate for policy identification. We show this is not possible: there exists a fundamental tradeoff between achieving low regret and identifying an ϵ-optimal policy at the instance-optimal rate. Motivated by our negative finding, we propose a new measure of instance-dependent sample complexity for PAC tabular reinforcement learning which explicitly accounts for the attainable state visitation distributions in the underlying MDP. We then propose and analyze a novel, planning-based algorithm which attains this sample complexity, yielding a complexity which scales with the suboptimality gaps and the "reachability" of a state. We show that our algorithm is nearly minimax optimal, and show on several examples that our instance-dependent sample complexity offers significant improvements over worst-case bounds.
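
The "simple reduction" mentioned in the abstract is not spelled out there. One standard form of it (an online-to-batch argument) can be sketched as follows. The notation is an assumption made for illustration and is not taken from the paper: K is the number of episodes, π_k is the policy played in episode k, V^π is the expected return of π from the initial state distribution, and C is a hypothetical constant collecting the problem-dependent factors of an expected-regret bound of order √K.

```latex
% Assumed: the algorithm's expected regret over K episodes is at most C*sqrt(K).
\[
  \mathbb{E}[R_K]
  \;=\; \mathbb{E}\!\left[\sum_{k=1}^{K} \bigl(V^{\star} - V^{\pi_k}\bigr)\right]
  \;\le\; C\sqrt{K}.
\]
% Output a policy drawn uniformly at random from those played.
\[
  \hat{\pi} \sim \mathrm{Unif}\{\pi_1,\dots,\pi_K\}
  \quad\Longrightarrow\quad
  \mathbb{E}\bigl[V^{\star} - V^{\hat{\pi}}\bigr]
  \;=\; \frac{\mathbb{E}[R_K]}{K}
  \;\le\; \frac{C}{\sqrt{K}}.
\]
% The right-hand side is at most epsilon once K is large enough.
\[
  \frac{C}{\sqrt{K}} \le \epsilon
  \quad\Longleftrightarrow\quad
  K \;\ge\; \frac{C^{2}}{\epsilon^{2}}.
\]
```

Under these assumptions, roughly C²/ϵ² episodes suffice for the returned policy to be ϵ-optimal in expectation; a high-probability (PAC) statement additionally needs a step such as Markov's inequality. This bound depends only on the worst-case constant C, not on instance-specific quantities such as suboptimality gaps, which is the distinction the abstract draws.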
