Model-free Posterior Sampling via Learning Rate Randomization

27 October 2023
Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Rémi Munos, Alexey Naumov, Pierre Perrault, Michal Valko, Pierre Ménard
Abstract

In this paper, we introduce Randomized Q-learning (RandQL), a novel randomized model-free algorithm for regret minimization in episodic Markov Decision Processes (MDPs). To the best of our knowledge, RandQL is the first tractable model-free posterior sampling-based algorithm. We analyze the performance of RandQL in both tabular and non-tabular metric space settings. In tabular MDPs, RandQL achieves a regret bound of order $\widetilde{\mathcal{O}}(\sqrt{H^{5}SAT})$, where $H$ is the planning horizon, $S$ is the number of states, $A$ is the number of actions, and $T$ is the number of episodes. For a metric state-action space, RandQL enjoys a regret bound of order $\widetilde{\mathcal{O}}(H^{5/2} T^{(d_z+1)/(d_z+2)})$, where $d_z$ denotes the zooming dimension. Notably, RandQL achieves optimistic exploration without using bonuses, relying instead on a novel idea of learning rate randomization. Our empirical study shows that RandQL outperforms existing approaches on baseline exploration environments.
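
The mechanism highlighted in the abstract, optimistic exploration driven by randomized learning rates rather than exploration bonuses, can be sketched in a few lines. The snippet below is a minimal illustration only: the ensemble size, the Beta(1, n) step-size schedule, the ensemble-max bootstrap, and the `env.reset()`/`env.step()` interface are all illustrative assumptions based on the abstract, not the algorithm as specified in the paper.

```python
import numpy as np

def randql_sketch(env, H, S, A, n_episodes, n_ensemble=10, seed=0):
    """Toy tabular Q-learning with randomized learning rates.

    Illustrates the abstract's idea only: exploration comes from
    randomizing the step size of each ensemble member instead of
    adding a bonus. The Beta(1, n) schedule and the ensemble-max
    bootstrap are simplifications, not the paper's exact update.
    """
    rng = np.random.default_rng(seed)
    # Optimistic initialization at the maximum attainable return H.
    Q = np.full((n_ensemble, H, S, A), float(H))
    counts = np.zeros((H, S, A), dtype=int)

    for _ in range(n_episodes):
        s = env.reset()  # assumed to return an integer state index
        for h in range(H):
            # Greedy action w.r.t. the most optimistic ensemble member.
            a = int(np.argmax(Q[:, h, s, :].max(axis=0)))
            s_next, r, done = env.step(a)  # assumed (state, reward, done)
            counts[h, s, a] += 1
            n = counts[h, s, a]
            # Bootstrap target from the next-step optimistic value.
            v_next = 0.0 if (done or h == H - 1) else float(Q[:, h + 1, s_next, :].max())
            target = r + v_next
            # Key idea: one independent random learning rate per member,
            # concentrating near 1/n as the visit count n grows.
            lrs = rng.beta(1.0, n, size=n_ensemble)
            Q[:, h, s, a] = (1.0 - lrs) * Q[:, h, s, a] + lrs * target
            if done:
                break
            s = s_next
    return Q
```

Because each ensemble member mixes in its own random fraction of the new target, the members stay dispersed at rarely visited state-action pairs and agree where data is plentiful, so the randomness itself plays the role that an exploration bonus would otherwise play.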

View on arXiv: 2310.18186