Improving reinforcement learning algorithms: towards optimal learning rate policies

6 November 2019
Othmane Mounjid
Charles-Albert Lehalle
arXiv:1911.02319

Abstract

This paper investigates to what extent one can improve reinforcement learning algorithms. Our study is split into three parts. First, our analysis shows that the classical asymptotic convergence rate $O(1/\sqrt{N})$ is pessimistic and can be replaced by $O((\log(N)/N)^{\beta})$, with $\frac{1}{2}\leq \beta \leq 1$ and $N$ the number of iterations. Second, we propose a dynamic optimal policy for the choice of the learning rate $(\gamma_k)_{k\geq 0}$ used in stochastic approximation (SA). We decompose our policy into two interacting levels: the inner and the outer level. In the inner level, we present the PASS algorithm (for "PAst Sign Search") which, based on a predefined sequence $(\gamma^o_k)_{k\geq 0}$, constructs a new sequence $(\gamma^i_k)_{k\geq 0}$ whose error decreases faster. In the outer level, we propose an optimal methodology for the selection of the predefined sequence $(\gamma^o_k)_{k\geq 0}$. Third, we show empirically that our selection methodology for the learning rate significantly outperforms standard algorithms used in reinforcement learning (RL) in the following three applications: the estimation of a drift, the optimal placement of limit orders and the optimal execution of a large number of shares.
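The inner-level idea (keep a predefined sequence $(\gamma^o_k)$ but decide when to move along it) can be illustrated on the drift-estimation application. This is a minimal sketch under stated assumptions, not the authors' method: the abstract does not give the PASS update rule, so the code swaps in Kesten's classical 1958 sign-change rule, which advances the index into $(\gamma^o_k)$ only when successive updates change sign. The i.i.d. Gaussian noise model, the deliberately poorly scaled sequence $\gamma^o_j = c/j$ with $c = 0.1$, and the names `sa_estimate` and `compare` are hypothetical choices made for illustration.

```python
import random


def sa_estimate(observations, theta0, base_rate, sign_rule=False):
    """Estimate the mean (drift) of `observations` by stochastic approximation.

    Update: theta <- theta + gamma * (x - theta), where gamma = base_rate(j)
    is the j-th term of a predefined learning-rate sequence (j >= 1).

    sign_rule=False: j advances at every iteration (plain Robbins-Monro).
    sign_rule=True:  j advances only when the update changes sign. This is
    Kesten's rule, used here as a stand-in; it is NOT the paper's PASS.
    """
    theta = theta0
    j = 1           # current index into the predefined sequence
    prev_sign = 0   # sign of the last nonzero update; 0 means "none yet"
    for x in observations:
        step = x - theta
        sign = (step > 0) - (step < 0)
        theta += base_rate(j) * step
        if not sign_rule:
            j += 1
        elif sign != 0 and prev_sign != 0 and sign != prev_sign:
            j += 1  # a sign flip suggests oscillation around the target
        if sign != 0:
            prev_sign = sign
    return theta


def compare(mu=5.0, sigma=1.0, n=2000, c=0.1, seed=0):
    """Return absolute errors (plain, sign rule) on the same noisy sample.

    Hypothetical setup: observations are mu + sigma * N(0, 1), start at 0.
    """
    rng = random.Random(seed)
    xs = [mu + sigma * rng.gauss(0.0, 1.0) for _ in range(n)]

    def base(j):
        return c / j  # predefined sequence gamma^o_j = c / j (poorly tuned)

    plain = abs(sa_estimate(xs, 0.0, base) - mu)
    signed = abs(sa_estimate(xs, 0.0, base, sign_rule=True) - mu)
    return plain, signed


if __name__ == "__main__":
    plain_err, sign_err = compare()
    print(f"plain schedule error:     {plain_err:.3f}")
    print(f"sign-rule schedule error: {sign_err:.3f}")
```

With $c < 1/2$ the plain schedule shrinks the initial error only like $N^{-c}$, so in this setup it is still far from the true drift after 2000 steps. The sign rule stays at $\gamma^o_1$ while every update points the same way, closing most of the initial gap before the steps start to shrink. How large the difference is depends on how badly $(\gamma^o_k)$ is tuned, which is the question the outer level addresses.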
