arXiv:2110.14427

The ODE Method for Asymptotic Statistics in Stochastic Approximation and Reinforcement Learning

27 October 2021
Vivek Borkar
Shuhang Chen
Adithya M. Devraj
Ioannis Kontoyiannis
Sean P. Meyn
Abstract

The paper concerns the stochastic approximation recursion,
\[ \theta_{n+1} = \theta_n + \alpha_{n+1} f(\theta_n, \Phi_{n+1}), \quad n \ge 0, \]
where the estimates $\theta_n \in \Re^d$ and $\{\Phi_n\}$ is a Markov chain on a general state space. In addition to standard Lipschitz assumptions and conditions on the vanishing step-size sequence, it is assumed that the associated mean flow $\tfrac{d}{dt}\vartheta_t = \bar{f}(\vartheta_t)$ is globally asymptotically stable with stationary point denoted $\theta^*$, where $\bar{f}(\theta) = \mathsf{E}[f(\theta, \Phi)]$ with $\Phi$ having the stationary distribution of the chain. The main results are established under additional conditions on the mean flow and a version of the Donsker-Varadhan Lyapunov drift condition known as (DV3) for the chain:
(i) An appropriate Lyapunov function is constructed that implies convergence of the estimates in $L_4$.
(ii) A functional CLT is established, as well as the usual one-dimensional CLT for the normalized error. Moment bounds combined with the CLT imply convergence of the normalized covariance $\mathsf{E}[z_n z_n^T]$ to the asymptotic covariance $\Sigma^\Theta$ in the CLT, where $z_n = (\theta_n - \theta^*)/\sqrt{\alpha_n}$.
(iii) The CLT holds for the normalized version $z^{\mathsf{PR}}_n$ of the averaged parameters $\theta^{\mathsf{PR}}_n$, subject to standard assumptions on the step-size. Moreover, the normalized covariance of both $\theta^{\mathsf{PR}}_n$ and $z^{\mathsf{PR}}_n$ converge to $\Sigma^{\mathsf{PR}}$, the minimal covariance of Polyak and Ruppert.
(iv) An example is given where $f$ and $\bar{f}$ are linear in $\theta$, and the Markov chain is geometrically ergodic but does not satisfy (DV3). While the algorithm is convergent, the second moment of $\theta_n$ is unbounded and in fact diverges.
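
As a rough illustration of the recursion and of Polyak-Ruppert averaging, the following minimal sketch simulates a scalar toy problem. Everything specific here is an assumption made for illustration and is not taken from the paper: the function name `run_sa`, the two-state Markov chain, the choice $f(\theta, \phi) = c(\phi) - \theta$ with $c(0) = 0$ and $c(1) = 2$ (so that $\bar{f}(\theta) = 1 - \theta$ and $\theta^* = 1$), the step size $\alpha_n = n^{-\rho}$ with $\rho = 0.7$, and the plain running average used as $\theta^{\mathsf{PR}}_n$.

```python
import random


def run_sa(n_steps, rho=0.7, p_stay=0.7, theta0=5.0, seed=0):
    """Scalar stochastic approximation driven by a two-state Markov chain.

    Hypothetical toy model (not from the paper):
        f(theta, phi) = targets[phi] - theta,
    so the mean flow is d/dt vartheta = 1 - vartheta and theta* = 1.
    Returns the final iterate and the running (Polyak-Ruppert style) average.
    """
    rng = random.Random(seed)
    targets = (0.0, 2.0)   # c(0), c(1); the stationary law is uniform
    phi = 0                # current state of the chain
    theta = theta0
    running_sum = 0.0
    for n in range(1, n_steps + 1):
        # Markov transition: stay with probability p_stay, otherwise switch.
        if rng.random() >= p_stay:
            phi = 1 - phi
        alpha = n ** (-rho)  # vanishing step size alpha_n = n^{-rho}
        # theta_n = theta_{n-1} + alpha_n * f(theta_{n-1}, Phi_n)
        theta += alpha * (targets[phi] - theta)
        running_sum += theta
    return theta, running_sum / n_steps


theta_n, theta_pr = run_sa(100_000)
print("final iterate:", theta_n)
print("averaged estimate:", theta_pr)
```

In this toy setting both outputs should land near the assumed $\theta^* = 1$, with the averaged estimate typically fluctuating less than the final iterate, which is the qualitative behavior items (ii) and (iii) describe.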
