Convergence of Batch Asynchronous Stochastic Approximation With Applications to Reinforcement Learning

Communications in Optimization Theory (COT), 2021
Abstract

The stochastic approximation (SA) algorithm is a widely used probabilistic method for finding a solution to an equation of the form $\mathbf{f}(\boldsymbol{\theta}) = \mathbf{0}$, where $\mathbf{f} : \mathbb{R}^d \rightarrow \mathbb{R}^d$, when only noisy measurements of $\mathbf{f}(\cdot)$ are available. In the literature to date, one can distinguish between "synchronous" updating, whereby the entire vector of the current guess $\boldsymbol{\theta}_t$ is updated at each time, and "asynchronous" updating, whereby only one component of $\boldsymbol{\theta}_t$ is updated. In convex and nonconvex optimization, there is also the notion of "batch" updating, whereby some but not all components of $\boldsymbol{\theta}_t$ are updated at each time $t$. In addition, there is a distinction between using a "local" clock and a "global" clock. In the literature to date, convergence proofs when a local clock is used assume that the measurement noise is an i.i.d.\ sequence, an assumption that does not hold in Reinforcement Learning (RL). In this note, we provide a general theory of convergence for batch asynchronous stochastic approximation (BASA) that works whether the updates use a local clock or a global clock, for the case where the measurement noises form a martingale difference sequence. This is the most general result to date and encompasses all others.
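To make the setting concrete, the following is a minimal sketch (not the paper's algorithm) of batch asynchronous SA with a local clock: at each step a random subset of components is updated, each component's step size depends on its own update count, and the measurement noise is a martingale difference sequence. The target map $\mathbf{f}$, the matrix `A`, and the step-size schedule below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Illustrative BASA sketch: find the root of f(theta) = b - A @ theta
# from noisy measurements, updating only a random batch of components
# per step, with a *local clock* (per-component step-size counters).
rng = np.random.default_rng(0)
d = 5
A = np.eye(d) + 0.1 * rng.standard_normal((d, d))
A = A @ A.T + d * np.eye(d)          # positive definite, so the iteration can converge
theta_star = rng.standard_normal(d)  # the root we hope to recover
b = A @ theta_star

def f(theta):
    return b - A @ theta             # f(theta_star) = 0 by construction

theta = np.zeros(d)
local_clock = np.zeros(d)            # per-component update counts
for t in range(20000):
    S = rng.random(d) < 0.5          # random batch of components to update
    if not S.any():
        continue
    noise = rng.standard_normal(d)   # martingale-difference measurement noise
    local_clock[S] += 1
    alpha = 1.0 / (local_clock[S] + 10.0)   # local-clock Robbins-Monro step sizes
    theta[S] += alpha * (f(theta) + noise)[S]

print(np.linalg.norm(theta - theta_star))  # residual shrinks as t grows
```

With a global clock, `alpha` would instead be a function of the iteration counter `t`, shared by all components; the paper's point is that convergence can be established in either regime under martingale-difference noise.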
