Convergence of Batch Asynchronous Stochastic Approximation With
Applications to Reinforcement Learning
The stochastic approximation (SA) algorithm is a widely used probabilistic method for finding a solution to an equation of the form f(θ) = 0, where f : ℝ^d → ℝ^d, when only noisy measurements of f(·) are available. In the literature to date, one can make a distinction between "synchronous" updating, whereby the entire vector θ_t of the current guess is updated at each time t, and "asynchronous" updating, whereby only one component of θ_t is updated. In convex and nonconvex optimization, there is also the notion of "batch" updating, whereby some but not all components of θ_t are updated at each time t. In addition, there is also a distinction between using a "local" clock versus a "global" clock. In the literature to date, convergence proofs when a local clock is used make the assumption that the measurement noise is an i.i.d. sequence, an assumption that does not hold in Reinforcement Learning (RL). In this note, we provide a general theory of convergence for batch asynchronous stochastic approximation (BASA), which applies whether the updates use a local clock or a global clock, for the case where the measurement noises form a martingale difference sequence. This is the most general result to date and encompasses all others.
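The setting described above can be illustrated with a minimal sketch. The snippet below is not the paper's algorithm; it is a toy batch asynchronous SA iteration under assumed illustrative choices: a hypothetical target f(θ) = -θ (root at 0), zero-mean Gaussian measurement noise (a martingale difference sequence), a random batch of components updated each step, and Robbins-Monro step sizes driven either by per-component "local clocks" or by the common "global clock" t.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # dimension of theta

def noisy_f(theta):
    # Illustrative target: f(theta) = -theta, whose root is 0, observed
    # with zero-mean noise, so the noise is a martingale difference sequence.
    return -theta + 0.1 * rng.standard_normal(d)

def basa(local_clock=True, T=5000, p=0.5):
    theta = np.ones(d)
    nu = np.zeros(d)  # per-component update counts ("local clocks")
    for t in range(1, T + 1):
        update = rng.random(d) < p  # random batch of components to update
        if not update.any():
            continue
        nu[update] += 1
        # Step size: the local-clock variant uses each component's own
        # counter nu_i; the global-clock variant uses the common time t.
        alpha = 1.0 / np.where(update, nu, 1.0) if local_clock else 1.0 / t
        g = noisy_f(theta)
        theta = theta + np.where(update, alpha * g, 0.0)
    return theta

print(basa(local_clock=True))   # both variants drift toward the root 0
print(basa(local_clock=False))
```

Only the components in the random batch move at each step; the difference between the two clock conventions is purely in how the decaying step size is indexed, which is exactly the distinction the abstract's convergence theory covers uniformly.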