Revisiting Step-Size Assumptions in Stochastic Approximation

Many machine learning and optimization algorithms are built upon the framework of stochastic approximation (SA), for which the selection of step-size (or learning rate) is essential for success. For the sake of clarity, this paper focuses on the special case $\alpha_n = \alpha_0 n^{-\rho}$ at iteration $n$, with $\rho \in [0,1]$ and $\alpha_0 > 0$ design parameters. It is most common in practice to take $\rho = 0$ (constant step-size), while in more theoretically oriented papers a vanishing step-size is preferred. In particular, with $\rho \in (1/2, 1)$ it is known that, on applying the averaging technique of Polyak and Ruppert, the mean-squared error (MSE) converges at the optimal rate of $O(1/n)$ and the covariance in the central limit theorem (CLT) is minimal in a precise sense.

The paper revisits step-size selection in a general Markovian setting. Under readily verifiable assumptions, the following conclusions are obtained provided $0 < \rho < 1$:

- Parameter estimates converge with probability one, and also in $L_p$ for any $p \ge 1$.
- The MSE may converge very slowly for small $\rho$, of order $O(\alpha_n^2)$ even with averaging.
- For linear stochastic approximation, the source of slow convergence is identified: for any $\rho \in (0,1)$, averaging results in estimates for which the error covariance vanishes at the optimal rate, and moreover the CLT covariance is optimal in the sense of Polyak and Ruppert. However, necessary and sufficient conditions are obtained under which the bias converges to zero at rate $O(1/\sqrt{n})$.

This is the first paper to obtain such strong conclusions while allowing for $\rho \le 1/2$. A major conclusion is that the choice of $\rho = 0$, or even $\rho < 1/2$, is justified only in select settings; in general, bias may preclude fast convergence.
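To make the step-size family and Polyak-Ruppert averaging concrete, below is a minimal sketch (not the paper's setup) of linear stochastic approximation driven by the schedule $\alpha_n = \alpha_0 n^{-\rho}$. The matrix A, the vector b, the i.i.d. Gaussian noise (a stand-in for the paper's Markovian disturbance), and the values of alpha0 and rho are illustrative assumptions only.

```python
import numpy as np

# Minimal sketch (not the paper's experiments): linear stochastic approximation
#     theta_n = theta_{n-1} + alpha_n * (b - A @ theta_{n-1} + W_n),
# with step-size alpha_n = alpha_0 * n**(-rho) and Polyak-Ruppert averaging.
# A, b, the i.i.d. Gaussian noise W_n, alpha_0, and rho are illustrative choices.

rng = np.random.default_rng(0)
d = 2
A = np.array([[2.0, 0.3],
              [0.1, 1.5]])                     # assumed stable mean-field matrix
b = np.array([1.0, -1.0])
theta_star = np.linalg.solve(A, b)             # target: root of the mean field b - A*theta

def run_sa(rho, alpha0=0.5, n_iter=50_000):
    """Run linear SA with alpha_n = alpha0 * n^{-rho}; return last and averaged iterates."""
    theta = np.zeros(d)
    theta_bar = np.zeros(d)                    # Polyak-Ruppert running average
    for n in range(1, n_iter + 1):
        alpha = alpha0 * n ** (-rho)           # vanishing step-size schedule
        noise = 0.5 * rng.standard_normal(d)   # i.i.d. stand-in for Markovian noise
        theta = theta + alpha * (b - A @ theta + noise)
        theta_bar += (theta - theta_bar) / n   # incremental average of theta_1..theta_n
    return theta, theta_bar

for rho in (0.9, 0.7, 0.4):
    theta, theta_bar = run_sa(rho)
    print(f"rho={rho}: |last - theta*| = {np.linalg.norm(theta - theta_star):.4f}, "
          f"|avg - theta*| = {np.linalg.norm(theta_bar - theta_star):.4f}")
```

With i.i.d. zero-mean noise as above, the averaged iterate behaves well for every value of rho shown; the bias-dominated slow convergence highlighted in the abstract arises from Markov-dependent noise, which this toy model is not intended to reproduce.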