Tight Dimension Independent Lower Bound on Optimal Expected Convergence Rate for Diminishing Step Sizes in SGD
We study the convergence of Stochastic Gradient Descent (SGD) for strongly convex and smooth objective functions $F$. We prove a lower bound on the expected convergence rate which holds for any sequence of diminishing step sizes that is a function of only global knowledge, such as the smoothness and strong convexity parameters of $F$ and the smoothness and convexity of the component functions, together with some additional information. Our lower bound meets, within a factor 32, the expected convergence rate of a sequence of step sizes recently proposed at ICML 2018, which is based on exactly such knowledge. This shows that the step sizes proposed in the ICML paper are close to optimal. Furthermore, we conclude that in order to construct step sizes that beat our lower bound, more detailed information about $F$ must be known and used. Our work significantly improves over the state-of-the-art lower bound, which we show is another factor $d$ worse, where $d$ is the dimension. We are the first to prove a lower bound that comes within a small constant, independent of any other problem-specific parameters, of an optimal solution.
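To make the setting concrete, here is a minimal Python sketch of SGD with a diminishing step-size sequence built only from the global constants $\mu$ (strong convexity) and $L$ (smoothness). The specific step-size form $\eta_t = 2/(\mu t + 4L)$ and the quadratic test problem are illustrative assumptions in the spirit of the ICML 2018 proposal, not the paper's exact construction.

```python
# Sketch: SGD with diminishing step sizes that depend only on global
# knowledge (mu, L) of the objective. The objective, data sizes, and
# step-size constants below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

# F(w) = (1/2n) ||Aw - b||^2 is mu-strongly convex and L-smooth, with
# mu and L the extreme eigenvalues of H = (1/n) A^T A.
H = A.T @ A / n
eigs = np.linalg.eigvalsh(H)
mu, L = eigs[0], eigs[-1]
w_star = np.linalg.solve(H, A.T @ b / n)  # exact minimizer, for reference

w = np.zeros(d)
for t in range(1, 5001):
    i = rng.integers(n)                  # sample one component function f_i
    grad = (A[i] @ w - b[i]) * A[i]      # stochastic gradient of f_i at w
    eta = 2.0 / (mu * t + 4.0 * L)       # diminishing step size from (mu, L) only
    w -= eta * grad

print("||w_t - w*||^2 =", np.sum((w - w_star) ** 2))  # decays like O(1/t)
```

The point of the sketch is that the step-size schedule never inspects $F$ beyond the two scalars $\mu$ and $L$; the paper's lower bound says no schedule of this kind can do better than $O(1/t)$ by more than a constant factor.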