Optimal Local Convergence Rates of Stochastic First-Order Methods under Local $\alpha$-PL

Main: 7 pages · 9 figures · Bibliography: 4 pages · Appendix: 40 pages
Abstract

We study the local convergence rate of stochastic first-order methods under a local $\alpha$-Polyak-Łojasiewicz ($\alpha$-PL) condition in a neighborhood of a target connected component $\mathcal{M}$ of the set of local minimizers. The parameter $\alpha \in [1,2]$ is the exponent of the gradient norm in the $\alpha$-PL inequality: $\alpha = 2$ recovers the classical PL case, $\alpha = 1$ corresponds to Hölder-type error bounds, and intermediate values interpolate between these regimes. Our performance criterion is the number of oracle queries required to output $\hat{x}$ with $F(\hat{x}) - l \le \varepsilon$, where $l := F(y)$ for any $y \in \mathcal{M}$. We work in a local regime where the algorithm is initialized near $\mathcal{M}$ and, with high probability, its iterates remain in that neighborhood. We establish a lower bound $\Omega(\varepsilon^{-2/\alpha})$ for all stochastic first-order methods in this regime, and we obtain a matching upper bound $\mathcal{O}(\varepsilon^{-2/\alpha})$ for $1 \le \alpha < 2$ via a SARAH-type variance-reduced method with time-varying batch sizes and step sizes. In the convex setting, assuming a local $\alpha$-PL condition on the $\varepsilon$-sublevel set, we further show a complexity lower bound $\widetilde{\Omega}(\varepsilon^{-2/\alpha})$ for reaching an $\varepsilon$-global optimum, matching the $\varepsilon$-dependence of known accelerated stochastic subgradient methods.
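For concreteness, one common way to write a local $\alpha$-PL condition of the kind described above is the following; the constant $\mu$ and the neighborhood $\mathcal{U}$ are illustrative, and the paper's exact normalization may differ:

```latex
% Hypothetical form of a local \alpha-PL inequality (constants and
% normalization are assumptions, not taken from the paper):
% there exist \mu > 0 and an open neighborhood \mathcal{U} \supseteq \mathcal{M}
% such that
\|\nabla F(x)\|^{\alpha} \;\ge\; \mu \,\bigl(F(x) - l\bigr)
\qquad \text{for all } x \in \mathcal{U}.
```

Setting $\alpha = 2$ gives the classical PL inequality (up to the constant), while $\alpha = 1$ says the gradient norm dominates the optimality gap linearly, i.e. a Hölder-type error bound, consistent with the interpolation described in the abstract.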
