On the Complexity of Finite-Sum Smooth Optimization under the Polyak-Łojasiewicz Condition

Abstract

This paper considers optimization problems of the form $\min_{{\bf x}\in{\mathbb R}^d} f({\bf x})\triangleq \frac{1}{n}\sum_{i=1}^n f_i({\bf x})$, where $f(\cdot)$ satisfies the Polyak-Łojasiewicz (PL) condition with parameter $\mu$ and $\{f_i(\cdot)\}_{i=1}^n$ is $L$-mean-squared smooth. We show that any gradient method requires at least $\Omega(n+\kappa\sqrt{n}\log(1/\epsilon))$ incremental first-order oracle (IFO) calls to find an $\epsilon$-suboptimal solution, where $\kappa\triangleq L/\mu$ is the condition number of the problem. This result nearly matches the IFO complexity upper bounds of the best-known first-order methods. We also study the problem of minimizing a PL function in the distributed setting, where the individual functions $f_1(\cdot),\dots,f_n(\cdot)$ are located on a connected network of $n$ agents. We provide lower bounds of $\Omega(\kappa/\sqrt{\gamma}\,\log(1/\epsilon))$, $\Omega((\kappa+\tau\kappa/\sqrt{\gamma}\,)\log(1/\epsilon))$ and $\Omega\big(n+\kappa\sqrt{n}\log(1/\epsilon)\big)$ for communication rounds, time cost and local first-order oracle calls respectively, where $\gamma\in(0,1]$ is the spectral gap of the mixing matrix associated with the network and $\tau>0$ is the time cost per communication round. Furthermore, we propose a decentralized first-order method that nearly matches the above lower bounds in expectation.
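
The abstract names the PL condition, mean-squared smoothness and $\epsilon$-suboptimality without stating them. For reference, they can be written as below. These are the forms commonly used in the literature and are an assumption here, since the abstract does not give them; the paper's own normalization of constants may differ. The symbols $f^*$ (the infimum of $f$) and $\hat{\bf x}$ (the returned point) are introduced only for this sketch.

```latex
% PL condition with parameter \mu > 0, where f^* = \inf_{\bf x} f({\bf x}) (assumed finite):
f({\bf x}) - f^* \;\le\; \frac{1}{2\mu}\,\|\nabla f({\bf x})\|^2
\quad \text{for all } {\bf x}\in{\mathbb R}^d .

% L-mean-squared smoothness of \{f_i(\cdot)\}_{i=1}^n:
\frac{1}{n}\sum_{i=1}^n \|\nabla f_i({\bf x}) - \nabla f_i({\bf y})\|^2
\;\le\; L^2\,\|{\bf x}-{\bf y}\|^2
\quad \text{for all } {\bf x},{\bf y}\in{\mathbb R}^d .

% \epsilon-suboptimal solution \hat{\bf x}
% (for randomized methods this is typically required in expectation):
f(\hat{\bf x}) - f^* \;\le\; \epsilon .
```

Under these forms, $\kappa = L/\mu$ compares the smoothness constant with the PL parameter. The PL condition does not require $f$ to be convex, which is what distinguishes this setting from the strongly convex one.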
