
On the Complexity of Decentralized Smooth Nonconvex Finite-Sum Optimization

Abstract

We study the decentralized optimization problem $\min_{{\bf x}\in{\mathbb R}^d} f({\bf x})\triangleq \frac{1}{m}\sum_{i=1}^m f_i({\bf x})$, where the local function on the $i$-th agent has the form $f_i({\bf x})\triangleq \frac{1}{n}\sum_{j=1}^n f_{i,j}({\bf x})$ and every individual $f_{i,j}$ is smooth but possibly nonconvex. We propose a stochastic algorithm called the DEcentralized probAbilistic Recursive gradiEnt deScenT (DEAREST) method, which achieves an $\epsilon$-stationary point at each agent with $\tilde{\mathcal O}(L\epsilon^{-2}/\sqrt{\gamma})$ communication rounds, $\tilde{\mathcal O}(n+(L+\min\{nL, \sqrt{n/m}\,\bar L\})\epsilon^{-2})$ computation rounds, and ${\mathcal O}(mn + \min\{mnL, \sqrt{mn}\,\bar L\}\epsilon^{-2})$ local incremental first-order oracle calls, where $L$ is the smoothness parameter of the objective function, $\bar L$ is the mean-squared smoothness parameter of all individual functions, and $\gamma$ is the spectral gap of the mixing matrix associated with the network. We then establish lower bounds showing that the proposed method is near-optimal. Notably, the smoothness parameters $L$ and $\bar L$ used in our algorithm design and analysis are global, which leads to sharper complexity bounds than existing results that depend on local smoothness. We further extend DEAREST to the decentralized finite-sum optimization problem under the Polyak-Łojasiewicz condition, where it also achieves near-optimal complexity bounds.
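The following minimal Python sketch illustrates the two ingredients the abstract names, a probabilistic recursive gradient estimator and decentralized communication through a mixing matrix; it is not the DEAREST pseudocode. It assumes a SARAH/PAGE-style estimator: with probability $p$ (one coin shared by all agents, an assumption) each agent resets its estimator to the full local gradient at a cost of $n$ IFO calls, and otherwise adds a minibatch gradient difference. Agents combine this with gradient tracking over a doubly stochastic matrix $W$. The scalar test problem, the ring topology, the helper names (`grad_ij`, `local_full_grad`, `ring_mixing_matrix`, `ring_spectral_gap`, `gossip`, `run`), the hyperparameters, and the definition $\gamma = 1 - \max_{k\neq 0}|\lambda_k(W)|$ are all hypothetical. Each exchange is a single plain gossip step, so the sketch makes no attempt at the $\tilde{\mathcal O}(L\epsilon^{-2}/\sqrt{\gamma})$ communication bound, which would likely call for an accelerated scheme such as Chebyshev-accelerated gossip.

```python
import math
import random

# Hypothetical nonconvex test problem (not from the paper): agent i holds n
# scalar components f_ij(x) = 0.5 * a_ij * (x - b_ij)**2 + C * cos(x).
# f_ij''(x) = a_ij - C*cos(x), so each component is (a_ij + C)-smooth and,
# because C > a_ij, nonconvex near x = 0.
C = 2.0


def grad_ij(x, comp):
    """One incremental first-order oracle (IFO) call: the gradient of f_ij."""
    a, b = comp
    return a * (x - b) - C * math.sin(x)


def local_full_grad(x, comps):
    """Gradient of f_i = (1/n) * sum_j f_ij; costs n IFO calls."""
    return sum(grad_ij(x, comp) for comp in comps) / len(comps)


def ring_mixing_matrix(m):
    """Symmetric doubly stochastic gossip matrix on a ring of m >= 3 agents."""
    W = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for k in (i - 1, i, i + 1):
            W[i][k % m] += 1.0 / 3.0
    return W


def ring_spectral_gap(m):
    """Assumed definition gamma = 1 - max_{k != 0} |lambda_k(W)|, using the
    ring eigenvalues lambda_k = (1 + 2*cos(2*pi*k/m)) / 3."""
    lam = max(abs(1 + 2 * math.cos(2 * math.pi * k / m)) / 3 for k in range(1, m))
    return 1.0 - lam


def gossip(W, vals):
    """One communication round: vals_i <- sum_k W[i][k] * vals_k."""
    return [sum(w * v for w, v in zip(row, vals)) for row in W]


def run(data, W, eta=0.02, p=0.2, batch=2, T=3000, x0=3.0, seed=0):
    """Gradient tracking driven by a probabilistic recursive gradient estimator."""
    rng = random.Random(seed)
    m, n = len(data), len(data[0])
    x = [x0] * m
    v = [local_full_grad(x[i], data[i]) for i in range(m)]  # local estimators
    s = list(v)                                             # trackers of mean(v)
    ifo, comm = m * n, 0
    for _ in range(T):
        x_new = gossip(W, [x[i] - eta * s[i] for i in range(m)])
        refresh = rng.random() < p  # one coin shared by all agents (assumption)
        v_new = []
        for i in range(m):
            if refresh:  # reset to the full local gradient: n IFO calls
                v_new.append(local_full_grad(x_new[i], data[i]))
                ifo += n
            else:  # recursive minibatch correction: 2 * batch IFO calls
                idx = [rng.randrange(n) for _ in range(batch)]
                diff = sum(grad_ij(x_new[i], data[i][j]) - grad_ij(x[i], data[i][j])
                           for j in idx)
                v_new.append(v[i] + diff / batch)
                ifo += 2 * batch
        # Tracking step keeps mean(s) == mean(v) (up to rounding) since W is doubly stochastic.
        s = gossip(W, [s[i] + v_new[i] - v[i] for i in range(m)])
        x, v = x_new, v_new
        comm += 2  # one gossip round for x, one for s
    return x, s, v, ifo, comm


if __name__ == "__main__":
    data_rng = random.Random(1)
    m, n = 8, 10
    data = [[(data_rng.uniform(0.5, 1.5), data_rng.uniform(-1, 1)) for _ in range(n)]
            for _ in range(m)]
    W = ring_mixing_matrix(m)
    x, s, v, ifo, comm = run(data, W)
    x_bar = sum(x) / m
    grad = sum(local_full_grad(x_bar, d) for d in data) / m
    print(f"gamma={ring_spectral_gap(m):.4f} |grad f(x_bar)|={abs(grad):.2e} "
          f"comm={comm} ifo={ifo}")
```

Running the script prints the spectral gap of the 8-agent ring (about 0.195), the gradient norm of $f$ at the averaged iterate, and two counters for communication rounds and IFO calls, the quantities bounded in the abstract. Nothing here is tuned to reproduce those bounds; the fixed step size, refresh probability, and batch size are illustrative choices.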
