Distributed Optimization Based on Gradient-tracking Revisited: Enhancing Convergence Rate via Surrogation

7 May 2019
Ying Sun, Amir Daneshmand, G. Scutari
Abstract

We study distributed multiagent optimization over (directed, time-varying) graphs. We consider the minimization of $F+G$ subject to convex constraints, where $F$ is the smooth, strongly convex sum of the agents' losses and $G$ is a nonsmooth convex function. We build on the SONATA algorithm: it employs surrogate objective functions in the agents' subproblems (thus going beyond linearization, such as proximal-gradient) coupled with a perturbed (push-sum) consensus mechanism that locally tracks the gradient of $F$. SONATA achieves precision $\epsilon>0$ on the objective value in $\mathcal{O}(\kappa_g \log(1/\epsilon))$ gradient computations at each node and $\tilde{\mathcal{O}}\big(\kappa_g (1-\rho)^{-1/2} \log(1/\epsilon)\big)$ communication steps, where $\kappa_g$ is the condition number of $F$ and $\rho$ characterizes the connectivity of the network. This is the first linear rate result for distributed composite optimization; it also improves on existing (non-accelerated) schemes that minimize $F$ alone, whose rates depend on much larger quantities than $\kappa_g$ (e.g., the worst-case condition number among the agents). When considering, in particular, empirical risk minimization problems with statistically similar data across the agents, SONATA employing high-order surrogates achieves precision $\epsilon>0$ in $\mathcal{O}\big((\beta/\mu) \log(1/\epsilon)\big)$ iterations and $\tilde{\mathcal{O}}\big((\beta/\mu) (1-\rho)^{-1/2} \log(1/\epsilon)\big)$ communication steps, where $\beta$ measures the degree of similarity of the agents' losses and $\mu$ is the strong convexity constant of $F$. Therefore, when $\beta/\mu < \kappa_g$, the use of high-order surrogates yields provably faster rates than what is achievable by first-order models; this is obtained without exchanging any Hessian matrix over the network.
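
As a rough illustration of the iteration structure described above, the sketch below simulates, in a single process, a gradient-tracking scheme with a first-order (proximal) surrogate and an $\ell_1$ term playing the role of $G$. It simplifies the setting of the paper: it assumes an undirected, static graph with a doubly stochastic mixing matrix `W` (the paper handles directed, time-varying graphs via a perturbed push-sum consensus) and does not implement the high-order surrogates analyzed in the paper. All function and parameter names are illustrative, not taken from the authors' code.

```python
import numpy as np

def soft_threshold(v, t):
    """Prox of t * ||.||_1; plays the role of the nonsmooth term G = lam * ||x||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def gradient_tracking_sketch(A_list, b_list, W, lam=0.1, step=0.1, iters=500):
    """Minimal simulation: agent i holds a local loss f_i(x) = 0.5 * ||A_i x - b_i||^2.

    x[i]  : agent i's local copy of the decision variable
    y[i]  : agent i's gradient tracker, meant to estimate grad F(x[i])
    W     : doubly stochastic mixing matrix of the (undirected, static) network
    """
    m = len(A_list)
    n = A_list[0].shape[1]
    x = np.zeros((m, n))
    grad = lambda i, xi: A_list[i].T @ (A_list[i] @ xi - b_list[i])
    g_prev = np.array([grad(i, x[i]) for i in range(m)])
    y = g_prev.copy()

    for _ in range(iters):
        # Local surrogate subproblem (here: a proximal step using the tracked gradient).
        x_half = np.array([soft_threshold(x[i] - step * y[i], step * lam)
                           for i in range(m)])
        # Consensus (mixing) step on the local decision variables.
        x_new = W @ x_half
        # Gradient-tracking update of the estimate of the average gradient of F.
        g_new = np.array([grad(i, x_new[i]) for i in range(m)])
        y = W @ y + (g_new - g_prev)
        x, g_prev = x_new, g_new

    return x.mean(axis=0)
```

Under the stated simplifications, each agent only exchanges its local variable and its gradient tracker with neighbors (through the multiplications by `W`); no Hessian information is communicated, consistent with the claim in the abstract.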
