
Multi-Timescale Gradient Sliding for Distributed Optimization

Main: 28 pages
Figures: 5
Tables: 1
Bibliography: 4 pages
Appendix: 9 pages
Abstract

We propose two first-order methods for convex, non-smooth, distributed optimization problems, hereafter called Multi-Timescale Gradient Sliding (MT-GS) and its accelerated variant (AMT-GS). MT-GS and AMT-GS can exploit similarities between (local) objectives to reduce the number of communication rounds, are flexible in that different subsets of agents can communicate at different, user-picked rates, and are fully deterministic. These three desirable features are achieved through a block-decomposable primal-dual formulation and a multi-timescale variant of the sliding method introduced in Lan et al. (2020) and Lan (2016), in which different dual blocks are updated at potentially different rates. To find an $\epsilon$-suboptimal solution, the complexities of our algorithms achieve optimal dependency on $\epsilon$: MT-GS needs $O(\overline{r}A/\epsilon)$ communication rounds and $O(\overline{r}/\epsilon^2)$ subgradient steps for Lipschitz objectives, and AMT-GS needs $O(\overline{r}A/\sqrt{\epsilon\mu})$ communication rounds and $O(\overline{r}/(\epsilon\mu))$ subgradient steps if the objectives are also $\mu$-strongly convex. Here, $\overline{r}$ measures the ``average rate of updates'' for the dual blocks, and $A$ measures the similarity between (subgradients of) the local functions. In addition, the linear dependency of the number of communication rounds on $A$ is optimal (Arjevani and Shamir 2015), thereby providing a positive answer to the open question of whether such dependency is achievable for non-smooth objectives (Arjevani and Shamir 2015).
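As a rough illustration of the multi-timescale idea described in the abstract, the sketch below (a conceptual toy only, not the authors' actual MT-GS algorithm; the function names, the consensus-style dual update, and the step sizes are our own assumptions) refreshes each dual block b only every periods[b] communication rounds, while the local primal variables take several communication-free subgradient ("sliding") steps in between.

import numpy as np

def mt_gs_sketch(subgrad, n_agents, dim, periods,
                 outer_rounds=100, inner_steps=10, step=1e-2):
    # subgrad(i, x_i) -> a subgradient of agent i's local objective at x_i (user-supplied)
    x = np.zeros((n_agents, dim))                          # local primal copies
    duals = [np.zeros((n_agents, dim)) for _ in periods]   # one dual block per coupling
    for t in range(outer_rounds):
        for b, T_b in enumerate(periods):
            if t % T_b == 0:                               # block b communicates every T_b rounds
                duals[b] += step * (x - x.mean(axis=0, keepdims=True))  # toy consensus dual ascent
        for _ in range(inner_steps):                       # communication-free primal "sliding" steps
            g = np.stack([subgrad(i, x[i]) for i in range(n_agents)])
            x = x - step * (g + sum(duals))                # subgradient step on the local Lagrangians
    return x.mean(axis=0)

# Example use (hypothetical): 3 agents minimizing sum_i |x - c_i|, with dual blocks
# updated every 1 and every 5 rounds respectively.
centers = np.array([[0.0], [1.0], [2.0]])
sol = mt_gs_sketch(lambda i, xi: np.sign(xi - centers[i]), n_agents=3, dim=1, periods=[1, 5])

In the paper's setting, which blocks exist and how often each is updated would be user choices reflected in the quantity $\overline{r}$ above; the toy update rule here is only meant to show where the different timescales enter the loop.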

@article{zhang2025_2506.15387,
  title={Multi-Timescale Gradient Sliding for Distributed Optimization},
  author={Junhui Zhang and Patrick Jaillet},
  journal={arXiv preprint arXiv:2506.15387},
  year={2025}
}