Multi-Timescale Gradient Sliding for Distributed Optimization

We propose two first-order methods for convex, non-smooth, distributed optimization problems, hereafter called Multi-Timescale Gradient Sliding (MT-GS) and its accelerated variant (AMT-GS). Our MT-GS and AMT-GS can take advantage of similarities between (local) objectives to reduce the number of communication rounds, are flexible so that different subsets (of agents) can communicate at different, user-picked rates, and are fully deterministic. These three desirable features are achieved through a block-decomposable primal-dual formulation and a multi-timescale variant of the sliding method introduced in Lan et al. (2020) and Lan (2016), where different dual blocks are updated at potentially different rates. To find an $\epsilon$-suboptimal solution, the complexities of our algorithms achieve optimal dependency on $\epsilon$: MT-GS needs $O(\bar{r}A/\epsilon)$ communication rounds and $O(\bar{r}/\epsilon^2)$ subgradient steps for Lipschitz objectives, and AMT-GS needs $O(\bar{r}A/\sqrt{\epsilon\mu})$ communication rounds and $O(\bar{r}/(\epsilon\mu))$ subgradient steps if the objectives are also $\mu$-strongly convex. Here, $\bar{r}$ measures the ``average rate of updates'' for dual blocks, and $A$ measures similarities between (subgradients of) local functions. In addition, the linear dependency of communication rounds on $A$ is optimal (Arjevani and Shamir 2015), thereby providing a positive answer to the open question of whether such dependency is achievable for non-smooth objectives (Arjevani and Shamir 2015).
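To make the block-decomposable primal-dual structure concrete, the following display is a schematic sketch (our own illustration with assumed notation $x_i$, $y_b$, $S_b$, $A_b$, not the exact formulation from the paper): each agent $i$ keeps a local copy $x_i$, and each dual block $y_b$ enforces agreement within one communication group $S_b$, so that different dual blocks can be refreshed on different timescales.

\[
\min_{x=(x_1,\dots,x_m)} \; \max_{y=(y_1,\dots,y_B)} \;
\sum_{i=1}^{m} f_i(x_i) \;+\; \sum_{b=1}^{B} \langle y_b,\, A_b x \rangle,
\qquad
A_b x = 0 \;\Longleftrightarrow\; x_i = x_j \ \text{ for all } i,j \in S_b .
\]

Under this illustrative setup, a dual block that is updated only once every several communication rounds contributes to the average update rate $\bar{r}$, while primal subgradient (sliding) steps run between dual updates.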
@article{zhang2025_2506.15387,
  title   = {Multi-Timescale Gradient Sliding for Distributed Optimization},
  author  = {Junhui Zhang and Patrick Jaillet},
  journal = {arXiv preprint arXiv:2506.15387},
  year    = {2025}
}