
Pareto-optimal Trade-offs Between Communication and Computation with Flexible Gradient Tracking

13 pages (main) + 2 pages (bibliography), 8 figures, 3 tables
Abstract

This paper addresses distributed stochastic optimization under non-i.i.d. data, focusing on the inherent trade-offs between communication and computational efficiency. To this end, we propose FlexGT, a flexible snapshot gradient tracking method that allows tunable numbers of local updates and neighbor communications per round, thereby adapting efficiently to diverse system resource conditions. Leveraging a unified convergence analysis framework, we derive tight communication and computational complexities for FlexGT with explicit dependence on the objective's properties and the tunable parameters. Moreover, we introduce an accelerated variant, termed Acc-FlexGT, and prove that, with prior knowledge of the graph, it achieves Pareto-optimal trade-offs between communication and computation. In particular, in the nonconvex case, Acc-FlexGT attains the optimal iteration complexity of $\tilde{\mathcal{O}}\left( L\sigma^2/\left( n\epsilon^2 \right) + L/\left( \epsilon\sqrt{1-\sqrt{\rho_W}} \right) \right)$ and the optimal communication complexity of $\tilde{\mathcal{O}}\left( L/\left( \epsilon\sqrt{1-\sqrt{\rho_W}} \right) \right)$ for appropriately chosen numbers of local updates, matching existing lower bounds up to logarithmic factors. Furthermore, it improves on existing results in the strongly convex case by a factor of $\tilde{\mathcal{O}}\left( 1/\sqrt{\epsilon} \right)$, where $\epsilon$ is the target accuracy, $n$ the number of nodes, $L$ the Lipschitz constant, $\rho_W$ the connectivity of the graph, and $\sigma$ the stochastic gradient variance. Numerical experiments corroborate the theoretical results and demonstrate the effectiveness of the proposed methods.
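To make the abstract's setting concrete, the following is a minimal sketch of standard decentralized gradient tracking with tunable numbers of local updates (`tau`) and neighbor communications (`q`) per round, in the spirit of FlexGT. The parameter names, the update order, and the simple quadratic example are assumptions for illustration only; they do not reproduce the paper's exact snapshot algorithm or analysis.

```python
import numpy as np

def gradient_tracking_sketch(grads, x0, W, eta=0.05, tau=2, q=1, rounds=200):
    """Illustrative sketch (not the paper's algorithm).

    grads: list of per-node gradient functions grads[i](x_i) -> d-vector
    x0:    (n, d) array of initial local iterates
    W:     (n, n) doubly stochastic mixing matrix of the graph
    eta:   step size; tau: local updates per round; q: communications per round
    """
    n, _ = x0.shape
    x = x0.copy()
    g = np.stack([grads[i](x[i]) for i in range(n)])  # current local gradients
    y = g.copy()                                      # gradient-tracking variables
    for _ in range(rounds):
        # tau local updates along the tracked global-gradient estimate
        for _ in range(tau):
            x = x - eta * y
            g_new = np.stack([grads[i](x[i]) for i in range(n)])
            y = y + g_new - g   # tracking update: preserves mean(y) == mean(g)
            g = g_new
        # q rounds of neighbor averaging with the mixing matrix W
        for _ in range(q):
            x = W @ x
            y = W @ y
    return x

# Example: node i holds f_i(x) = 0.5 * (x - a_i)^2, so the global
# optimum of (1/n) * sum_i f_i is the average of the a_i.
a = np.array([1.0, 2.0, 3.0, 4.0])
grads = [lambda x, ai=ai: x - ai for ai in a]
# 4-node ring graph with Metropolis-style weights (doubly stochastic)
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])
x_final = gradient_tracking_sketch(grads, np.zeros((4, 1)), W)
```

Because the tracking update keeps the average of `y` equal to the average of the local gradients, every node's iterate converges toward the global minimizer `mean(a) = 2.5` here, despite the non-identical local objectives; tuning `tau` and `q` trades local computation against communication, which is the trade-off the paper analyzes.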
