
Pareto-optimal Trade-offs Between Communication and Computation with Flexible Gradient Tracking

13 pages (main) + 2 pages (bibliography), 8 figures, 3 tables
Abstract

This paper addresses distributed stochastic optimization under non-i.i.d. data, focusing on the inherent trade-offs between communication and computational efficiency. To this end, we propose FlexGT, a flexible snapshot gradient tracking method that allows tunable numbers of local updates and neighbor communications per round, thereby adapting efficiently to diverse system resource conditions. Leveraging a unified convergence analysis framework, we derive tight communication and computational complexities for FlexGT with explicit dependence on the objective's properties and the tunable parameters. Moreover, we introduce an accelerated variant, termed Acc-FlexGT, and prove that, with prior knowledge of the graph, it achieves Pareto-optimal trade-offs between communication and computation. In particular, in the nonconvex case, Acc-FlexGT attains the optimal iteration complexity of $\tilde{\mathcal{O}}\left( L\sigma^2/\left( n\epsilon^2 \right) + L/\left( \epsilon\sqrt{1-\sqrt{\rho_W}} \right) \right)$ and the optimal communication complexity of $\tilde{\mathcal{O}}\left( L/\left( \epsilon\sqrt{1-\sqrt{\rho_W}} \right) \right)$ for appropriately chosen numbers of local updates, matching existing lower bounds up to logarithmic factors. Furthermore, it improves on existing results in the strongly convex case by a factor of $\tilde{\mathcal{O}}\left( 1/\sqrt{\epsilon} \right)$, where $\epsilon$ is the target accuracy, $n$ the number of nodes, $L$ the Lipschitz constant, $\rho_W$ the connectivity of the graph, and $\sigma$ the stochastic gradient variance. Numerical experiments corroborate the theoretical results and demonstrate the effectiveness of the proposed methods.
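To make the abstract's setting concrete, the following is a minimal sketch of standard decentralized gradient tracking with tunable numbers of local updates (`tau`) and neighbor communications (`q`) per round, in the spirit of FlexGT. The parameter names, the update order, and the simple quadratic example are assumptions for illustration only; they do not reproduce the paper's exact snapshot algorithm or analysis.

```python
import numpy as np

def gradient_tracking_sketch(grads, x0, W, eta=0.05, tau=2, q=1, rounds=200):
    """Illustrative sketch (not the paper's algorithm).

    grads: list of per-node gradient functions grads[i](x_i) -> d-vector
    x0:    (n, d) array of initial local iterates
    W:     (n, n) doubly stochastic mixing matrix of the graph
    eta:   step size; tau: local updates per round; q: communications per round
    """
    n, _ = x0.shape
    x = x0.copy()
    g = np.stack([grads[i](x[i]) for i in range(n)])  # current local gradients
    y = g.copy()                                      # gradient-tracking variables
    for _ in range(rounds):
        # tau local updates along the tracked global-gradient estimate
        for _ in range(tau):
            x = x - eta * y
            g_new = np.stack([grads[i](x[i]) for i in range(n)])
            y = y + g_new - g   # tracking update: preserves mean(y) == mean(g)
            g = g_new
        # q rounds of neighbor averaging with the mixing matrix W
        for _ in range(q):
            x = W @ x
            y = W @ y
    return x

# Example: node i holds f_i(x) = 0.5 * (x - a_i)^2, so the global
# optimum of (1/n) * sum_i f_i is the average of the a_i.
a = np.array([1.0, 2.0, 3.0, 4.0])
grads = [lambda x, ai=ai: x - ai for ai in a]
# 4-node ring graph with Metropolis-style weights (doubly stochastic)
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])
x_final = gradient_tracking_sketch(grads, np.zeros((4, 1)), W)
```

Because the tracking update keeps the average of `y` equal to the average of the local gradients, every node's iterate converges toward the global minimizer `mean(a) = 2.5` here, despite the non-identical local objectives; tuning `tau` and `q` trades local computation against communication, which is the trade-off the paper analyzes.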
