MALCOM-PSGD: Inexact Proximal Stochastic Gradient Descent for Communication-Efficient Decentralized Machine Learning

Abstract

Recent research indicates that frequent model communication stands as a major bottleneck to the efficiency of decentralized machine learning (ML), particularly for large-scale and over-parameterized neural networks (NNs). In this paper, we introduce MALCOM-PSGD, a new decentralized ML algorithm that strategically integrates gradient compression techniques with model sparsification. MALCOM-PSGD leverages proximal stochastic gradient descent to handle the non-smoothness resulting from the $\ell_1$ regularization in model sparsification. Furthermore, we adapt vector source coding and dithering-based quantization for compressed gradient communication of sparsified models. Our analysis shows that decentralized proximal stochastic gradient descent with compressed communication has a convergence rate of $\mathcal{O}\left(\ln(t)/\sqrt{t}\right)$, assuming a diminishing learning rate, where $t$ denotes the number of iterations. Numerical results verify our theoretical findings and demonstrate that our method reduces communication costs by approximately $75\%$ when compared to the state-of-the-art method.
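The proximal step for $\ell_1$ regularization mentioned above has a closed form: the soft-thresholding operator. The following is a minimal sketch of one such proximal SGD update, not the paper's full MALCOM-PSGD algorithm (which adds decentralized averaging, source coding, and dithered quantization); the names `prox_sgd_step`, `lr`, and `lam` are illustrative choices, not identifiers from the paper.

```python
import numpy as np

def soft_threshold(x, tau):
    # Proximal operator of tau * ||x||_1: shrinks each entry toward zero
    # and zeroes out entries with magnitude below tau, which is how the
    # l1 penalty induces model sparsity.
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def prox_sgd_step(w, grad, lr, lam):
    # One proximal SGD update: a gradient step on the smooth loss term,
    # followed by the l1 proximal map with threshold lr * lam.
    return soft_threshold(w - lr * grad, lr * lam)

# Small weights fall below the threshold and are set exactly to zero.
w = np.array([0.5, -0.05, 1.2, 0.02])
g = np.array([0.1, 0.0, -0.2, 0.0])
w_new = prox_sgd_step(w, g, lr=0.1, lam=0.5)  # threshold = 0.05
```

In a decentralized setting, each node would interleave such updates with compressed communication of its (now sparse) model differences to its neighbors.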
