Novel Gradient Sparsification Algorithm via Bayesian Inference

International Workshop on Machine Learning for Signal Processing (MLSP), 2024
Main: 5 pages, 3 figures, Bibliography: 2 pages
Abstract

Error accumulation is an essential component of the Top-k sparsification method in distributed gradient descent. It implicitly scales the learning rate and prevents the slow-down of lateral movement, but it can also deteriorate convergence. This paper proposes a novel sparsification algorithm, regularized Top-k (RegTop-k), that controls the learning-rate scaling induced by error accumulation. The algorithm is derived by viewing gradient sparsification as an inference problem and determining a Bayesian optimal sparsification mask via maximum-a-posteriori estimation. It uses past aggregated gradients to evaluate posterior statistics, based on which it prioritizes the entries of the local gradient. Numerical experiments with ResNet-18 on CIFAR-10 show that, at 0.1% sparsification, RegTop-k achieves about 8% higher accuracy than standard Top-k.
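To make the general idea concrete, below is a minimal NumPy sketch of one sparsification step in the spirit the abstract describes: standard Top-k with error feedback, plus a regularized scoring term that uses past aggregated gradients to prioritize entries. The function names, the `lam` weight, and the exact scoring rule are illustrative assumptions, not the paper's actual RegTop-k definition.

```python
import numpy as np

def topk_mask(scores, k):
    """Boolean mask selecting the k largest-magnitude entries of `scores`."""
    idx = np.argpartition(np.abs(scores), -k)[-k:]
    mask = np.zeros_like(scores, dtype=bool)
    mask[idx] = True
    return mask

def regtopk_step(local_grad, error, past_agg_grad, k, lam=0.5):
    """One hypothetical RegTop-k-style step (illustrative, not the paper's rule).

    `error` is the accumulated unsent residual from previous rounds;
    `past_agg_grad` stands in for the posterior statistics the paper
    derives from previously aggregated gradients.
    """
    corrected = local_grad + error            # standard error feedback
    # Hypothetical MAP-style score: favor entries that are both large
    # locally and consistent with past aggregated gradients.
    score = np.abs(corrected) + lam * np.abs(past_agg_grad)
    mask = topk_mask(score, k)
    sparse_grad = np.where(mask, corrected, 0.0)
    new_error = corrected - sparse_grad       # residual carried forward
    return sparse_grad, new_error

# Toy usage: sparsify a 1000-entry gradient down to 0.1% (k = 1).
g = np.random.randn(1000)
sparse, err = regtopk_step(g, np.zeros_like(g), np.random.randn(1000), k=1)
```

Setting `lam = 0` in this sketch recovers plain Top-k with error feedback; the regularization term is one plausible way to temper how accumulated error alone drives the selection.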
