
Efficient Sign-Based Optimization: Accelerating Convergence via Variance Reduction

Abstract

Sign stochastic gradient descent (signSGD) is a communication-efficient method that transmits only the sign of stochastic gradients for parameter updates. Existing literature has demonstrated that signSGD can achieve a convergence rate of $\mathcal{O}(d^{1/2}T^{-1/4})$, where $d$ represents the dimension and $T$ is the number of iterations. In this paper, we improve this convergence rate to $\mathcal{O}(d^{1/2}T^{-1/3})$ by introducing the Sign-based Stochastic Variance Reduction (SSVR) method, which employs variance reduction estimators to track gradients and uses their signs to update the parameters. For finite-sum problems, our method can be further enhanced to achieve a convergence rate of $\mathcal{O}(m^{1/4}d^{1/2}T^{-1/2})$, where $m$ denotes the number of component functions. Furthermore, we investigate the heterogeneous majority vote in distributed settings and introduce two novel algorithms that attain improved convergence rates of $\mathcal{O}(d^{1/2}T^{-1/2} + dn^{-1/2})$ and $\mathcal{O}(d^{1/4}T^{-1/4})$ respectively, outperforming the previous results of $\mathcal{O}(dT^{-1/4} + dn^{-1/2})$ and $\mathcal{O}(d^{3/8}T^{-1/8})$, where $n$ represents the number of nodes. Numerical experiments across different tasks validate the effectiveness of our proposed methods.
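
The abstract describes the core idea at a high level: maintain a variance-reduced estimate of the gradient and apply only its sign in each update. The snippet below is a minimal sketch of that idea in Python, assuming a STORM-style recursive gradient tracker; the names `ssvr_sketch`, `grad_fn`, `sample_fn` and the specific recursion and hyperparameters are illustrative assumptions, not the paper's exact SSVR algorithm.

```python
import numpy as np

def ssvr_sketch(grad_fn, sample_fn, x0, T=1000, lr=1e-2, beta=0.1, seed=0):
    """Sketch of a sign-based, variance-reduced update.

    sample_fn(rng)      -> a data sample / minibatch
    grad_fn(x, sample)  -> stochastic gradient of the loss at x on that sample

    The tracker v follows a STORM-style recursion (an illustrative choice);
    the parameter update applies only sign(v), as in signSGD.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    v = grad_fn(x, sample_fn(rng))          # initial gradient estimate
    for _ in range(T):
        x_prev = x.copy()
        x = x - lr * np.sign(v)             # sign-based parameter update
        xi = sample_fn(rng)                 # one fresh sample, reused below
        # variance-reduced tracking: correct the previous estimate with the
        # gradient difference measured on the same sample xi
        v = grad_fn(x, xi) + (1.0 - beta) * (v - grad_fn(x_prev, xi))
    return x

# Toy usage: f(x) = 0.5 * ||x||^2 with additive gradient noise
if __name__ == "__main__":
    d = 10
    x_out = ssvr_sketch(
        grad_fn=lambda x, xi: x + xi,                    # true gradient plus noise
        sample_fn=lambda rng: 0.1 * rng.standard_normal(d),
        x0=np.ones(d),
    )
    print(np.linalg.norm(x_out))
```

Because the update direction is just a sign vector, each step moves every coordinate by the same magnitude `lr`, which is what makes the method communication-efficient in distributed (majority-vote) settings.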
