Decentralized Stochastic Gradient Tracking for Non-convex Empirical Risk Minimization

6 September 2019

Abstract

This paper studies a decentralized stochastic gradient tracking (DSGT) algorithm for non-convex empirical risk minimization problems over a peer-to-peer network of nodes, in contrast to existing DSGT work, which addresses only convex problems. To ensure exact convergence and handle the variance among decentralized datasets, each node performs a stochastic gradient tracking step using a mini-batch of samples, where the batch size is proportional to the size of the local dataset. We explicitly evaluate the convergence rate of DSGT with both constant and decreasing stepsizes in terms of the algebraic connectivity of the network, the mini-batch size, the gradient variance, and related quantities. Furthermore, we show that DSGT has a network independence property under certain conditions, meaning that the network topology affects the convergence rate only up to a constant factor. Hence, the convergence rate of DSGT with respect to the number of iterations is comparable to that of the centralized stochastic gradient method (SGD). A linear speedup is then achievable under the same assumptions as many existing algorithms, since each iteration of DSGT with $n$ nodes generally finishes $n$ times faster than an iteration of centralized SGD running on a single node. Numerical experiments for neural networks and logistic regression problems on CIFAR-10 illustrate the advantages of DSGT.
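
The tracking step described above can be sketched as follows. This is a minimal illustration in plain Python, not the paper's implementation: it assumes one common form of the gradient tracking recursion, a doubly stochastic mixing matrix `W`, and a scalar least-squares loss chosen only for simplicity (the paper itself targets non-convex losses). All function names (`batch_sizes`, `stochastic_grad`, `mix`, `dsgt`) and the parameters `ratio` and `step` are hypothetical.

```python
import random


def batch_sizes(local_sizes, ratio):
    # Batch size proportional to each node's local dataset size (at least 1).
    return [max(1, round(ratio * n)) for n in local_sizes]


def stochastic_grad(x, data, batch_size, rng):
    # Sum of per-sample gradients of 0.5 * (a * x - b)^2 over a random
    # mini-batch drawn without replacement. The loss is an assumption
    # made for illustration only.
    batch = rng.sample(data, batch_size)
    return sum(a * (a * x - b) for a, b in batch)


def mix(W, v):
    # One round of neighbor averaging: (W v)_i = sum_j W[i][j] * v[j].
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]


def dsgt(W, datasets, ratio, step, iters, seed=0):
    # Assumed recursion (one common gradient tracking form):
    #   x_{k+1} = W (x_k - step * y_k)
    #   y_{k+1} = W y_k + g_{k+1} - g_k
    # where g_k stacks the nodes' mini-batch gradients at x_k.
    rng = random.Random(seed)
    n = len(datasets)
    sizes = batch_sizes([len(d) for d in datasets], ratio)
    x = [0.0] * n
    g = [stochastic_grad(x[i], datasets[i], sizes[i], rng) for i in range(n)]
    y = list(g)  # each tracker starts at the node's first stochastic gradient
    for _ in range(iters):
        x = mix(W, [x[i] - step * y[i] for i in range(n)])
        g_new = [stochastic_grad(x[i], datasets[i], sizes[i], rng)
                 for i in range(n)]
        wy = mix(W, y)
        y = [wy[i] + g_new[i] - g[i] for i in range(n)]
        g = g_new
    return x, y, g


if __name__ == "__main__":
    # Hypothetical 4-node ring with symmetric, doubly stochastic weights.
    W = [[0.50, 0.25, 0.00, 0.25],
         [0.25, 0.50, 0.25, 0.00],
         [0.00, 0.25, 0.50, 0.25],
         [0.25, 0.00, 0.25, 0.50]]
    data_rng = random.Random(1)
    datasets = []
    for n_i in (10, 20, 30, 40):  # unequal local dataset sizes
        local = []
        for _ in range(n_i):
            a = data_rng.uniform(0.5, 1.5)
            local.append((a, 2.0 * a + data_rng.gauss(0.0, 0.1)))
        datasets.append(local)
    x, y, g = dsgt(W, datasets, ratio=0.2, step=0.01, iters=200)
    print("local iterates:", x)
```

Because the columns of `W` sum to one, the sum of the trackers `y` equals the sum of the most recent stochastic gradients at every iteration. This is the property that lets each node follow the network-wide gradient direction rather than only its own local one.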
