
Distributed Asynchronous Dual Free Stochastic Dual Coordinate Ascent

Abstract

With the boom of data, it is challenging to optimize a large-scale machine learning problem efficiently. Both the convergence rate and the scalability of an optimization method are nontrivial. Stochastic gradient descent (SGD) is widely applied in large-scale optimization, but it sacrifices convergence rate for a lower computational complexity per iteration. Recently, variance reduction techniques have been proposed to accelerate the convergence of stochastic methods, for example stochastic variance reduced gradient (SVRG), stochastic dual coordinate ascent (SDCA), and dual free stochastic dual coordinate ascent (dfSDCA). However, serial algorithms are not applicable when the data cannot be stored on a single computer. In this paper, we propose the Distributed Asynchronous Dual Free Stochastic Dual Coordinate Ascent method (dis-dfSDCA), and prove that it achieves a linear convergence rate when the problem is convex and smooth. Stale gradient updates are common in asynchronous methods, and we also analyze the effect of staleness on the performance of our method. We conduct experiments on large-scale datasets from LIBSVM on Amazon Web Services, and the experimental results support our analysis.
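To make the dfSDCA building block concrete, below is a minimal serial sketch of a dual-free SDCA-style update for L2-regularized least squares. This is an illustrative assumption based on the standard dfSDCA update rule, not the paper's distributed asynchronous algorithm; all variable names, the step size, and the problem instance are hypothetical.

```python
import numpy as np

# Sketch of a dual-free SDCA (dfSDCA)-style update for ridge regression.
# Maintains pseudo-dual variables alpha_i with the invariant
#   w = (1 / (lam * n)) * sum_i alpha_i,
# and updates both alpha_i and w with the residual (grad_i + alpha_i),
# which vanishes at the optimum (the variance reduction effect).
rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.01 * rng.standard_normal(n)

lam = 0.01    # regularization strength (illustrative)
eta = 0.02    # step size (illustrative)

alpha = np.zeros((n, d))   # pseudo-dual variables, one per data point
w = np.zeros(d)            # primal iterate, kept equal to sum(alpha)/(lam*n)

def loss(w):
    # Regularized objective: mean squared error + L2 penalty.
    return 0.5 * np.mean((X @ w - y) ** 2) + 0.5 * lam * w @ w

initial_loss = loss(w)
for _ in range(40000):
    i = rng.integers(n)
    grad_i = (X[i] @ w - y[i]) * X[i]   # gradient of f_i at w
    g = grad_i + alpha[i]               # variance-reduced direction
    alpha[i] -= eta * lam * n * g       # dual-style update
    w -= eta * g                        # keeps w = sum(alpha)/(lam*n)

final_loss = loss(w)
```

A distributed asynchronous variant would run the inner loop on workers holding disjoint data shards, with a server applying possibly stale `g` vectors; the staleness analysis in the paper bounds the effect of those delayed updates.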
