Second Order Stochastic Optimization in Linear Time
Stochastic optimization and, in particular, first-order stochastic methods are a cornerstone of modern machine learning due to their extremely efficient per-iteration computational cost. Second-order methods, while able to provide faster convergence per iteration, have been far less explored due to the high cost of computing second-order information. In this paper we develop a second-order stochastic method for optimization problems arising in machine learning, based on novel matrix randomization techniques that match the per-iteration cost of gradient descent yet enjoy the linear convergence properties of second-order optimization. We also consider the special case of self-concordant functions, where we show that a first-order method can achieve linear convergence with guarantees independent of the condition number. We demonstrate significant speedups for training linear classifiers over several convex benchmarks.
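The abstract does not spell out the estimator, but a standard way to obtain second-order steps at first-order cost is to approximate the inverse-Hessian-vector product with a truncated Neumann series driven by per-example Hessian-vector products, each of which costs O(d). Below is a minimal NumPy sketch of that idea for regularized logistic regression; it illustrates the general technique, not the paper's exact algorithm, and all names (`lissa_like`, `newton_step_estimate`, the `depth` and `lr` hyperparameters) are illustrative assumptions. The feature scaling noted in the comments is needed for the series to converge.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad(w, X, y, lam):
    # Full gradient of the L2-regularized logistic loss.
    z = y * (X @ w)
    coeff = (sigmoid(z) - 1.0) * y          # dloss/dz times label
    return X.T @ coeff / len(y) + lam * w

def hvp_single(w, x, yi, lam, v):
    # Hessian-vector product for one example:
    # (l''(z) x x^T + lam I) v = l''(z) (x . v) x + lam v, computed in O(d).
    z = yi * (x @ w)
    s = sigmoid(z)
    return s * (1.0 - s) * (x @ v) * x + lam * v

def newton_step_estimate(w, X, y, lam, g, depth, rng):
    # Truncated Neumann-series recursion u_j = g + (I - H_i) u_{j-1},
    # whose expectation approaches H^{-1} g as depth grows, provided each
    # per-example Hessian has spectral norm below 1 (rescale features so
    # ||x_i|| <= 1; then ||H_i|| <= 1/4 + lam).
    u = g.copy()
    for _ in range(depth):
        i = rng.integers(len(y))
        u = g + u - hvp_single(w, X[i], y[i], lam, u)
    return u

def lissa_like(X, y, lam=1e-2, lr=1.0, iters=50, depth=50, seed=0):
    # Outer loop: full gradient plus a stochastic Newton-type step.
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        g = grad(w, X, y, lam)
        w -= lr * newton_step_estimate(w, X, y, lam, g, depth, rng)
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 20))
    X /= np.linalg.norm(X, axis=1, keepdims=True)   # keep Hessians contractive
    w_true = rng.normal(size=20)
    y = np.where(X @ w_true >= 0, 1.0, -1.0)
    w = lissa_like(X, y)
```

Each inner recursion touches one random example, so estimating the Newton step costs O(depth * d), on the same order as a few stochastic gradients; this is the sense in which such a scheme can match the per-iteration cost of first-order methods.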