
Stochastic Quasi-Newton Methods for Nonconvex Stochastic Optimization

SIAM Journal on Optimization (SIAM J. Optim.), 2016
Abstract

In this paper we study stochastic quasi-Newton methods for nonconvex stochastic optimization, where we assume that noisy information about the gradients of the objective function is available via a stochastic first-order oracle (SFO). We propose a general framework for such methods, for which we prove almost sure convergence to stationary points and analyze its worst-case iteration complexity. When a randomly chosen iterate is returned as the output of such an algorithm, we prove that in the worst case the SFO-calls complexity is $O(\epsilon^{-2})$ to ensure that the expectation of the squared norm of the gradient is smaller than the given accuracy tolerance $\epsilon$. We also propose a specific algorithm, namely a stochastic damped L-BFGS (SdLBFGS) method, that falls under the proposed framework. Moreover, we incorporate the SVRG variance reduction technique into the proposed SdLBFGS method and analyze its SFO-calls complexity. Numerical results on a nonconvex binary classification problem using SVM and a multiclass classification problem using neural networks are reported.
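To give a concrete sense of the kind of method the abstract describes, the following is a minimal NumPy sketch of a stochastic damped L-BFGS loop: a two-loop recursion builds the quasi-Newton direction from stored curvature pairs, and a Powell-style damping step keeps each pair usable despite gradient noise. The damping rule, scaling, step size, and the `grad_oracle` interface are simplified stand-ins chosen for illustration, not the paper's exact SdLBFGS update or analysis.

import numpy as np

def damped_pair(s, y, gamma):
    """Powell-style damping of the curvature pair (s, y) with B0 = (1/gamma) I.
    Keeps s^T y sufficiently positive so the implicit Hessian approximation
    stays positive definite even with noisy gradients (simplified stand-in
    for the paper's damping rule)."""
    sBs = (s @ s) / gamma
    sy = s @ y
    if sy < 0.25 * sBs:
        theta = 0.75 * sBs / (sBs - sy)
        y = theta * y + (1.0 - theta) * s / gamma
    return y

def lbfgs_direction(g, pairs, gamma):
    """Standard two-loop recursion: approximates H_k g from stored (s, y) pairs."""
    q = g.copy()
    alphas = []
    for s, y in reversed(pairs):          # newest pair first
        rho = 1.0 / (y @ s)
        a = rho * (s @ q)
        q -= a * y
        alphas.append((a, rho))
    r = gamma * q                          # initial scaling H_0 = gamma I
    for (s, y), (a, rho) in zip(pairs, reversed(alphas)):  # oldest pair first
        b = rho * (y @ r)
        r += (a - b) * s
    return r

def sdlbfgs_sketch(grad_oracle, x0, steps=100, mem=10, lr=0.05, seed=0):
    """Minimal stochastic damped L-BFGS loop.
    grad_oracle(x, rng) plays the role of the SFO: it returns a noisy
    gradient estimate at x (hypothetical interface for this sketch)."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    pairs, gamma = [], 1.0
    g_prev = grad_oracle(x, rng)
    for _ in range(steps):
        d = lbfgs_direction(g_prev, pairs, gamma)
        x_new = x - lr * d
        g_new = grad_oracle(x_new, rng)
        s, y = x_new - x, g_new - g_prev
        y = damped_pair(s, y, gamma)
        gamma = (s @ y) / (y @ y)          # rescale H_0 from the latest pair
        pairs.append((s, y))
        if len(pairs) > mem:               # keep only the last `mem` pairs
            pairs.pop(0)
        x, g_prev = x_new, g_new
    return x

In an SVRG-style variant of this sketch, `grad_oracle` would periodically compute a full gradient at a reference point and return mini-batch gradients corrected by that reference, which is the variance reduction the abstract refers to.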
