
Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations

Abstract

We design an algorithm which finds an $\epsilon$-approximate stationary point (with $\|\nabla F(x)\| \le \epsilon$) using $O(\epsilon^{-3})$ stochastic gradient and Hessian-vector products, matching guarantees that were previously available only under a stronger assumption of access to multiple queries with the same random seed. We prove a lower bound which establishes that this rate is optimal and, surprisingly, that it cannot be improved using stochastic $p$th-order methods for any $p \ge 2$, even when the first $p$ derivatives of the objective are Lipschitz. Together, these results characterize the complexity of non-convex stochastic optimization with second-order methods and beyond. Expanding our scope to the oracle complexity of finding $(\epsilon, \gamma)$-approximate second-order stationary points, we establish nearly matching upper and lower bounds for stochastic second-order methods. Our lower bounds here are novel even in the noiseless case.
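The abstract does not specify the algorithm, but the general idea behind methods of this type is to maintain a running gradient estimate and correct it along each step with an independently sampled stochastic Hessian-vector product, avoiding the paired (same-seed) gradient queries that earlier variance-reduced methods required. The following is a minimal, hypothetical sketch of that recursion, not the paper's method; the function names, step size, and periodic-refresh rule are all illustrative assumptions.

```python
import numpy as np

def hvp_corrected_sgd(grad_oracle, hvp_oracle, x0, step_size=0.01,
                      num_steps=1000, refresh_every=100):
    """SGD whose gradient estimate is corrected with stochastic
    Hessian-vector products (illustrative sketch, not the paper's algorithm).

    grad_oracle(x) returns an unbiased stochastic gradient at x;
    hvp_oracle(x, v) returns an unbiased stochastic Hessian-vector
    product at x applied to v. Each call may use a fresh random seed.
    """
    x = np.asarray(x0, dtype=float).copy()
    g = grad_oracle(x)                      # initial stochastic gradient estimate
    for t in range(1, num_steps + 1):
        x_next = x - step_size * g          # gradient step on current estimate
        # Correct the running estimate along the step with one independent
        # stochastic Hessian-vector product; no shared seed is needed.
        g = g + hvp_oracle(x, x_next - x)
        if t % refresh_every == 0:          # occasional fresh gradient to limit drift
            g = grad_oracle(x_next)
        x = x_next
    return x

# Toy usage on a noisy quadratic F(x) = 0.5 * ||x||^2,
# whose gradient is x and whose Hessian is the identity.
rng = np.random.default_rng(0)
grad = lambda x: x + 0.01 * rng.standard_normal(x.shape)
hvp = lambda x, v: v + 0.01 * rng.standard_normal(x.shape)
print(np.linalg.norm(hvp_corrected_sgd(grad, hvp, np.ones(5))))
```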
