
Escape saddle points by a simple gradient-descent based algorithm

Abstract

Escaping saddle points is a central research topic in nonconvex optimization. In this paper, we propose a simple gradient-based algorithm such that for a smooth function $f\colon\mathbb{R}^n\to\mathbb{R}$, it outputs an $\epsilon$-approximate second-order stationary point in $\tilde{O}(\log n/\epsilon^{1.75})$ iterations. Compared to the previous state-of-the-art algorithms by Jin et al. with $\tilde{O}((\log n)^{4}/\epsilon^{2})$ or $\tilde{O}((\log n)^{6}/\epsilon^{1.75})$ iterations, our algorithm is polynomially better in terms of $\log n$ and matches their complexities in terms of $1/\epsilon$. For the stochastic setting, our algorithm outputs an $\epsilon$-approximate second-order stationary point in $\tilde{O}((\log n)^{2}/\epsilon^{4})$ iterations. Technically, our main contribution is the idea of implementing a robust Hessian power method using only gradients, which can find negative curvature near saddle points and achieves the polynomial speedup in $\log n$ compared to perturbed gradient descent methods. Finally, we also perform numerical experiments that support our results.
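
To illustrate the gradient-only Hessian power method mentioned above, here is a minimal sketch rather than the paper's exact procedure: it assumes the Hessian-vector product is approximated by a finite difference of gradients, $H(x)v \approx (\nabla f(x+rv)-\nabla f(x))/r$, and the shift constant `c`, perturbation radius `r`, and iteration count are illustrative parameters, not the values analyzed in the paper.

```python
import numpy as np

def negative_curvature_direction(grad, x, r=1e-3, c=10.0, num_iters=50, rng=None):
    """Sketch of a Hessian power method that uses only gradient evaluations.

    grad: gradient oracle for f, grad(x) -> ndarray of shape (n,)
    x:    current iterate (e.g., a point near a suspected saddle)
    r:    finite-difference radius for Hessian-vector products (assumed value)
    c:    crude upper bound on the Hessian spectral norm (assumed value)
    """
    rng = np.random.default_rng() if rng is None else rng
    n = x.shape[0]
    g0 = grad(x)

    def hvp(v):
        # Hessian-vector product from two gradient calls:
        # H(x) v ~= (grad(x + r v) - grad(x)) / r
        return (grad(x + r * v) - g0) / r

    # Power iteration on the shifted matrix (c I - H): its top eigenvector
    # corresponds to the most negative eigenvalue of H when c bounds the spectrum.
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        w = c * v - hvp(v)          # (c I - H) v, computed with gradients only
        v = w / np.linalg.norm(w)

    curvature = v @ hvp(v)          # Rayleigh quotient v^T H v
    return (v, curvature) if curvature < 0 else (None, curvature)
```

For example, at the saddle point of $f(x) = x_1^2 - x_2^2$ this routine should return a direction aligned with the second coordinate axis together with a negative curvature estimate, which a descent method can then follow to escape the saddle.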
