Natasha 2: Faster Non-Convex Optimization Than SGD

Abstract

We design a stochastic algorithm to train any smooth neural network to $\varepsilon$-approximate local minima, using $O(\varepsilon^{-3.25})$ backpropagations. The best previous result was essentially $O(\varepsilon^{-4})$, achieved by SGD. More broadly, the algorithm finds $\varepsilon$-approximate local minima of any smooth nonconvex function at rate $O(\varepsilon^{-3.25})$, with only oracle access to stochastic gradients.
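To make the oracle model concrete, below is a minimal sketch of the stochastic-gradient setting the abstract refers to, using plain SGD (the $O(\varepsilon^{-4})$ baseline mentioned above), not the Natasha 2 algorithm itself. The toy objective, noise level, and step size are illustrative assumptions, and the stopping test only checks the first-order condition, whereas the paper's notion of an approximate local minimum also involves a second-order (saddle-point-escaping) condition.

```python
import numpy as np

def stochastic_gradient(x, rng):
    """Unbiased stochastic gradient of an illustrative smooth nonconvex
    objective f(x) = sum_i cos(x_i), perturbed by Gaussian noise.
    (Both the objective and the noise scale are assumptions for this demo.)"""
    return -np.sin(x) + 0.01 * rng.standard_normal(x.shape)

def sgd(x0, step=0.05, eps=0.1, max_iters=100_000, seed=0):
    """Plain SGD: stop once the noisy gradient estimate has norm <= eps,
    i.e. an approximate first-order stationary point in this toy setting."""
    rng = np.random.default_rng(seed)
    x = x0.astype(float).copy()
    for t in range(max_iters):
        g = stochastic_gradient(x, rng)
        if np.linalg.norm(g) <= eps:
            return x, t  # approximate stationary point found
        x -= step * g
    return x, max_iters

x_final, iters = sgd(np.full(10, 0.5))
print(f"stopped after {iters} stochastic gradient queries")
```

The point of the sketch is only the access model: the optimizer never sees the function itself, just unbiased stochastic gradients, which is the oracle under which the paper's $O(\varepsilon^{-3.25})$ rate is stated.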
