Natasha 2: Faster Non-Convex Optimization Than SGD

Abstract
We design a stochastic algorithm to train any smooth neural network to ε-approximate local minima, using O(ε^{-3.25}) backpropagations. The best previous result was essentially O(ε^{-4}) by SGD. More broadly, it finds ε-approximate local minima of any smooth nonconvex function at rate O(ε^{-3.25}), with only oracle access to stochastic gradients.
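For readers unfamiliar with the terminology, an "ε-approximate local minimum" in this literature is a point whose gradient is small and whose Hessian has no strongly negative curvature. A minimal sketch of the standard definition is given below in LaTeX; the second-order tolerance δ and its coupling to ε are conventions that vary across papers, and the paper's exact choice may differ from the illustrative one shown here.

% Standard notion of an approximate local minimum of a smooth nonconvex f
% (illustrative convention; the paper's precise tolerance may differ)
\[
  \|\nabla f(x)\| \le \epsilon
  \quad\text{and}\quad
  \nabla^2 f(x) \succeq -\delta\, I,
  \qquad \text{e.g. } \delta = O(\sqrt{\epsilon}).
\]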