Exponentially vanishing sub-optimal local minima in multilayer neural networks
International Conference on Learning Representations (ICLR), 2017
Abstract
We examine a multilayer neural network with piecewise linear units, input of dimension d_0, one hidden layer of width d_1, a single output, and a quadratic loss, trained on N datapoints. We prove that, in the limit N → ∞, the volume of the differentiable regions of the loss containing sub-optimal differentiable local minima is exponentially vanishing in comparison with the same volume for global minima, given standard normal input and an asymptotically "mild" over-parameterization. Previous results on vanishing local minima required many more parameters, which is typically a worse requirement.
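The model class described above can be made concrete with a short sketch. This is an illustration only, assuming leaky-ReLU as the piecewise linear unit and small, arbitrary sizes N, d0, d1 (the theorem concerns specific asymptotic scalings of these quantities, which this sketch does not reproduce):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only; the result in the abstract is asymptotic in N,
# with the input dimension d0 and hidden width d1 growing alongside it.
N, d0, d1 = 200, 20, 10

# Standard normal input, as assumed in the abstract; targets are arbitrary here.
X = rng.standard_normal((N, d0))
y = rng.standard_normal(N)

# One hidden layer of piecewise linear units, a single output.
W1 = rng.standard_normal((d0, d1)) / np.sqrt(d0)
w2 = rng.standard_normal(d1) / np.sqrt(d1)

def leaky_relu(z, slope=0.1):
    # A piecewise linear activation: identity for z > 0, small slope otherwise.
    return np.where(z > 0, z, slope * z)

def forward(X, W1, w2):
    # Network output for all N datapoints, shape (N,).
    return leaky_relu(X @ W1) @ w2

def quadratic_loss(X, y, W1, w2):
    # Quadratic (mean squared) loss over the N datapoints.
    r = forward(X, W1, w2) - y
    return 0.5 * np.mean(r ** 2)

loss = quadratic_loss(X, y, W1, w2)
print(loss)
```

Note that within any region of weight space where the sign pattern of the hidden pre-activations is fixed, the network is linear in the weights of each layer and the loss is differentiable there; these are the "differentiable regions" whose volumes the abstract compares.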
