
Exponentially vanishing sub-optimal local minima in multilayer neural networks

International Conference on Learning Representations (ICLR), 2017
Abstract

We examine a multilayer neural network with piecewise linear units, input of dimension $d_0$, one hidden layer of width $d_1$, a single output, and a quadratic loss, trained on $N$ datapoints. We prove that in the limit $N\rightarrow\infty$, the volume of differentiable regions of the loss containing sub-optimal differentiable local minima is exponentially vanishing in comparison with the same volume for global minima, given standard normal input, $d_0(N)=\tilde{\Omega}(\sqrt{N})$, and an asymptotically "mild" over-parameterization: $\#\mathrm{parameters}=\tilde{\Omega}(N)$. Previous results on vanishing local minima required many more parameters: $\#\mathrm{parameters}=\Omega(Nd_0(N))$, which is typically worse.
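As an illustration of the model class the theorem concerns, the following sketch sets up a one-hidden-layer network with piecewise linear (ReLU) units, a single scalar output, and a quadratic loss over $N$ standard-normal inputs. All names, dimensions, and the random targets are our own illustrative choices, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (our choice): N datapoints, input dim d0, hidden width d1.
N, d0, d1 = 1000, 50, 40
X = rng.standard_normal((N, d0))  # standard normal input, as assumed in the theorem
y = rng.standard_normal(N)        # arbitrary targets, for illustration only

W1 = rng.standard_normal((d0, d1)) / np.sqrt(d0)  # hidden-layer weights
w2 = rng.standard_normal(d1) / np.sqrt(d1)        # output-layer weights

def quadratic_loss(W1, w2, X, y):
    h = np.maximum(X @ W1, 0.0)   # piecewise linear (ReLU) hidden units
    out = h @ w2                  # single scalar output per datapoint
    return 0.5 * np.mean((out - y) ** 2)

print(quadratic_loss(W1, w2, X, y))
```

Here the parameter count is $d_0 d_1 + d_1$, so the "mild" over-parameterization condition $\#\mathrm{parameters}=\tilde{\Omega}(N)$ asks only that this product grow (up to log factors) like $N$, rather than like $N d_0(N)$ as in earlier results.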
