
Exponentially vanishing sub-optimal local minima in multilayer neural networks

International Conference on Learning Representations (ICLR), 2017
Abstract

We examine a multilayer neural network with piecewise linear units, input of dimension $d_0$, one hidden layer of width $d_1$, a single output, and a quadratic loss, trained on $N$ datapoints. We prove that in the limit $N\rightarrow\infty$, the volume of differentiable regions of the loss containing sub-optimal differentiable local minima is exponentially vanishing in comparison with the same volume for global minima, given standard normal input, $d_0(N)=\tilde{\Omega}(\sqrt{N})$, and an asymptotically "mild" over-parameterization: $\#\mathrm{parameters}=\tilde{\Omega}(N)$. Previous results on vanishing local minima required many more parameters: $\#\mathrm{parameters}=\Omega(Nd_0(N))$, which is typically worse.
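As an illustration of the model class the theorem concerns, the following sketch sets up a one-hidden-layer network with piecewise linear (ReLU) units, a single scalar output, and a quadratic loss over $N$ standard-normal inputs. All names, dimensions, and the random targets are our own illustrative choices, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (our choice): N datapoints, input dim d0, hidden width d1.
N, d0, d1 = 1000, 50, 40
X = rng.standard_normal((N, d0))  # standard normal input, as assumed in the theorem
y = rng.standard_normal(N)        # arbitrary targets, for illustration only

W1 = rng.standard_normal((d0, d1)) / np.sqrt(d0)  # hidden-layer weights
w2 = rng.standard_normal(d1) / np.sqrt(d1)        # output-layer weights

def quadratic_loss(W1, w2, X, y):
    h = np.maximum(X @ W1, 0.0)   # piecewise linear (ReLU) hidden units
    out = h @ w2                  # single scalar output per datapoint
    return 0.5 * np.mean((out - y) ** 2)

print(quadratic_loss(W1, w2, X, y))
```

Here the parameter count is $d_0 d_1 + d_1$, so the "mild" over-parameterization condition $\#\mathrm{parameters}=\tilde{\Omega}(N)$ asks only that this product grow (up to log factors) like $N$, rather than like $N d_0(N)$ as in earlier results.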
