No Bad Local Minima in Wide and Deep Nonlinear Neural Networks
Abstract
In this paper, we prove that no bad local minimum exists in deep nonlinear neural networks with sufficiently wide hidden layers, provided the parameters are initialized by the He initialization method. Specifically, for deep ReLU neural networks with sufficiently wide hidden layers, the following four statements hold: 1) the loss function is non-convex and non-concave; 2) every local minimum is a global minimum; 3) every critical point that is not a global minimum is a saddle point; and 4) bad saddle points exist.
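The result assumes He initialization, in which each weight feeding a ReLU layer is drawn from a zero-mean Gaussian with variance 2 / fan_in. The sketch below illustrates that scheme for a deep ReLU network; the NumPy implementation, the helper names, and the layer widths are illustrative choices, not taken from the paper.

```python
import numpy as np

def he_init_relu_network(layer_widths, seed=0):
    """He initialization for a fully connected ReLU network:
    each weight ~ N(0, 2 / fan_in), biases start at zero."""
    rng = np.random.default_rng(seed)
    weights, biases = [], []
    for fan_in, fan_out in zip(layer_widths[:-1], layer_widths[1:]):
        weights.append(rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_out, fan_in)))
        biases.append(np.zeros(fan_out))
    return weights, biases

def forward(x, weights, biases):
    """Forward pass: ReLU on hidden layers, linear output layer."""
    for W, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(W @ x + b, 0.0)
    return weights[-1] @ x + biases[-1]

# Hypothetical example: input dim 10, two wide hidden layers, scalar output.
weights, biases = he_init_relu_network([10, 1024, 1024, 1])
y = forward(np.ones(10), weights, biases)
```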
