Weight-space symmetry in deep networks gives rise to permutation saddles, connected by equal-loss valleys across the loss landscape

Abstract

The permutation symmetry of neurons in each layer of a deep neural network gives rise not only to multiple equivalent global minima of the loss function, but also to first-order saddle points located on the paths between these global minima. In a network with $d-1$ hidden layers and $n_k$ neurons in layer $k$, $k = 1, \ldots, d$, we construct smooth paths between equivalent global minima that lead through a 'permutation point' where the input and output weight vectors of two neurons in the same hidden layer $k$ collide and interchange. We show that such permutation points are critical points with at least $n_{k+1}$ vanishing eigenvalues of the Hessian matrix of second derivatives, indicating a local plateau of the loss function. We find that a permutation point for the exchange of neurons $i$ and $j$ transitions into a flat valley (or, more generally, an extended plateau of $n_{k+1}$ flat dimensions) that enables all $n_k!$ permutations of the neurons in a given layer $k$ at the same loss value. Moreover, we introduce high-order permutation points by exploiting the recursive structure of neural network functions, and find that the number of $K^{\text{th}}$-order permutation points exceeds the (already huge) number of equivalent global minima by a factor of at least $\sum_{k=1}^{d-1}\frac{1}{2!^K}{n_k-K \choose K}$. In two tasks, we illustrate numerically that some of the permutation points correspond to first-order saddles ('permutation saddles'): first, in a toy network with a single hidden layer on a function approximation task and, second, in a multilayer network on the MNIST task. Our geometric approach yields a lower bound on the number of critical points generated by weight-space symmetries and provides a simple, intuitive link between previous mathematical results and numerical observations.
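
To make the geometric picture concrete, the following minimal sketch (our own illustration, not the paper's code) builds a one-hidden-layer tanh network in NumPy, collides the incoming weight vectors of two hidden neurons $i$ and $j$ as at a permutation point, and checks numerically that the loss stays constant while their outgoing weights are redistributed along the valley that eventually exchanges them. The network sizes, random data, and mean-squared-error loss are arbitrary choices made for the demonstration.

```python
# Sketch of the equal-loss valley at a permutation point: once two hidden
# neurons i and j share the same incoming weight vector, the network output
# depends only on the SUM of their outgoing weights, so moving weight mass
# from one neuron to the other (and, at the endpoint, exchanging them)
# leaves the loss unchanged. All sizes and data below are illustrative.
import numpy as np

rng = np.random.default_rng(0)

n_in, n_hidden, n_out, n_samples = 3, 5, 1, 64
X = rng.normal(size=(n_samples, n_in))
y = rng.normal(size=(n_samples, n_out))

# One-hidden-layer network  f(x) = tanh(x W1) W2  with random weights.
W1 = rng.normal(size=(n_in, n_hidden))
W2 = rng.normal(size=(n_hidden, n_out))

def loss(W1, W2):
    """Mean-squared error of the two-layer network on the toy data."""
    return np.mean((np.tanh(X @ W1) @ W2 - y) ** 2)

# Permutation point for neurons i and j: collide their incoming weight
# vectors (columns of W1).
i, j = 0, 1
W1p = W1.copy()
W1p[:, j] = W1p[:, i]

# Along the valley, keep W2[i] + W2[j] fixed while moving outgoing weight
# from neuron i to neuron j; at t = 1 their outgoing weights are exchanged.
s = W2[i] + W2[j]
losses = []
for t in np.linspace(0.0, 1.0, 11):
    W2t = W2.copy()
    W2t[i] = (1 - t) * W2[i] + t * W2[j]
    W2t[j] = s - W2t[i]
    losses.append(loss(W1p, W2t))

# ~0 up to floating-point error: the loss is flat along the exchange path.
print("loss variation along the valley:", np.ptp(losses))
```

Repeating the check with the original, uncollided W1 shows the loss varying along the same path, which is what singles out the permutation point as the entry to the flat valley described in the abstract.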
