
Symmetry & critical points for a model shallow neural network

Abstract

A detailed analysis is given of a family of critical points determining spurious minima for a model student-teacher two-layer neural network with ReLU activation function and a natural $\Gamma = S_k \times S_k$ symmetry. For a $k$-neuron shallow network of this type, analytic equations are given which, for example, determine the critical points of the spurious minima described by Safran and Shamir (2018) for $6 \le k \le 20$. These critical points have isotropy (conjugate to) the diagonal subgroup $\Delta S_{k-1} \subset \Delta S_k$ of $\Gamma$. It is shown that critical points of this family can be expressed as an infinite series in $1/\sqrt{k}$ (for sufficiently large $k$) and, as an application, that the critical values decay like $a k^{-1}$, where $a \approx 0.3$. Other non-trivial families of critical points are also described, with isotropy conjugate to $\Delta S_{k-1}$, $\Delta S_k$, and $\Delta(S_2 \times S_{k-2})$ (the latter giving spurious minima for $k \ge 9$). The methods used depend on symmetry breaking, bifurcation, and algebraic geometry, notably Artin's implicit function theorem, and are applicable to other families of critical points that occur in this network.
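
The abstract does not spell out the loss, but in the Safran and Shamir (2018) setting it cites, the standard choice is the population squared loss over standard Gaussian inputs with an orthonormal teacher, which has a closed form via the arc-cosine kernel of Cho and Saul (2009). The sketch below assumes that setting; the function names, the teacher matrix $V = I_k$, and the perturbation scale are illustrative choices, not taken from the paper.

```python
import numpy as np

def relu_kernel(u, v):
    """E[ReLU(u.x) * ReLU(v.x)] for x ~ N(0, I) (Cho & Saul, 2009)."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    cos_t = np.clip(u @ v / (nu * nv), -1.0, 1.0)
    theta = np.arccos(cos_t)
    return nu * nv * (np.sin(theta) + (np.pi - theta) * cos_t) / (2 * np.pi)

def population_loss(W, V):
    """L(W) = (1/2) E[(sum_i ReLU(w_i.x) - sum_j ReLU(v_j.x))^2],
    expanded into pairwise kernel terms (assumed model, see lead-in)."""
    total = 0.0
    for wi in W:                      # student-student terms
        for wj in W:
            total += 0.5 * relu_kernel(wi, wj)
    for wi in W:                      # student-teacher cross terms
        for vj in V:
            total -= relu_kernel(wi, vj)
    for vi in V:                      # teacher-teacher terms
        for vj in V:
            total += 0.5 * relu_kernel(vi, vj)
    return total

k = 6
V = np.eye(k)                         # orthonormal teacher weights
W = V + 0.1 * np.random.randn(k, k)   # student initialized near the teacher
print(population_loss(W, V))          # small positive value near the global minimum
```

Evaluating such a loss at the $\Delta S_{k-1}$-symmetric configurations described above is the kind of computation that the paper's analytic equations characterize; in this assumed setting, the critical values of that family would exhibit the reported $a k^{-1}$ decay.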
