
Geometric structure of shallow neural networks and constructive ${\mathcal L}^2$ cost minimization

Main: 22 pages
Bibliography: 2 pages
Appendix: 1 page
Abstract

In this paper, we provide a geometric interpretation of the structure of shallow neural networks characterized by one hidden layer, a ramp activation function, an ${\mathcal L}^2$ Schatten class (or Hilbert-Schmidt) cost function, input space ${\mathbb R}^M$, output space ${\mathbb R}^Q$ with $Q\leq M$, and training input sample size $N>QM$. We prove an upper bound on the minimum of the cost function of order $O(\delta_P)$, where $\delta_P$ measures the signal-to-noise ratio of training inputs. We obtain an approximate optimizer using projections adapted to the averages $\overline{x_{0,j}}$ of training input vectors belonging to the same output vector $y_j$, $j=1,\dots,Q$. In the special case $M=Q$, we explicitly determine an exact degenerate local minimum of the cost function; the sharp value differs from the upper bound obtained for $Q\leq M$ by a relative error $O(\delta_P^2)$. The proof of the upper bound yields a constructively trained network; we show that it metrizes the $Q$-dimensional subspace of the input space ${\mathbb R}^M$ spanned by $\overline{x_{0,j}}$, $j=1,\dots,Q$. We comment on the characterization of the global minimum of the cost function in the given context.
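To make the projection-based construction more concrete, the following is a minimal numerical sketch in Python. It is an illustration under assumptions, not the paper's optimizer: it groups training inputs by their target output vector $y_j$, forms the class averages $\overline{x_{0,j}}$, uses the orthogonal projection onto their span as the hidden-layer map followed by the ramp (ReLU) activation, and fits the output layer by ordinary least squares. The function name `constructive_sketch`, the one-hot targets, and the least-squares readout are choices made here purely for illustration.

```python
import numpy as np

def constructive_sketch(X, labels, Y_targets):
    """Illustrative sketch only (not the paper's exact construction):
    hidden layer = projection onto span of per-class input averages + ReLU,
    output layer = least-squares fit of the targets.

    X         : (N, M) training inputs in R^M
    labels    : (N,) class index j in {0, ..., Q-1} for each input
    Y_targets : (Q, d_out) output vector y_j for each class
    """
    Q = Y_targets.shape[0]
    # Class averages \bar{x}_{0,j}, stacked as rows of a (Q, M) matrix.
    Xbar = np.stack([X[labels == j].mean(axis=0) for j in range(Q)])
    # Orthogonal projection onto span{ \bar{x}_{0,j} } via the pseudoinverse.
    P = Xbar.T @ np.linalg.pinv(Xbar.T)          # (M, M), rank <= Q
    # Hidden layer: project, then apply the ramp (ReLU) activation.
    H = np.maximum(X @ P, 0.0)                   # (N, M)
    # Output layer: least-squares readout from hidden activations.
    Y = Y_targets[labels]                        # (N, d_out)
    W2, *_ = np.linalg.lstsq(H, Y, rcond=None)
    return P, W2

# Toy usage with synthetic data (M = 5, Q = 3, N = 60).
rng = np.random.default_rng(0)
M, Q, N = 5, 3, 60
labels = rng.integers(0, Q, size=N)
Y_targets = np.eye(Q)                            # one-hot outputs in R^Q
X = Y_targets[labels] @ rng.normal(size=(Q, M)) + 0.05 * rng.normal(size=(N, M))
P, W2 = constructive_sketch(X, labels, Y_targets)
pred = np.maximum(X @ P, 0.0) @ W2
print("mean squared error:", np.mean((pred - Y_targets[labels]) ** 2))
```

In this toy setup the noise level of the inputs plays the role of $\delta_P$: as it shrinks, the projected, constructively trained network reproduces the targets increasingly well, consistent with the $O(\delta_P)$ upper bound stated in the abstract.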
