
Geometric structure of shallow neural networks and constructive ${\mathcal L}^2$ cost minimization

Main: 22 pages
Bibliography: 2 pages
Appendix: 1 page
Abstract

In this paper, we provide a geometric interpretation of the structure of shallow neural networks characterized by one hidden layer, a ramp activation function, an ${\mathcal L}^2$ Schatten class (or Hilbert-Schmidt) cost function, input space ${\mathbb R}^M$, output space ${\mathbb R}^Q$ with $Q\leq M$, and training input sample size $N>QM$. We prove an upper bound on the minimum of the cost function of order $O(\delta_P)$, where $\delta_P$ measures the signal-to-noise ratio of training inputs. We obtain an approximate optimizer using projections adapted to the averages $\overline{x_{0,j}}$ of training input vectors belonging to the same output vector $y_j$, $j=1,\dots,Q$. In the special case $M=Q$, we explicitly determine an exact degenerate local minimum of the cost function; the sharp value differs from the upper bound obtained for $Q\leq M$ by a relative error $O(\delta_P^2)$. The proof of the upper bound yields a constructively trained network; we show that it metrizes the $Q$-dimensional subspace of the input space ${\mathbb R}^M$ spanned by $\overline{x_{0,j}}$, $j=1,\dots,Q$. We comment on the characterization of the global minimum of the cost function in the given context.
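To make the projection-based construction more concrete, the following is a minimal numerical sketch in Python. It is an illustration under assumptions, not the paper's optimizer: it groups training inputs by their target output vector $y_j$, forms the class averages $\overline{x_{0,j}}$, uses the orthogonal projection onto their span as the hidden-layer map followed by the ramp (ReLU) activation, and fits the output layer by ordinary least squares. The function name `constructive_sketch`, the one-hot targets, and the least-squares readout are choices made here purely for illustration.

```python
import numpy as np

def constructive_sketch(X, labels, Y_targets):
    """Illustrative sketch only (not the paper's exact construction):
    hidden layer = projection onto span of per-class input averages + ReLU,
    output layer = least-squares fit of the targets.

    X         : (N, M) training inputs in R^M
    labels    : (N,) class index j in {0, ..., Q-1} for each input
    Y_targets : (Q, d_out) output vector y_j for each class
    """
    Q = Y_targets.shape[0]
    # Class averages \bar{x}_{0,j}, stacked as rows of a (Q, M) matrix.
    Xbar = np.stack([X[labels == j].mean(axis=0) for j in range(Q)])
    # Orthogonal projection onto span{ \bar{x}_{0,j} } via the pseudoinverse.
    P = Xbar.T @ np.linalg.pinv(Xbar.T)          # (M, M), rank <= Q
    # Hidden layer: project, then apply the ramp (ReLU) activation.
    H = np.maximum(X @ P, 0.0)                   # (N, M)
    # Output layer: least-squares readout from hidden activations.
    Y = Y_targets[labels]                        # (N, d_out)
    W2, *_ = np.linalg.lstsq(H, Y, rcond=None)
    return P, W2

# Toy usage with synthetic data (M = 5, Q = 3, N = 60).
rng = np.random.default_rng(0)
M, Q, N = 5, 3, 60
labels = rng.integers(0, Q, size=N)
Y_targets = np.eye(Q)                            # one-hot outputs in R^Q
X = Y_targets[labels] @ rng.normal(size=(Q, M)) + 0.05 * rng.normal(size=(N, M))
P, W2 = constructive_sketch(X, labels, Y_targets)
pred = np.maximum(X @ P, 0.0) @ W2
print("mean squared error:", np.mean((pred - Y_targets[labels]) ** 2))
```

In this toy setup the noise level of the inputs plays the role of $\delta_P$: as it shrinks, the projected, constructively trained network reproduces the targets increasingly well, consistent with the $O(\delta_P)$ upper bound stated in the abstract.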
