
Geometric structure of shallow neural networks and constructive $\mathcal{L}^2$ cost minimization

Main: 22 pages
Bibliography: 2 pages
Appendix: 1 page
Abstract

In this paper, we approach the problem of cost (loss) minimization in underparametrized shallow ReLU networks through the explicit construction of upper bounds, which appeal to the structure of classification data, without the use of gradient descent. A key focus is on elucidating the geometric structure of approximate and precise minimizers. We consider an $\mathcal{L}^2$ cost function, input space $\mathbb{R}^M$, output space $\mathbb{R}^Q$ with $Q\leq M$, and a training input sample size that can be arbitrarily large. We prove an upper bound on the minimum of the cost function of order $O(\delta_P)$, where $\delta_P$ measures the signal-to-noise ratio of the training data. In the special case $M=Q$, we explicitly determine an exact degenerate local minimum of the cost function, and show that the sharp value differs from the upper bound obtained for $Q\leq M$ by a relative error $O(\delta_P^2)$. The proof of the upper bound yields a constructively trained network; we show that it metrizes a particular $Q$-dimensional subspace of the input space $\mathbb{R}^M$. We comment on the characterization of the global minimum of the cost function in the given context.
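To fix ideas, a minimal sketch of the setting the abstract describes: a shallow (one-hidden-layer) ReLU network mapping inputs in $\mathbb{R}^M$ to outputs in $\mathbb{R}^Q$ with $Q\leq M$, and the $\mathcal{L}^2$ cost averaged over a training sample. All names, shapes, and values below are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def relu(z):
    # Componentwise ReLU activation.
    return np.maximum(z, 0.0)

def shallow_relu_net(X, W1, b1, W2, b2):
    """One-hidden-layer ReLU network.
    X: (N, M) training inputs; returns (N, Q) outputs."""
    return relu(X @ W1.T + b1) @ W2.T + b2

def l2_cost(X, Y, W1, b1, W2, b2):
    """Mean squared L^2 error of the network outputs over the sample."""
    residual = shallow_relu_net(X, W1, b1, W2, b2) - Y
    return np.mean(np.sum(residual**2, axis=1))

# Toy instance with Q <= M, as in the paper's setting (dimensions are arbitrary).
rng = np.random.default_rng(0)
M, Q, H, N = 5, 3, 4, 100           # input dim, output dim, hidden width, sample size
X = rng.normal(size=(N, M))
Y = rng.normal(size=(N, Q))
W1, b1 = rng.normal(size=(H, M)), np.zeros(H)
W2, b2 = rng.normal(size=(Q, H)), np.zeros(Q)
cost = l2_cost(X, Y, W1, b1, W2, b2)   # nonnegative scalar
```

The paper's results concern explicit upper bounds on the minimum of this cost over the weights, constructed from the training data rather than obtained via gradient descent.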
