Geometric structure of Deep Learning networks and construction of global minimizers
In this paper, we provide a geometric interpretation of the structure of Deep Learning (DL) networks, characterized by $L$ hidden layers, a ramp activation function, an $\mathcal{L}^2$ Schatten class (or Hilbert-Schmidt) cost function, and input and output spaces $\mathbb{R}^Q$ with equal dimension $Q \geq 1$. The hidden layers are defined on spaces $\mathbb{R}^Q$, as well. We apply our recent results on shallow neural networks to construct an explicit family of minimizers for the global minimum of the cost function in the case $L \geq Q$, which we show to be degenerate. In the context presented here, the hidden layers of the DL network "curate" the training inputs by recursive application of a truncation map that minimizes the noise-to-signal ratio of the training inputs. Moreover, we determine a set of $2^Q - 1$ distinct degenerate local minima of the cost function.
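To make the setup concrete, the following is a minimal numerical sketch of the architecture the abstract describes: $L$ hidden layers of the same width $Q$ as the input and output spaces, a ramp (ReLU) activation, and a Hilbert-Schmidt cost, interpreted here as the squared Frobenius norm of the residual matrix over the training set. The dimensions, weights, and clipping biases are illustrative assumptions only, not the paper's explicit minimizing construction.

```python
import numpy as np

def ramp(x):
    """Ramp (ReLU) activation, applied componentwise."""
    return np.maximum(x, 0.0)

def forward(X, weights, biases):
    """Recursively apply L hidden layers R^Q -> R^Q to the columns of X.
    Each layer is an affine map followed by the ramp; with suitable (W, b)
    it acts as a truncation map on the training inputs."""
    for W, b in zip(weights, biases):
        X = ramp(W @ X + b[:, None])
    return X

def hilbert_schmidt_cost(outputs, targets):
    """Hilbert-Schmidt (L^2 Schatten / Frobenius) cost: squared Frobenius
    norm of the residual between network outputs and target outputs."""
    return np.linalg.norm(outputs - targets, ord="fro") ** 2

# Hypothetical toy configuration (not the paper's construction):
# Q = 3, L = 4 hidden layers (so L >= Q), N = 10 training inputs.
rng = np.random.default_rng(0)
Q, L, N = 3, 4, 10
weights = [np.eye(Q) for _ in range(L)]          # identity weight matrices
biases = [-0.1 * np.ones(Q) for _ in range(L)]   # negative shift: each layer clips small coordinates
X = rng.normal(size=(Q, N))                      # columns are training inputs in R^Q
Y_target = rng.normal(size=(Q, N))               # columns are target outputs in R^Q
print(hilbert_schmidt_cost(forward(X, weights, biases), Y_target))
```

The negative biases only gesture at the truncation behavior the abstract attributes to the hidden layers, where each ramp application clips part of the input and thereby reduces the noise component of the training data.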