Geometric Inductive Biases of Deep Networks: The Role of Data and Architecture

International Conference on Learning Representations (ICLR), 2024
Main: 10 Pages
16 Figures
Bibliography: 5 Pages
2 Tables
Appendix: 19 Pages
Abstract

In this paper, we propose the geometric invariance hypothesis (GIH), which argues that when training a neural network, the input space curvature remains invariant under transformation in certain directions determined by its architecture. Starting with a simple non-linear binary classification problem residing on a plane in a high dimensional space, we observe that while an MLP can solve this problem regardless of the orientation of the plane, this is not the case for a ResNet. Motivated by this example, we define two maps that provide a compact architecture-dependent summary of the input space geometry of a neural network and its evolution during training, which we dub the average geometry and average geometry evolution, respectively. By investigating average geometry evolution at initialization, we discover that the geometry of a neural network evolves according to the projection of data covariance onto average geometry. As a result, in cases where the average geometry is low-rank (such as in a ResNet), the geometry only changes in a subset of the input space. This causes an architecture-dependent invariance property in input-space curvature, which we dub GIH. Finally, we present extensive experimental results to observe the consequences of GIH and how it relates to generalization in neural networks.
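
The low-rank argument in the abstract can be illustrated with a toy linear-algebra sketch. This is not the paper's definition of average geometry: the matrix `G` below is a hypothetical low-rank positive semidefinite stand-in, and "projection" is assumed here to mean restricting the data covariance `Sigma` to the column space of `G`. Under those assumptions, the restricted covariance is zero along any direction outside that subspace, which mirrors the claim that the geometry only changes in a subset of the input space.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2  # input dimension and assumed rank of the stand-in matrix

# Hypothetical low-rank PSD stand-in for an "average geometry" matrix
# (an assumption for illustration, not the paper's construction).
B = rng.standard_normal((d, r))
G = B @ B.T

# Full-rank empirical data covariance from random samples.
X = rng.standard_normal((500, d))
Sigma = np.cov(X, rowvar=False)

# Orthogonal projector onto the column space of G.
U, _, _ = np.linalg.svd(G)
U_r = U[:, :r]
P = U_r @ U_r.T

# Data covariance restricted to the subspace spanned by G.
update = P @ Sigma @ P

v_in = U[:, 0]    # a direction inside range(G)
v_out = U[:, -1]  # a direction orthogonal to range(G)

in_change = float(v_in @ update @ v_in)
out_change = float(v_out @ update @ v_out)
update_rank = int(np.linalg.matrix_rank(update, tol=1e-8))

print("change along in-subspace direction: ", in_change)
print("change along out-of-subspace direction:", out_change)
print("rank of restricted covariance:", update_rank)
```

In this toy setting the restricted covariance has rank at most `r`, so `d - r` directions of the input space are left untouched regardless of how the data are distributed along them.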
