Geometric Inductive Biases of Deep Networks: The Role of Data and Architecture

International Conference on Learning Representations (ICLR), 2024

15 October 2024

Sajad Movahedi

Antonio Orvieto

Seyed-Mohsen Moosavi-Dezfooli

AI4CE

AAML


Main: 10 pages

16 figures

Bibliography: 5 pages

2 tables

Appendix: 19 pages

Abstract

In this paper, we propose the geometric invariance hypothesis (GIH), which argues that, when a neural network is trained, its input-space curvature remains invariant under transformations in certain directions determined by its architecture. Starting with a simple non-linear binary classification problem residing on a plane in a high-dimensional space, we observe that while an MLP can solve this problem regardless of the orientation of the plane, this is not the case for a ResNet, whose success depends on that orientation. Motivated by this example, we define two maps that provide a compact, architecture-dependent summary of the input-space geometry of a neural network and of its evolution during training, which we dub the average geometry and the average geometry evolution, respectively. By investigating the average geometry evolution at initialization, we find that the geometry of a neural network evolves according to the projection of the data covariance onto the average geometry. As a result, when the average geometry is low-rank (as in a ResNet), the geometry changes only in a subset of the input space. This gives rise to an architecture-dependent invariance property in input-space curvature, which we dub GIH. Finally, we present extensive experimental results that illustrate the consequences of GIH and how it relates to generalization in neural networks.
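The abstract gives no formulas, so the following is only a minimal NumPy sketch of the mechanism it describes, under loudly stated assumptions. A symmetric positive semi-definite matrix A stands in for the average geometry; "projection of the data covariance onto the average geometry" is taken to mean P Σ P, where Σ is the data covariance and P is the orthogonal projector onto the range of A; an identity matrix plays a full-rank (MLP-like) geometry and a rank-k diagonal matrix a low-rank (ResNet-like) one. The concentric-circles task is one hypothetical instance of "a simple non-linear binary classification problem residing on a plane". The names `plane_dataset` and `project_covariance` are illustrative and do not come from the paper.

```python
import numpy as np


def plane_dataset(n, d, basis, rng):
    """Hypothetical toy problem: two concentric circles (radii 1 and 2),
    a non-linearly separable binary task, placed on the 2D plane spanned
    by the orthonormal columns of `basis` (shape (d, 2)) inside R^d."""
    radius = rng.choice([1.0, 2.0], size=n)
    angle = rng.uniform(0.0, 2.0 * np.pi, size=n)
    xy = np.stack([radius * np.cos(angle), radius * np.sin(angle)], axis=1)
    labels = (radius > 1.5).astype(int)  # outer circle -> class 1
    return xy @ basis.T, labels  # X has shape (n, d)


def project_covariance(X, A, tol=1e-8):
    """Project the data covariance onto the range of a PSD matrix A.
    Assumption: A is a stand-in for the paper's average geometry."""
    sigma = np.cov(X, rowvar=False)  # (d, d) data covariance
    eigvals, eigvecs = np.linalg.eigh(A)
    U = eigvecs[:, eigvals > tol * eigvals.max()]  # orthonormal basis of range(A)
    P = U @ U.T  # orthogonal projector onto range(A)
    return P @ sigma @ P


rng = np.random.default_rng(0)
d, k, n = 32, 4, 2000

# Assumed "average geometries": isotropic (full rank) vs. rank k,
# the latter supported on the first k coordinate directions.
A_full = np.eye(d)
A_low = np.diag([1.0] * k + [0.0] * (d - k))

# Two orientations of the plane: inside the low-rank subspace, and orthogonal to it.
aligned = np.eye(d)[:, :2]
orthogonal = np.eye(d)[:, k:k + 2]

for name, basis in [("aligned", aligned), ("orthogonal", orthogonal)]:
    X, y = plane_dataset(n, d, basis, rng)
    total = np.trace(np.cov(X, rowvar=False))
    for label, A in [("full-rank", A_full), ("low-rank", A_low)]:
        kept = np.trace(project_covariance(X, A)) / total
        print(f"{name:10s} {label:9s} fraction of data variance kept: {kept:.2f}")
```

Under these assumptions, the full-rank geometry keeps all of the data variance for either orientation of the plane, while the low-rank geometry keeps all of it when the plane lies inside its range and none of it when the plane is orthogonal. This mirrors, in a purely linear-algebraic way, the orientation dependence the abstract reports for a ResNet but not for an MLP.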
