On the Implicit Bias Towards Minimal Depth of Deep Neural Networks

Abstract

Recent results in the literature suggest that the penultimate layer representations of neural networks trained for classification exhibit a clustering property called neural collapse (NC). We study the implicit bias of stochastic gradient descent (SGD) toward low-depth solutions when training deep neural networks. We characterize a notion of effective depth that measures the minimal layer at which neural collapse emerges. First, we hypothesize and empirically show that SGD implicitly selects neural networks of small effective depth. Second, while neural collapse emerges even when generalization should be impossible, we argue that the rate of collapse in the intermediate layers is more sensitive and is closely intertwined with generalization. We derive a generalization bound based on comparing the effective depth of the network with the minimal depth required to fit partially corrupted labels. Remarkably, this bound provides non-trivial estimates of test performance. Finally, we empirically show that the effective depth of a trained neural network increases monotonically with the fraction of random labels in the training set.
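As a rough illustration (not the authors' code), one way to operationalize effective depth is to compute a neural-collapse statistic at every layer and report the smallest layer index at which it falls below a tolerance. The metric below (ratio of within-class to between-class feature variance) and the cutoff `tol` are illustrative assumptions standing in for the paper's formal criterion.

```python
# Minimal sketch: estimate "effective depth" as the first layer whose
# representations are (approximately) neurally collapsed. The NC metric
# and threshold here are illustrative choices, not the paper's definition.
import numpy as np

def nc_metric(features, labels):
    """Within-class variance divided by between-class variance.

    features: (n_samples, dim) array of one layer's representations.
    labels:   (n_samples,) integer class labels.
    Values near 0 indicate collapsed (tightly clustered) classes.
    """
    global_mean = features.mean(axis=0)
    within, between = 0.0, 0.0
    for c in np.unique(labels):
        class_feats = features[labels == c]
        class_mean = class_feats.mean(axis=0)
        within += ((class_feats - class_mean) ** 2).sum()
        between += len(class_feats) * ((class_mean - global_mean) ** 2).sum()
    return within / max(between, 1e-12)

def effective_depth(layer_features, labels, tol=0.05):
    """Smallest layer index whose NC metric is below `tol`.

    layer_features: list of (n_samples, dim) arrays, one per layer,
    ordered from shallow to deep. `tol` is a hypothetical cutoff.
    """
    for depth, feats in enumerate(layer_features):
        if nc_metric(feats, labels) < tol:
            return depth
    return len(layer_features)  # no layer collapsed below the tolerance
```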
