Frivolous Units Help to Explain Non-Overfitting in Overparametrized Deep
Neural Networks
A remarkable characteristic of overparameterized deep neural networks is that their accuracy does not degrade when the network's width is increased. Recent evidence suggests that developing compressible representations is key to adjusting the complexity of large networks to the learning task at hand. At the unit level, however, these representations are poorly understood. A better understanding of which compressible features form among units would enable a more granular interpretation of the representations in overparameterized networks. Are there mechanisms at the unit level by which networks control their effective complexity? If so, how do these depend on the architecture, dataset, and training parameters? We identify two distinct types of "frivolous" units that proliferate when the network's width is increased: prunable units, which can be dropped out of the network without significant change to the output, and redundant units, whose activities can be expressed as a linear combination of others. These units imply complexity constraints, since the function the network represents could be expressed by a network without them. These results help to explain non-overfitting by showing that overparameterized networks consistently autoregularize via the formation of these frivolous units.