
Truth or Backpropaganda? An Empirical Investigation of Deep Learning Theory

International Conference on Learning Representations (ICLR), 2020
Abstract

We empirically evaluate common assumptions about neural networks that are widely held by practitioners and theorists alike. In this work, we: (1) prove the widespread existence of suboptimal local minima in the loss landscape of neural networks, and we use our theory to find examples; (2) show that small-norm parameters are not optimal for generalization; (3) demonstrate that ResNets do not conform to wide-network theories, such as the neural tangent kernel, and that the interaction between skip connections and batch normalization plays a role; (4) find that rank does not correlate with generalization or robustness in a practical setting.
