Lightlike Neuromanifolds, Occam's Razor and Deep Learning
Information Geometry (IG), 2019
Abstract
How do deep neural networks benefit from a very high-dimensional parameter space? Their high complexity versus their stunning generalization performance forms an intriguing paradox. We take an information-theoretic approach and find that the locally varying dimensionality of the parameter space can be studied through the discipline of singular semi-Riemannian geometry. We adapt the Fisher information metric to this singular neuromanifold and introduce a new prior that interpolates between Jeffreys' prior and the Gaussian prior. From this we derive a minimum description length (MDL) of a deep learning model, in which the spectrum of the Fisher information matrix plays the key role in reducing the model complexity.
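The role of the Fisher information spectrum can be illustrated on a toy model. The sketch below (a hypothetical illustration, not the paper's experiments) computes the empirical Fisher information matrix of a logistic regression with a deliberately duplicated feature: the redundancy makes one direction of the parameter space flat, so the smallest Fisher eigenvalue collapses to zero, a minimal analogue of the degenerate "lightlike" directions of the neuromanifold.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 3 informative features plus an exact duplicate of the first,
# so the parameter space has a degenerate ("lightlike") direction.
n = 200
X = rng.normal(size=(n, 3))
X = np.hstack([X, X[:, :1]])          # duplicated column -> singular Fisher matrix
w_true = np.array([1.0, -2.0, 0.5, 0.0])
p = 1.0 / (1.0 + np.exp(-X @ w_true))
y = (rng.random(n) < p).astype(float)

# Empirical Fisher information at w: the average outer product of the
# per-sample score vectors g_i = (y_i - p_i) x_i for logistic regression.
w = np.zeros(4)
p_hat = 1.0 / (1.0 + np.exp(-X @ w))
G = (y - p_hat)[:, None] * X
F = G.T @ G / n

# Spectrum of the Fisher information matrix (ascending order).
eigvals = np.linalg.eigvalsh(F)
print(eigvals)
```

The smallest eigenvalue is (numerically) zero: moving along that parameter direction does not change the model's predictions, so such directions cost nothing in description length, which is the intuition behind a spectrum-based complexity measure.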
