Layer-wise Learning of Stochastic Neural Networks with Information Bottleneck
In this paper, we present a layer-wise learning approach for stochastic neural networks (SNNs) from an information-theoretic perspective. For each layer of an SNN, compression and relevance are defined to quantify the amount of information the layer retains about the input space and the target space, respectively. We propose a Parametric Information Bottleneck (PIB) framework that jointly optimizes the compression and relevance of all layers of an SNN to better exploit the network's representation. In PIB, the model parameters are used explicitly to approximate these two information measures. We show that PIB can be viewed as an extension of the maximum likelihood estimation (MLE) principle to every layer of the network. We show empirically on the MNIST dataset that, compared to the MLE principle, PIB (i) improves the generalization of neural networks in classification tasks, and (ii) exploits a neural network's representation more efficiently by quickly pushing it toward the information-theoretically optimal representation.
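The abstract does not state the objective explicitly; as a rough sketch, assuming the standard information-bottleneck trade-off with a hypothetical per-layer coefficient $\beta_\ell$, the joint layer-wise objective would take a form such as

% Assumed form of a layer-wise information-bottleneck objective; not taken
% verbatim from the paper. Z_l denotes layer l's stochastic representation,
% X the input, Y the target, and theta the network parameters.
\[
  \max_{\theta} \; \sum_{\ell=1}^{L}
  \Bigl[ \underbrace{I(Z_\ell; Y)}_{\text{relevance}}
  \;-\; \beta_\ell \, \underbrace{I(X; Z_\ell)}_{\text{compression}} \Bigr],
\]

where both mutual-information terms are approximated using the model parameters $\theta$, as described in the abstract.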