
Bayesian Hypernetworks

Abstract

We propose Bayesian hypernetworks: a framework for approximate Bayesian inference in neural networks. A Bayesian hypernetwork, $h$, is a neural network which learns to transform a simple noise distribution, $p(\epsilon) = \mathcal{N}(0, I)$, to a distribution $q(\theta) \doteq q(h(\epsilon))$ over the parameters $\theta$ of another neural network (the "primary network"). We train $q$ with variational inference, using an invertible $h$ to enable efficient estimation of the variational lower bound on the posterior $p(\theta \mid \mathcal{D})$ via sampling. In contrast to most methods for Bayesian deep learning, Bayesian hypernets can represent a complex multimodal approximate posterior with correlations between parameters, while enabling cheap i.i.d. sampling of $q(\theta)$. We demonstrate these qualitative advantages of Bayesian hypernets, which also achieve competitive performance on a suite of tasks where estimating model uncertainty is beneficial, including active learning and anomaly detection.
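The construction in the abstract can be made concrete with a short sketch. The following is a minimal illustration, not the authors' implementation, assuming PyTorch; the names `CouplingHypernet`, `elbo`, `log_lik`, and `log_prior` are hypothetical. It uses a single RealNVP-style affine coupling layer as the invertible $h$, so that $\log q(\theta) = \log p(\epsilon) - \log|\det \partial h / \partial \epsilon|$ is cheap to evaluate inside the variational bound.

```python
# Minimal sketch (assumptions: PyTorch; a single coupling layer stands in
# for the invertible hypernetwork h described in the abstract).
import torch
import torch.nn as nn

class CouplingHypernet(nn.Module):
    """Invertible h: eps -> theta with a tractable log-det-Jacobian."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half, hidden), nn.Tanh(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, eps):
        # Affine coupling: the first half of eps passes through unchanged
        # and parameterizes a scale/shift of the second half.
        e1, e2 = eps[: self.half], eps[self.half :]
        s, t = self.net(e1).chunk(2)
        theta = torch.cat([e1, e2 * torch.exp(s) + t])
        log_det = s.sum()  # log |det dh/deps| of the coupling transform
        return theta, log_det

def elbo(h, eps, log_lik, log_prior):
    """Single-sample Monte Carlo estimate of the variational lower bound."""
    theta, log_det = h(eps)
    # Change of variables: log q(theta) = log p(eps) - log|det dh/deps|.
    log_q = torch.distributions.Normal(0.0, 1.0).log_prob(eps).sum() - log_det
    return log_lik(theta) + log_prior(theta) - log_q

# Usage sketch: each fresh eps ~ N(0, I) yields an i.i.d. sample theta ~ q.
h = CouplingHypernet(dim=8)
log_prior = lambda th: torch.distributions.Normal(0.0, 1.0).log_prob(th).sum()
log_lik = lambda th: -0.5 * ((th - 1.0) ** 2).sum()  # placeholder likelihood
loss = -elbo(h, torch.randn(8), log_lik, log_prior)
loss.backward()
```

In practice one would stack several coupling layers so that $q(\theta)$ can capture the multimodal, correlated approximate posteriors the abstract describes, while each posterior sample still costs only one forward pass through $h$.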
