Singular Bayesian Neural Networks

Mame Diarra Toure
David A. Stephens
Main: 8 pages · 23 figures · 23 tables · Bibliography: 3 pages · Appendix: 54 pages
Abstract

Bayesian neural networks promise calibrated uncertainty but require $O(mn)$ parameters for standard mean-field Gaussian posteriors. We argue this cost is often unnecessary, particularly when weight matrices exhibit fast singular value decay. By parameterizing weights as $W = AB^{\top}$ with $A \in \mathbb{R}^{m \times r}$, $B \in \mathbb{R}^{n \times r}$, we induce a posterior that is singular with respect to the Lebesgue measure, concentrating on the rank-$r$ manifold. This singularity captures structured weight correlations through shared latent factors, geometrically distinct from mean-field's independence assumption. We derive PAC-Bayes generalization bounds whose complexity term scales as $\sqrt{r(m+n)}$ instead of $\sqrt{mn}$, and prove loss bounds that decompose the error into optimization and rank-induced bias using the Eckart-Young-Mirsky theorem. We further adapt recent Gaussian complexity bounds for low-rank deterministic networks to Bayesian predictive means. Empirically, across MLPs, LSTMs, and Transformers on standard benchmarks, our method achieves predictive performance competitive with 5-member Deep Ensembles while using up to $15\times$ fewer parameters. Furthermore, it substantially improves OOD detection and often improves calibration relative to mean-field and perturbation baselines.
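To make the low-rank parameterization concrete, here is a minimal PyTorch sketch (not the authors' implementation) of a variational linear layer whose weight is the product $W = AB^{\top}$ of two factors, each with an independent Gaussian posterior over its entries. The class name, initializations, and number of Monte Carlo samples are illustrative assumptions; the point is that the weight's variational parameters scale as $2r(m+n)$ rather than $2mn$.

```python
# Hypothetical sketch of a rank-r variational linear layer (assumed details,
# not the paper's code): W = A @ B.T with Gaussian posteriors over A and B.
import torch
import torch.nn as nn

class LowRankVariationalLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, rank: int):
        super().__init__()
        m, n, r = out_features, in_features, rank
        # Posterior means and log-standard-deviations for factors A (m x r) and B (n x r).
        self.A_mu = nn.Parameter(torch.randn(m, r) * 0.02)
        self.A_logstd = nn.Parameter(torch.full((m, r), -5.0))
        self.B_mu = nn.Parameter(torch.randn(n, r) * 0.02)
        self.B_logstd = nn.Parameter(torch.full((n, r), -5.0))
        self.bias = nn.Parameter(torch.zeros(m))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reparameterization trick: sample the factors, then form the rank-r weight.
        A = self.A_mu + self.A_logstd.exp() * torch.randn_like(self.A_mu)
        B = self.B_mu + self.B_logstd.exp() * torch.randn_like(self.B_mu)
        W = A @ B.T  # (m x n), supported on the rank-r manifold
        return x @ W.T + self.bias

# Usage: Monte Carlo predictive mean over 5 posterior samples.
layer = LowRankVariationalLinear(in_features=128, out_features=64, rank=8)
x = torch.randn(32, 128)
pred_mean = torch.stack([layer(x) for _ in range(5)]).mean(dim=0)
```

Because every sampled $W$ has rank at most $r$, the induced distribution over $W$ places all its mass on a measure-zero subset of $\mathbb{R}^{m \times n}$, which is the sense in which the posterior is singular with respect to the Lebesgue measure.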
