
Scalable and Accurate Variational Bayes for High-Dimensional Binary Regression Models

Abstract

Modern methods for Bayesian regression with binary responses are either computationally impractical or inaccurate in high dimensions. In fact, as discussed in recent literature, bypassing this trade-off is still an open problem that is the object of intense research. To address this gap, we develop a novel variational approximation for the posterior distribution of the coefficients in high-dimensional probit regression with Gaussian priors. Our method leverages a representation with global and local variables but, unlike classical mean-field assumptions, it avoids a fully factorized approximation and instead assumes a factorization only for the local variables. We prove that the resulting variational approximation belongs to a tractable class of unified skew-normal densities that crucially incorporates skewness and, unlike state-of-the-art variational Bayes solutions, converges to the exact posterior density as the number of predictors p goes to infinity. To solve the variational optimization problem, we develop a tractable coordinate ascent variational algorithm that easily scales to p in the tens of thousands and provably requires a number of iterations converging to 1 as p goes to infinity. These findings are also illustrated in extensive simulation studies and in real-world medical applications, where our method uniformly improves on classical mean-field variational Bayes in terms of inference accuracy and predictive performance. The magnitude of such gains is especially remarkable in high-dimensional p>n settings where state-of-the-art alternative strategies are computationally impractical.
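To make the baseline concrete, the classical mean-field variational Bayes scheme that the paper improves upon can be sketched as follows. This is a minimal illustrative implementation of coordinate ascent variational inference for probit regression under the standard latent-Gaussian representation (z_i ~ N(x_i'β, 1), y_i = 1{z_i > 0}) with a N(0, ν²I) prior, not the authors' partially factorized method; the function name, prior variance `nu2`, and convergence settings are illustrative choices.

```python
import numpy as np
from scipy.stats import norm

def meanfield_vb_probit(X, y, nu2=25.0, max_iter=200, tol=1e-8):
    """Mean-field CAVI for probit regression with a N(0, nu2 * I) Gaussian prior.

    Under the full factorization q(beta) q(z), the optimal q(beta) is Gaussian
    with a fixed covariance V; only its mean m is iterated, through the
    truncated-normal means E[z_i] of the local latent variables.
    """
    n, p = X.shape
    s = 2.0 * y - 1.0                              # +1 for y_i = 1, -1 for y_i = 0
    V = np.linalg.inv(X.T @ X + np.eye(p) / nu2)   # covariance of q(beta), fixed
    m = np.zeros(p)
    for _ in range(max_iter):
        eta = X @ m
        # E[z_i] under q(z_i): mean of N(eta_i, 1) truncated to the y_i side
        Ez = eta + s * norm.pdf(eta) / norm.cdf(s * eta)
        m_new = V @ (X.T @ Ez)
        if np.max(np.abs(m_new - m)) < tol:
            m = m_new
            break
        m = m_new
    return m, V
```

Because q(beta) is forced to be Gaussian, this approximation discards the posterior skewness that the unified skew-normal family retains, which is one source of the accuracy gap discussed in the abstract.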
