
Comparison of parallel SMC and MCMC for Bayesian deep learning

Main: 7 pages · Bibliography: 6 pages · Appendix: 21 pages · 20 figures · 14 tables
Abstract

This work systematically compares parallel implementations of consistent (asymptotically unbiased) Bayesian deep learning algorithms: the sequential Monte Carlo sampler (SMC_\parallel) and Markov chain Monte Carlo (MCMC_\parallel). We provide a proof of convergence for SMC_\parallel showing that it theoretically achieves the same level of convergence as a single monolithic SMC sampler, while its reduced communication lowers wall-clock time. It is well known that the first samples from MCMC must be discarded to eliminate initialization bias, and that for MCMC_\parallel the number of discarded samples must grow like the logarithm of the number of parallel chains to control that bias. A systematic empirical study on MNIST, CIFAR, and IMDb reveals that the parallel implementations of both methods perform comparably to their non-parallel counterparts in terms of predictive performance and total computational cost, and also comparably to each other. However, both methods still require a large wall-clock time and suffer from catastrophic non-convergence if they are not run for long enough.
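As a rough illustration of the burn-in rule mentioned above (discarded samples growing like the logarithm of the number of parallel chains), the following is a minimal sketch of parallel MCMC on a toy one-dimensional target. The standard-normal target, the random-walk Metropolis kernel, the step size, and the constant `burn_const` are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def log_target(x):
    # Toy target: standard normal log-density (illustrative assumption).
    return -0.5 * x**2

def run_chain(n_steps, step_size, rng):
    """One random-walk Metropolis chain, deliberately initialized far from the mode."""
    x = 5.0 + rng.normal()
    samples = np.empty(n_steps)
    for t in range(n_steps):
        prop = x + step_size * rng.normal()
        if np.log(rng.uniform()) < log_target(prop) - log_target(x):
            x = prop
        samples[t] = x
    return samples

def parallel_mcmc(n_chains, n_steps, burn_const=50, seed=0):
    """Run n_chains independent chains, discard a per-chain burn-in that grows
    like log(n_chains), then pool the remaining samples."""
    rng = np.random.default_rng(seed)
    burn_in = int(np.ceil(burn_const * np.log(max(n_chains, 2))))
    chains = [run_chain(n_steps, 0.5, rng) for _ in range(n_chains)]
    pooled = np.concatenate([c[burn_in:] for c in chains])
    return pooled, burn_in

if __name__ == "__main__":
    for k in (4, 64, 1024):
        pooled, b = parallel_mcmc(n_chains=k, n_steps=2000)
        print(f"chains={k:5d}  burn-in per chain={b:4d}  "
              f"posterior mean estimate={pooled.mean():+.3f}")
```

In this sketch the pooled estimate stays close to the true mean (zero) only because each chain discards enough early samples; with many chains and too short a run, the pooled average is dominated by initialization bias, which is the failure mode the abstract refers to as non-convergence when the samplers are not run long enough.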
