Distributed Variational Inference in Sparse Gaussian Process Regression and Latent Variable Models
The recently developed Bayesian Gaussian process latent variable model (GPLVM) is a powerful generative model for discovering low-dimensional embeddings with linear time complexity. However, modern datasets are so large that even linear-time models struggle to cope with them. We introduce a novel re-parametrisation of variational inference for the GPLVM and the sparse GP regression model that admits an efficient distributed inference algorithm. We present a unifying derivation for both models, analytically deriving the optimal variational distribution over the inducing points. We then assess the proposed inference on datasets of different sizes, showing that it scales well with both data and computational resources. We further demonstrate its practicality in real-world settings on datasets with up to 100 thousand points, comparing the inference to sequential implementations, assessing how the load is distributed among the nodes, and testing its robustness to network failures.
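To make the distributed structure concrete, below is a minimal sketch (not the authors' code) of the map-reduce pattern that a re-parametrised sparse-GP variational bound admits: each worker computes small sufficient statistics over its data shard, and a master aggregates them to form the mean of the optimal Gaussian distribution over the inducing points. The kernel choice, helper names (`rbf`, `partial_stats`, `aggregate`), and the single-machine "shards as workers" setup are illustrative assumptions.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between row sets A and B (illustrative choice)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def partial_stats(X_shard, y_shard, Z, noise_var=0.1):
    """Per-shard statistics; only these M x M and M-vector terms cross the network."""
    Knm = rbf(X_shard, Z)                        # (n_shard, M)
    return Knm.T @ Knm / noise_var, Knm.T @ y_shard / noise_var

def aggregate(stats, Z):
    """Combine shard statistics into the mean of the optimal q(u) over inducing points."""
    Kmm = rbf(Z, Z) + 1e-6 * np.eye(len(Z))      # jitter for numerical stability
    A = Kmm + sum(s for s, _ in stats)           # precision-like aggregate term
    b = sum(v for _, v in stats)
    return Kmm @ np.linalg.solve(A, b)           # posterior mean over inducing outputs

rng = np.random.default_rng(0)
X, y = rng.normal(size=(1000, 2)), rng.normal(size=1000)
Z = X[:20]                                       # M = 20 inducing inputs
shards = np.array_split(np.arange(1000), 4)      # 4 simulated workers
stats = [partial_stats(X[idx], y[idx], Z) for idx in shards]
print(aggregate(stats, Z).shape)                 # (20,)
```

The key property the sketch illustrates is that the per-shard terms are additive, so the communication cost depends only on the number of inducing points M, not on the dataset size.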