Rates of Convergence for Sparse Variational Gaussian Process Regression

Abstract

Excellent variational approximations to Gaussian process posteriors have been developed which avoid the $\mathcal{O}(N^3)$ scaling with dataset size $N$. They reduce the computational cost to $\mathcal{O}(NM^2)$, with $M \ll N$ being the number of inducing variables, which summarise the process. While the computational cost seems to be linear in $N$, the true complexity of the algorithm depends on how $M$ must increase to ensure a certain quality of approximation. We address this by characterising the behavior of an upper bound on the KL divergence to the posterior. We show that with high probability the KL divergence can be made arbitrarily small by growing $M$ more slowly than $N$. A particular case of interest is that for regression with normally distributed inputs in $D$ dimensions with the popular Squared Exponential kernel, $M = \mathcal{O}(\log^D N)$ is sufficient. Our results show that as datasets grow, Gaussian process posteriors can truly be approximated cheaply, and provide a concrete rule for how to increase $M$ in continual learning scenarios.
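
To make the scaling claim concrete, the following minimal sketch tabulates how a rule of the form $M = \lceil c \log^D N \rceil$ grows with $N$, and how the resulting $NM^2$ cost compares with the exact $N^3$ cost. The abstract only states the asymptotic order, so the constant `c`, the ceiling, and the function names `inducing_points_needed` and `cost_ratio` are illustrative assumptions, not quantities taken from the paper.

```python
import math


def inducing_points_needed(n, d, c=1.0):
    """Illustrative rule M = ceil(c * (log n)^d).

    The constant c is an assumption: the abstract gives only the
    order O(log^D N), not a specific constant.
    """
    if n < 2:
        # log(1) = 0, so fall back to a single inducing variable.
        return 1
    return max(1, math.ceil(c * math.log(n) ** d))


def cost_ratio(n, m):
    """Ratio of the sparse cost N*M^2 to the exact cost N^3.

    Constant factors hidden by the O-notation are ignored.
    """
    return (n * m ** 2) / n ** 3


if __name__ == "__main__":
    # Print N, the assumed M for D = 2, and the relative cost.
    for n in (10**3, 10**4, 10**5, 10**6):
        m = inducing_points_needed(n, d=2)
        print(n, m, cost_ratio(n, m))
```

Under these assumptions $M$ grows only polylogarithmically, so the ratio $NM^2 / N^3$ shrinks rapidly as $N$ increases. In practice the constant would depend on the kernel hyperparameters, the noise level, and the target KL divergence.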
