On Information Gain and Regret Bounds in Gaussian Process Bandits

Consider the sequential optimisation of an unknown, expensive-to-evaluate and possibly non-convex objective function $f$ from noisy observations, which can be viewed as a continuum-armed bandit problem. Bayesian optimisation algorithms based on Gaussian Process (GP) models have been shown to perform favourably in this setting. In particular, upper bounds have been proven on the regret of two popular algorithms, GP-UCB and GP-TS, under both the Bayesian setting (when $f$ is a sample from a GP) and the frequentist setting (when $f$ lives in a reproducing kernel Hilbert space). These regret bounds crucially depend on a quantity referred to as the maximal information gain $\gamma_T$ between $T$ observations and the underlying GP (surrogate) model. In this paper, we build on the spectral properties of positive definite kernels to prove novel bounds on $\gamma_T$. In comparison to existing works, which rely on specific kernels (such as Mat\'ern and SE) to provide explicit bounds on $\gamma_T$ and regret, we provide general results in terms of the decay rate of the eigenvalues of the kernel. Specialising our results to common kernels leads to significant improvements over the existing bounds on $\gamma_T$ and regret. For the Mat\'ern and SE kernels, where lower bounds on regret are known, our results reduce the gap between the upper and lower bounds from a factor polynomial in $T$, in existing work, to a logarithmic one, under the Bayesian setting. Furthermore, since our bounds on $\gamma_T$ are independent of the optimisation algorithm, they improve the regret bounds under various other settings where $\gamma_T$ is essential.
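The central quantity $\gamma_T$ has a concrete closed form: for a set $A$ of $T$ query points with kernel matrix $K_A$ and noise variance $\sigma^2$, the information gain is $I(y_A; f) = \frac{1}{2}\log\det(I + \sigma^{-2} K_A)$, and $\gamma_T$ is its maximum over all size-$T$ sets. The sketch below is not from the paper; the SE kernel, lengthscale, grid, and helper names such as `greedy_gamma` are illustrative assumptions. It computes $I(y_A; f)$ directly and greedily approximates $\gamma_T$ on a finite candidate grid.

```python
# A minimal sketch (illustrative, not the paper's method) of computing the
# information gain I(y_A; f) = 0.5 * log det(I + sigma^{-2} K_A) for a GP
# with an SE kernel, and greedily approximating gamma_T on a finite grid.
import numpy as np

def se_kernel(X, Y, lengthscale=0.2):
    """SE (squared-exponential) kernel matrix: k(x, y) = exp(-(x-y)^2 / (2 l^2))."""
    d2 = (X[:, None] - Y[None, :]) ** 2
    return np.exp(-d2 / (2 * lengthscale ** 2))

def information_gain(K, noise_var=0.1):
    """I(y_A; f) = 0.5 * log det(I + sigma^{-2} K) for points with Gram matrix K."""
    n = K.shape[0]
    _, logdet = np.linalg.slogdet(np.eye(n) + K / noise_var)
    return 0.5 * logdet

def greedy_gamma(candidates, T, noise_var=0.1):
    """Greedily select T points to maximise information gain.

    I(y_A; f) is monotone submodular in A, so the greedy value is within a
    (1 - 1/e) factor of the true maximum gamma_T over the candidate set.
    """
    chosen = []
    best_gain = 0.0
    for _ in range(T):
        best_x = None
        best_gain = -np.inf
        for x in candidates:
            A = np.array(chosen + [x])
            gain = information_gain(se_kernel(A, A), noise_var)
            if gain > best_gain:
                best_x, best_gain = x, gain
        chosen.append(best_x)
    return best_gain, np.array(chosen)

grid = np.linspace(0.0, 1.0, 50)  # hypothetical 1-D candidate grid
gamma_approx, pts = greedy_gamma(grid, T=10)
print(f"greedy information gain after T=10 queries: {gamma_approx:.3f}")
```

Exact maximisation of $\gamma_T$ is combinatorial, which is why a greedy surrogate is used here: the monotone submodularity of $I(y_A; f)$ guarantees the greedy set attains at least a $(1 - 1/e)$ fraction of the maximum on the candidate grid.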