On Information Gain and Regret Bounds in Gaussian Process Bandits

International Conference on Artificial Intelligence and Statistics (AISTATS), 2020
Abstract

Consider the sequential optimisation of an unknown, expensive-to-evaluate, and possibly non-convex objective function $f$ from noisy observations, which can be considered a continuum-armed bandit problem. Bayesian optimisation algorithms based on Gaussian Process (GP) models are shown to perform favourably in this setting. In particular, upper bounds are proven on the regret performance of two popular algorithms, GP-UCB and GP-TS, under both the Bayesian (when $f$ is a sample from a GP) and frequentist (when $f$ lives in a reproducing kernel Hilbert space) settings. The regret bounds crucially depend on a quantity referred to as the maximal information gain $\gamma_T$ between $T \in \mathbb{N}$ observations and the underlying GP (surrogate) model. In this paper, we build on the spectral properties of positive definite kernels to prove novel bounds on $\gamma_T$. In comparison to existing works, which rely on specific kernels (such as Matérn and SE) to provide explicit bounds on $\gamma_T$ and regret, we provide general results in terms of the decay rate of the eigenvalues of the kernel. Specialising our results to common kernels leads to significant improvements over the existing bounds on $\gamma_T$ and regret. For the Matérn and SE kernels, where lower bounds on regret are known, our results reduce the gap between the upper and lower bounds from a factor polynomial in $T$, in the existing work, to a logarithmic one, under the Bayesian setting. Furthermore, since our bounds on $\gamma_T$ are independent of the optimisation algorithm, they impact the regret bounds under various other settings where $\gamma_T$ is essential.
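To make the central quantity concrete: for a set of sampling points $A$ of size $T$, the information gain between the noisy observations $y_A$ and the GP is $I(y_A; f) = \frac{1}{2} \log\det(I_T + \sigma^{-2} K_A)$, where $K_A$ is the kernel matrix over $A$ and $\sigma^2$ the noise variance, and $\gamma_T$ is the maximum of this quantity over all size-$T$ designs. The minimal NumPy sketch below evaluates this expression for a fixed design under an SE kernel; the lengthscale, noise level, and uniform design are illustrative assumptions, not values from the paper.

```python
import numpy as np

def se_kernel(X1, X2, lengthscale=0.2):
    # Squared-exponential (SE) kernel matrix; the lengthscale is an
    # illustrative choice, not a value from the paper.
    d2 = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def information_gain(X, kernel, noise_var=0.01):
    # I(y_A; f) = 0.5 * log det(I_T + K_A / noise_var) for the design X.
    # gamma_T is the maximum of this value over all size-T designs.
    K = kernel(X, X)
    T = K.shape[0]
    _, logdet = np.linalg.slogdet(np.eye(T) + K / noise_var)  # stable log-det
    return 0.5 * logdet

# Example: information gain of T uniformly spaced points on [0, 1].
T = 50
X = np.linspace(0.0, 1.0, T)[:, None]
print(information_gain(X, se_kernel))
```

Note that computing $\gamma_T$ itself requires maximising over designs; in analyses like this paper's it is bounded via the spectrum of the kernel rather than computed directly.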
