On Information Gain and Regret Bounds in Gaussian Process Bandits

Consider the sequential optimisation of an unknown, expensive-to-evaluate and possibly non-convex objective function $f$ from noisy observations, which can be viewed as a continuum-armed bandit problem. Bayesian optimisation algorithms based on Gaussian Process (GP) models have been shown to perform favourably in this setting. In particular, upper bounds have been proven on the regret of two popular algorithms, GP-UCB and GP-TS, under both the Bayesian setting (when $f$ is a sample from a GP) and the frequentist setting (when $f$ lives in a reproducing kernel Hilbert space). These regret bounds crucially depend on a quantity referred to as the maximal information gain $\gamma_T$ between $T$ observations and the underlying GP (surrogate) model. In this paper, we build on the spectral properties of positive definite kernels to prove novel bounds on $\gamma_T$. In comparison to existing works, which rely on specific kernels (such as Mat\'ern and SE) to provide explicit bounds on $\gamma_T$ and regret, we provide general results in terms of the decay rate of the eigenvalues of the kernel. Specialising our results to common kernels leads to significant improvements over the existing bounds on $\gamma_T$ and regret. For the Mat\'ern and SE kernels, where lower bounds on regret are known, our results reduce the gap between the upper and lower bounds from a factor polynomial in $T$, in existing work, to a logarithmic one, under the Bayesian setting. Furthermore, since our bounds on $\gamma_T$ are independent of the optimisation algorithm, they improve the regret bounds under various other settings where $\gamma_T$ is essential.
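The central quantity $\gamma_T$ has a concrete closed form: for a set $A$ of $T$ query points with kernel matrix $K_A$ and noise variance $\sigma^2$, the information gain is $I(y_A; f) = \frac{1}{2}\log\det(I + \sigma^{-2} K_A)$, and $\gamma_T$ is its maximum over all size-$T$ sets. The sketch below is not from the paper; the SE kernel, lengthscale, grid, and helper names such as `greedy_gamma` are illustrative assumptions. It computes $I(y_A; f)$ directly and greedily approximates $\gamma_T$ on a finite candidate grid.

```python
# A minimal sketch (illustrative, not the paper's method) of computing the
# information gain I(y_A; f) = 0.5 * log det(I + sigma^{-2} K_A) for a GP
# with an SE kernel, and greedily approximating gamma_T on a finite grid.
import numpy as np

def se_kernel(X, Y, lengthscale=0.2):
    """SE (squared-exponential) kernel matrix: k(x, y) = exp(-(x-y)^2 / (2 l^2))."""
    d2 = (X[:, None] - Y[None, :]) ** 2
    return np.exp(-d2 / (2 * lengthscale ** 2))

def information_gain(K, noise_var=0.1):
    """I(y_A; f) = 0.5 * log det(I + sigma^{-2} K) for points with Gram matrix K."""
    n = K.shape[0]
    _, logdet = np.linalg.slogdet(np.eye(n) + K / noise_var)
    return 0.5 * logdet

def greedy_gamma(candidates, T, noise_var=0.1):
    """Greedily select T points to maximise information gain.

    I(y_A; f) is monotone submodular in A, so the greedy value is within a
    (1 - 1/e) factor of the true maximum gamma_T over the candidate set.
    """
    chosen = []
    best_gain = 0.0
    for _ in range(T):
        best_x = None
        best_gain = -np.inf
        for x in candidates:
            A = np.array(chosen + [x])
            gain = information_gain(se_kernel(A, A), noise_var)
            if gain > best_gain:
                best_x, best_gain = x, gain
        chosen.append(best_x)
    return best_gain, np.array(chosen)

grid = np.linspace(0.0, 1.0, 50)  # hypothetical 1-D candidate grid
gamma_approx, pts = greedy_gamma(grid, T=10)
print(f"greedy information gain after T=10 queries: {gamma_approx:.3f}")
```

Exact maximisation of $\gamma_T$ is combinatorial, which is why a greedy surrogate is used here: the monotone submodularity of $I(y_A; f)$ guarantees the greedy set attains at least a $(1 - 1/e)$ fraction of the maximum on the candidate grid.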