On Information Gain and Regret Bounds in Gaussian Process Bandits

International Conference on Artificial Intelligence and Statistics (AISTATS), 2020
Abstract

Consider the sequential optimization of an expensive-to-evaluate and possibly non-convex objective function $f$ from noisy feedback, which can be considered a continuum-armed bandit problem. Upper bounds on the regret performance of several learning algorithms (GP-UCB, GP-TS, and their variants) are known under both a Bayesian setting (when $f$ is a sample from a Gaussian process (GP)) and a frequentist setting (when $f$ lives in a reproducing kernel Hilbert space). The regret bounds often rely on the maximal information gain $\gamma_T$ between $T$ observations and the underlying GP (surrogate) model. We provide general bounds on $\gamma_T$ based on the decay rate of the eigenvalues of the GP kernel, whose specialisation for commonly used kernels improves the existing bounds on $\gamma_T$, and consequently the regret bounds relying on $\gamma_T$, under numerous settings. For the Matérn family of kernels, where lower bounds on $\gamma_T$, and on regret under the frequentist setting, are known, our results close a significant polynomial gap in $T$ between the upper and lower bounds (up to factors logarithmic in $T$).
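The central quantity in the abstract, the information gain of $T$ observations under a GP surrogate, is $I(\mathbf{y}_T; f) = \tfrac{1}{2}\log\det(I + \sigma^{-2} K_T)$, where $K_T$ is the kernel matrix of the observed points and $\sigma^2$ the noise variance; $\gamma_T$ is its maximum over point sets of size $T$. A minimal sketch of computing this quantity for a fixed point set, assuming a squared-exponential kernel on 1-D inputs and the lengthscale and noise values shown (illustrative choices, not the paper's construction):

```python
import numpy as np

def information_gain(X, lengthscale=0.2, noise_var=0.01):
    """Information gain I(y_T; f) = 0.5 * log det(I + K_T / noise_var)
    for a squared-exponential kernel on 1-D inputs (illustrative choice)."""
    d2 = (X[:, None] - X[None, :]) ** 2          # pairwise squared distances
    K = np.exp(-d2 / (2 * lengthscale ** 2))     # kernel matrix K_T
    T = len(X)
    # slogdet is numerically safer than log(det(...)) for large T
    _, logdet = np.linalg.slogdet(np.eye(T) + K / noise_var)
    return 0.5 * logdet

rng = np.random.default_rng(0)
gains = [information_gain(rng.uniform(0.0, 1.0, T)) for T in (10, 50, 250)]
```

The growth rate of this quantity with $T$ (for the worst-case point set) is exactly what the paper's eigenvalue-decay bounds control; kernels with faster eigenvalue decay yield slower growth of $\gamma_T$ and hence tighter regret bounds.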
