Gaussian Process Bandits without Regret: An Experimental Design Approach

IEEE Transactions on Information Theory (IEEE Trans. Inf. Theory), 2009

21 December 2009

Abstract

We consider the problem of optimizing an unknown, noisy function that is expensive to evaluate. We cast this problem as a multi-armed bandit problem in which the payoff function is sampled from a Gaussian Process. We resolve an important open problem of deriving regret bounds for this setting. Specifically, we analyze an upper-confidence algorithm and bound its cumulative regret in terms of the maximal information gain due to sampling, thus connecting Gaussian Process bandits and optimal experimental design. Moreover, we bound the maximal information gain by exploiting known spectral properties of popular classes of kernels, and obtain sublinear regret bounds for our algorithm. In particular, we show that, perhaps surprisingly, the regret bounds for the squared exponential kernel depend only very weakly on the dimensionality of the problem.
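To make the upper-confidence idea in the abstract concrete, the following is a minimal sketch and not the paper's exact algorithm. It assumes a one-dimensional finite grid of candidate points, a zero-mean GP prior with a squared exponential kernel, and a constant exploration weight `beta` (a simplification; the weight could also vary with the round). The function names, the length scale, the noise level, and the test function are all hypothetical choices made for illustration.

```python
import numpy as np


def se_kernel(a, b, lengthscale=0.2):
    """Squared exponential kernel between two 1-D arrays of points."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)


def gp_posterior(x_obs, y_obs, x_grid, noise_var=0.01):
    """Posterior mean and standard deviation of a zero-mean GP on x_grid."""
    K = se_kernel(x_obs, x_obs) + noise_var * np.eye(len(x_obs))
    k_star = se_kernel(x_obs, x_grid)  # shape (n_obs, n_grid)
    mean = k_star.T @ np.linalg.solve(K, y_obs)
    # Prior variance is 1 because the kernel satisfies k(x, x) = 1.
    var = 1.0 - np.sum(k_star * np.linalg.solve(K, k_star), axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))


def ucb_loop(f, x_grid, rounds=30, beta=4.0, noise_std=0.1, seed=0):
    """Pick the point maximizing mean + sqrt(beta) * std in each round.

    Assumption: beta is held constant here purely for simplicity.
    """
    rng = np.random.default_rng(seed)
    xs, ys = [], []
    for _ in range(rounds):
        if xs:
            mean, std = gp_posterior(
                np.array(xs), np.array(ys), x_grid, noise_std ** 2
            )
        else:
            # No data yet: fall back to the prior (mean 0, std 1).
            mean, std = np.zeros_like(x_grid), np.ones_like(x_grid)
        idx = int(np.argmax(mean + np.sqrt(beta) * std))
        x = x_grid[idx]
        xs.append(x)
        ys.append(f(x) + noise_std * rng.normal())  # noisy evaluation
    return np.array(xs), np.array(ys)


def cumulative_regret(f, x_grid, xs):
    """Sum over rounds of (best value on the grid) - (value at chosen point)."""
    return float(np.sum(np.max(f(x_grid)) - f(xs)))
```

A possible usage, with a hypothetical payoff function: `grid = np.linspace(0.0, 1.0, 50)`, then `xs, ys = ucb_loop(lambda x: np.sin(3.0 * x), grid)` and `cumulative_regret(lambda x: np.sin(3.0 * x), grid, xs)`. Sublinear regret in the abstract's sense would mean that this sum, divided by the number of rounds, tends to zero as the number of rounds grows; the sketch does not verify that property.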
