Gaussian Process Upper Confidence Bound Achieves Nearly-Optimal Regret in Noise-Free Gaussian Process Bandits

We study the noise-free Gaussian Process (GP) bandits problem, in which the learner seeks to minimize regret through noise-free observations of the black-box objective function lying on the known reproducing kernel Hilbert space (RKHS). Gaussian process upper confidence bound (GP-UCB) is the well-known GP-bandits algorithm whose query points are adaptively chosen based on the GP-based upper confidence bound score. Although several existing works have reported the practical success of GP-UCB, the current theoretical results indicate its suboptimal performance. However, GP-UCB tends to perform well empirically compared with other nearly optimal noise-free algorithms that rely on a non-adaptive sampling scheme of query points. This paper resolves this gap between theoretical and empirical performance by showing the nearly optimal regret upper bound of noise-free GP-UCB. Specifically, our analysis shows the first constant cumulative regret in the noise-free settings for the squared exponential kernel and Matérn kernel with some degree of smoothness.
View on arXiv@article{iwazaki2025_2502.19006, title={ Gaussian Process Upper Confidence Bound Achieves Nearly-Optimal Regret in Noise-Free Gaussian Process Bandits }, author={ Shogo Iwazaki }, journal={arXiv preprint arXiv:2502.19006}, year={ 2025 } }