We consider a kernelized version of the $ε$-greedy strategy for contextual bandits. More precisely, in a setting with finitely many arms, we assume that the mean reward functions lie in a reproducing kernel Hilbert space (RKHS). We propose an online weighted kernel ridge regression estimator for the reward functions. Under some conditions on the exploration probability sequence, $\{ε_t\}$, and the choice of the regularization parameter, $λ_t$, we show that the proposed estimator is consistent. We also show that, for any choice of kernel and the corresponding RKHS, the proposed strategy achieves a sub-linear regret rate depending on the intrinsic dimensionality of the RKHS. Furthermore, we achieve the optimal regret rate of $\sqrt{T}$ under a margin condition for finite-dimensional RKHSs.
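To make the strategy concrete, here is a minimal sketch of a kernel $ε$-greedy loop with per-arm kernel ridge regression estimates of the mean rewards. This is an illustration, not the authors' method: it uses plain (unweighted) kernel ridge regression rather than the online weighted estimator of the paper, and the RBF kernel, the decay schedule $ε_t = \min(1, 1/\sqrt{t})$, the fixed regularization `lam`, and the noise level are all illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gaussian RBF kernel matrix between the rows of X and Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_eps_greedy(contexts, reward_fn, n_arms, lam=1e-2, seed=0):
    """Kernel eps-greedy sketch: with probability eps_t explore uniformly,
    otherwise pull the arm whose kernel ridge regression estimate of the
    mean reward at the current context is largest."""
    rng = np.random.default_rng(seed)
    X = [[] for _ in range(n_arms)]   # contexts observed per arm
    y = [[] for _ in range(n_arms)]   # rewards observed per arm
    pulls = []
    for t, x in enumerate(contexts, start=1):
        eps = min(1.0, 1.0 / np.sqrt(t))   # illustrative exploration schedule
        if rng.random() < eps:
            a = int(rng.integers(n_arms))  # explore: uniform random arm
        else:
            est = np.zeros(n_arms)
            for arm in range(n_arms):
                if X[arm]:
                    Xa, ya = np.array(X[arm]), np.array(y[arm])
                    n = len(ya)
                    # kernel ridge regression: (K + lam*n*I) alpha = y
                    K = rbf_kernel(Xa, Xa)
                    alpha = np.linalg.solve(K + lam * n * np.eye(n), ya)
                    est[arm] = (rbf_kernel(x[None, :], Xa) @ alpha)[0]
            a = int(np.argmax(est))        # exploit: greedy arm
        r = reward_fn(x, a) + 0.1 * rng.standard_normal()  # noisy reward
        X[a].append(x)
        y[a].append(r)
        pulls.append(a)
    return pulls
```

For example, with two arms whose mean rewards are `x[0]` and `-x[0]`, the loop learns to pick the arm matching the sign of the first covariate while still exploring at rate $ε_t$.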
@article{arya2025_2306.17329,
  title   = {Kernel $ε$-Greedy for Multi-Armed Bandits with Covariates},
  author  = {Sakshi Arya and Bharath K. Sriperumbudur},
  journal = {arXiv preprint arXiv:2306.17329},
  year    = {2025}
}