
Online learning with kernel losses

International Conference on Machine Learning (ICML), 2018
Abstract

We present a generalization of the adversarial linear bandits framework, in which the underlying losses are kernel functions (with an associated reproducing kernel Hilbert space) rather than linear functions. We study a version of the exponential weights algorithm and bound its regret in this setting. Under conditions on the eigendecay of the kernel we provide a sharp characterization of the regret of this algorithm. When the eigendecay is polynomial, $\mu_j \le \mathcal{O}(j^{-\beta})$, we find that the regret is bounded by $\mathcal{R}_n \le \mathcal{O}(n^{\beta/(2(\beta-1))})$; under the assumption of exponential eigendecay, $\mu_j \le \mathcal{O}(e^{-\beta j})$, we obtain an even tighter bound, $\mathcal{R}_n \le \mathcal{O}(n^{1/2}\log(n)^{1/2})$. We also study the full-information setting when the underlying losses are kernel functions, and present an adapted exponential weights algorithm and a conditional gradient descent algorithm.
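As a rough illustration of the full-information variant (a minimal sketch, not the paper's algorithm), the snippet below runs exponential weights over a finite discretization of the action set, where each round's loss for an action is a kernel evaluation against an adversarially chosen point. The RBF kernel, the grid of actions, and the learning rate are illustrative assumptions; the bandit setting would additionally require constructing loss estimates, which is not shown here.

```python
import numpy as np

# Minimal sketch (assumed setup, not the paper's exact algorithm):
# exponential weights over a finite grid of actions, with the loss of action x
# at round t given by a kernel evaluation k(x, z_t) against an adversarial z_t.

def rbf_kernel(x, z, gamma=1.0):
    # Illustrative kernel choice; any PSD kernel could be substituted.
    return np.exp(-gamma * np.abs(x - z) ** 2)

def exp_weights_kernel_losses(adversary_points, actions, eta=0.1, seed=0):
    rng = np.random.default_rng(seed)
    weights = np.ones(len(actions))
    total_loss = 0.0
    for z_t in adversary_points:
        probs = weights / weights.sum()
        i_t = rng.choice(len(actions), p=probs)   # sample an action
        losses = rbf_kernel(actions, z_t)         # full-information losses
        total_loss += losses[i_t]
        weights *= np.exp(-eta * losses)          # multiplicative update
    return total_loss

# Example usage: adversary plays random points in [0, 1].
rng = np.random.default_rng(1)
actions = np.linspace(0.0, 1.0, 50)
print(exp_weights_kernel_losses(rng.random(1000), actions))
```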
