
Bayesian Algorithms for Adversarial Online Learning: from Finite to Infinite Action Spaces

Main: 13 pages
Bibliography: 4 pages
Appendix: 17 pages
Abstract

We develop a form of Thompson sampling for online learning under full feedback, also known as prediction with expert advice, where the learner's prior is defined over the space of an adversary's future actions rather than the space of experts. We show that the regret decomposes into the regret the learner expected a priori, plus a prior-robustness-type term we call excess regret. In the classical finite-expert setting, this recovers optimal rates. As an initial step towards practical online learning in settings with a potentially uncountably infinite number of experts, we show that Thompson sampling over the $d$-dimensional unit cube, using a certain Gaussian process prior widely used in the Bayesian optimization literature, achieves a $\mathcal{O}\big(\beta\sqrt{Td\log(1+\sqrt{d}\,\lambda/\beta)}\big)$ regret rate against a $\beta$-bounded, $\lambda$-Lipschitz adversary.
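To make the setup concrete, here is a minimal sketch of Thompson-sampling-style prediction with expert advice, where the learner samples a plausible completion of the adversary's future play from a prior and plays the expert that would be best under that sample. The i.i.d.-uniform prior over future loss vectors used below is a placeholder assumption, not the paper's construction (which places a structured prior over the adversary's actions).

```python
import numpy as np

rng = np.random.default_rng(0)

def thompson_experts(loss_matrix, beta=1.0):
    """Illustrative Thompson-sampling sketch for prediction with expert advice.

    loss_matrix : (T, K) array of adversary-chosen losses in [0, beta].
    The prior here is a placeholder (i.i.d. uniform losses on [0, beta]);
    the full-feedback assumption means every expert's loss is observed
    each round, regardless of which expert was played.
    """
    T, K = loss_matrix.shape
    cum_loss = np.zeros(K)          # observed cumulative loss per expert
    total = 0.0
    for t in range(T):
        # Sample a hypothetical completion of the adversary's future play
        # from the (placeholder) prior, summed over the remaining rounds.
        future = rng.uniform(0.0, beta, size=(T - t, K)).sum(axis=0)
        # Play the expert that is best under the sampled future.
        choice = int(np.argmin(cum_loss + future))
        total += loss_matrix[t, choice]
        cum_loss += loss_matrix[t]  # full feedback: update every expert
    # Regret against the best fixed expert in hindsight.
    return total - cum_loss.min()
```

For example, on a random 200-round, 5-expert loss sequence, `thompson_experts(rng.uniform(0, 1, size=(200, 5)))` returns a finite regret, trivially bounded by $T\beta$.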
