
Bayesian Algorithms for Adversarial Online Learning: from Finite to Infinite Action Spaces

Main: 13 pages
Bibliography: 4 pages
Appendix: 17 pages
Abstract

We develop a form of Thompson sampling for online learning under full feedback, also known as prediction with expert advice, where the learner's prior is defined over the space of an adversary's future actions rather than the space of experts. We show that the regret decomposes into the regret the learner expected a priori, plus a prior-robustness-type term we call excess regret. In the classical finite-expert setting, this recovers optimal rates. As an initial step towards practical online learning in settings with a potentially uncountably infinite number of experts, we show that Thompson sampling over the $d$-dimensional unit cube, using a certain Gaussian process prior widely used in the Bayesian optimization literature, achieves a $\mathcal{O}\big(\beta\sqrt{Td\log(1+\sqrt{d}\,\lambda/\beta)}\big)$ regret rate against a $\beta$-bounded, $\lambda$-Lipschitz adversary.
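To make the setup concrete, here is a minimal sketch of Thompson-sampling-style prediction with expert advice, where the learner samples a plausible completion of the adversary's future play from a prior and plays the expert that would be best under that sample. The i.i.d.-uniform prior over future loss vectors used below is a placeholder assumption, not the paper's construction (which places a structured prior over the adversary's actions).

```python
import numpy as np

rng = np.random.default_rng(0)

def thompson_experts(loss_matrix, beta=1.0):
    """Illustrative Thompson-sampling sketch for prediction with expert advice.

    loss_matrix : (T, K) array of adversary-chosen losses in [0, beta].
    The prior here is a placeholder (i.i.d. uniform losses on [0, beta]);
    the full-feedback assumption means every expert's loss is observed
    each round, regardless of which expert was played.
    """
    T, K = loss_matrix.shape
    cum_loss = np.zeros(K)          # observed cumulative loss per expert
    total = 0.0
    for t in range(T):
        # Sample a hypothetical completion of the adversary's future play
        # from the (placeholder) prior, summed over the remaining rounds.
        future = rng.uniform(0.0, beta, size=(T - t, K)).sum(axis=0)
        # Play the expert that is best under the sampled future.
        choice = int(np.argmin(cum_loss + future))
        total += loss_matrix[t, choice]
        cum_loss += loss_matrix[t]  # full feedback: update every expert
    # Regret against the best fixed expert in hindsight.
    return total - cum_loss.min()
```

For example, on a random 200-round, 5-expert loss sequence, `thompson_experts(rng.uniform(0, 1, size=(200, 5)))` returns a finite regret, trivially bounded by $T\beta$.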
