A note on the Bayesian regret of Thompson Sampling with an arbitrary prior

Annual Conference on Information Sciences and Systems (CISS), 2013
Abstract

We consider the stochastic multi-armed bandit problem with a prior distribution on the reward distributions. We show that for any prior distribution, the Thompson Sampling strategy achieves a Bayesian regret bounded from above by $14 \sqrt{n K}$. This result is unimprovable in the sense that there exists a prior distribution such that any algorithm has a Bayesian regret bounded from below by $\frac{1}{20} \sqrt{n K}$.
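As an illustration of the strategy analyzed here, below is a minimal sketch of Thompson Sampling for Bernoulli bandits with a uniform Beta(1,1) prior on each arm's mean. The arm means, horizon, and seed are hypothetical choices for demonstration; the script compares the realized Bayesian-style regret against the paper's $14\sqrt{nK}$ upper bound.

```python
import numpy as np

rng = np.random.default_rng(0)

def thompson_sampling(true_means, n):
    """Thompson Sampling for Bernoulli arms under independent Beta(1,1) priors.

    Returns the cumulative (pseudo-)regret over n rounds.
    """
    K = len(true_means)
    alpha = np.ones(K)  # posterior Beta alpha (1 + successes)
    beta = np.ones(K)   # posterior Beta beta  (1 + failures)
    best = max(true_means)
    regret = 0.0
    for _ in range(n):
        # Draw one posterior sample per arm and play the arm with the
        # largest sampled mean (posterior probability matching).
        theta = rng.beta(alpha, beta)
        arm = int(np.argmax(theta))
        reward = rng.random() < true_means[arm]
        alpha[arm] += reward
        beta[arm] += 1 - reward
        regret += best - true_means[arm]
    return regret

# Hypothetical instance: K = 5 arms, horizon n = 2000.
means = [0.1, 0.3, 0.5, 0.7, 0.9]
n, K = 2000, len(means)
r = thompson_sampling(means, n)
print(f"regret = {r:.1f}, bound 14*sqrt(nK) = {14 * np.sqrt(n * K):.1f}")
```

On such an easy instance the realized regret is far below the worst-case bound; the $14\sqrt{nK}$ guarantee holds uniformly over all priors.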
