A note on the Bayesian regret of Thompson Sampling with an arbitrary prior
Annual Conference on Information Sciences and Systems (CISS), 2013
Abstract
We consider the stochastic multi-armed bandit problem with a prior distribution over the reward distributions. We show that, for any prior distribution, the Thompson Sampling strategy achieves a Bayesian regret bounded from above by $14\sqrt{nK}$, where $n$ is the number of rounds and $K$ the number of arms. This result is unimprovable in the sense that there exists a prior distribution such that any algorithm has a Bayesian regret bounded from below by $\frac{1}{20}\sqrt{nK}$.
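For reference, the quantity being bounded can be written as follows. The notation here is assumed rather than taken verbatim from the paper: $\pi$ is the prior over the tuple of reward distributions $\nu = (\nu_1, \dots, \nu_K)$, $\mu_i$ is the mean of arm $i$, and $I_t$ is the arm played at round $t$.

```latex
% Bayesian regret of a strategy over n rounds with K arms
% (assumed notation: prior \pi over reward distributions,
%  \mu_i the mean of arm i, I_t the arm played at round t).
\mathrm{BayesRegret}(n)
  \;=\;
  \mathbb{E}_{\nu \sim \pi}\,
  \mathbb{E}\!\left[\,
    \sum_{t=1}^{n}
      \Bigl(\max_{1 \le i \le K} \mu_i \;-\; \mu_{I_t}\Bigr)
  \right]
```

The outer expectation averages over the draw of the bandit instance from the prior, and the inner expectation averages over the rewards and any randomization in the strategy (for Thompson Sampling, the posterior samples).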
