We provide the first useful and rigorous analysis of ensemble sampling for the stochastic linear bandit setting. In particular, we show that, under standard assumptions, for a $d$-dimensional stochastic linear bandit with an interaction horizon $T$, ensemble sampling with an ensemble of size of order $d \log T$ incurs regret at most of the order $(d \log T)^{5/2} \sqrt{T}$. Ours is the first result in any structured setting that does not require the size of the ensemble to scale linearly with $T$ (a requirement that defeats the purpose of ensemble sampling) while obtaining regret of near-$\sqrt{T}$ order. Our result is also the first to allow for infinite action sets.