Thompson Sampling for (Combinatorial) Pure Exploration

Jun Zhu
Abstract

Existing methods for combinatorial pure exploration mainly focus on the UCB approach. To keep the algorithm efficient, they usually use the sum of the upper confidence bounds of the arms in an arm set S to represent the upper confidence bound of S. Because the empirical means of different arms in S are independent, this sum can be much larger than the tight upper confidence bound of S, which leads to much higher complexity than necessary. To deal with this challenge, we explore the idea of Thompson Sampling (TS), which uses independent random samples instead of upper confidence bounds, and design the first TS-based algorithm, TS-Explore, for (combinatorial) pure exploration. In TS-Explore, the sum of independent random samples within an arm set S does not exceed the tight upper confidence bound of S with high probability. Hence it solves the above challenge and achieves a lower complexity upper bound than existing efficient UCB-based algorithms in general combinatorial pure exploration. For pure exploration in the classic multi-armed bandit, we show that TS-Explore achieves an asymptotically optimal complexity upper bound.
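The gap between the summed bound and the tight bound can be illustrated numerically. The following is a minimal sketch, not the TS-Explore algorithm itself. It assumes rewards in [0, 1] with Hoeffding-style confidence radii, and Gaussian perturbations with variance 1/(4n) for an arm pulled n times. The function names, the radius formula, and the perturbation scale are all illustrative assumptions and are not taken from the paper.

```python
import math
import random


def summed_radius(counts, delta):
    """Sum of per-arm confidence radii: the loose bound for the arm set.

    Assumption: each arm uses the radius sqrt(log(1/delta) / (2 * n)).
    """
    return sum(math.sqrt(math.log(1 / delta) / (2 * n)) for n in counts)


def tight_radius(counts, delta):
    """Radius for the sum of independent empirical means.

    Independence means the variance proxies add, so the radius grows
    with the square root of the summed terms, not with their sum.
    """
    return math.sqrt(math.log(1 / delta) * sum(1 / (2 * n) for n in counts))


def exceed_fraction(counts, delta, draws=20000, seed=0):
    """Fraction of draws in which the sum of independent Gaussian
    perturbations exceeds the tight radius.

    Assumption: the perturbation for an arm pulled n times has standard
    deviation 0.5 / sqrt(n), a hypothetical scale chosen for illustration.
    """
    rng = random.Random(seed)
    bound = tight_radius(counts, delta)
    hits = 0
    for _ in range(draws):
        total = sum(rng.gauss(0.0, 0.5 / math.sqrt(n)) for n in counts)
        if total > bound:
            hits += 1
    return hits / draws


if __name__ == "__main__":
    counts = [100] * 16  # 16 arms in the set, each pulled 100 times
    delta = 0.05
    print("summed radius:", summed_radius(counts, delta))
    print("tight radius: ", tight_radius(counts, delta))
    print("exceed rate:  ", exceed_fraction(counts, delta))
```

Under these assumptions, a set of k equally sampled arms has a summed radius that is a factor of sqrt(k) larger than the tight radius, while the sum of independent samples rarely exceeds the tight radius.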
