Fast Thompson Sampling with Cumulative Oversampling: Application to Budgeted Influence Maximization

Abstract

We propose a cumulative oversampling (CO) technique for Thompson Sampling (TS) that constructs optimistic parameter estimates with significantly fewer samples than existing oversampling frameworks. We apply CO to a novel budgeted variant of the Influence Maximization (IM) semi-bandit problem with linear generalization of edge weights. Combining CO with the oracle we design for the offline problem, our online learning algorithm simultaneously tackles budget allocation, parameter learning, and reward maximization. We show that for IM semi-bandits, our TS-based algorithm achieves a scaled regret comparable to that of the best UCB-based algorithms, while significantly outperforming them in numerical experiments. Prior to this work, the regret bounds of TS-based algorithms for IM semi-bandits were looser and scaled linearly with the reciprocal of the minimum observation probability of an edge.
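The abstract does not spell out the CO mechanism, but as a rough illustration, below is a minimal Python sketch of how a cumulative oversampling step could look for TS with linearly generalized edge weights. Everything here is an assumption made for illustration, not the paper's exact procedure: the linear-Gaussian posterior, the running elementwise maximum across rounds, the reset-on-update rule, and names such as CumulativeOversamplingTS and optimistic_weights are all hypothetical.

import numpy as np

class CumulativeOversamplingTS:
    """Hypothetical CO-TS sketch for edge weights modeled as w_e ~ x_e^T theta."""

    def __init__(self, d, sigma=1.0, lam=1.0, rng=None):
        self.sigma = sigma               # assumed observation-noise scale
        self.B = lam * np.eye(d)         # regularized Gram matrix (ridge prior)
        self.f = np.zeros(d)             # response-weighted feature sum
        self.u = None                    # cumulative optimistic edge estimates
        self.rng = rng or np.random.default_rng()

    def optimistic_weights(self, X):
        """Draw ONE posterior sample this round; optimism accumulates across rounds."""
        B_inv = np.linalg.inv(self.B)
        theta_hat = B_inv @ self.f
        theta = self.rng.multivariate_normal(theta_hat, self.sigma**2 * B_inv)
        w = np.clip(X @ theta, 0.0, 1.0)  # sampled edge weights, kept in [0, 1]
        # CO step (assumed): fold the new sample into a running elementwise
        # maximum, instead of drawing many fresh samples in a single round.
        self.u = w if self.u is None else np.maximum(self.u, w)
        return self.u                     # pass to the offline budgeted-IM oracle

    def update(self, X_obs, y_obs):
        """Ridge-style posterior update from observed edges; reset the running max."""
        self.B += X_obs.T @ X_obs
        self.f += X_obs.T @ y_obs
        self.u = None                     # assumed reset once the posterior moves

Under these assumptions, taking a running maximum over one sample per round is what would let CO approximate the optimism of drawing many samples at once without the per-round sampling cost, which is the efficiency gain the abstract claims over existing oversampling frameworks.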
