
Bandit Theory meets Compressed Sensing for high dimensional Stochastic Linear Bandit

Abstract

We consider a linear stochastic bandit problem where the dimension $K$ of the unknown parameter $\theta$ is larger than the sampling budget $n$. In such cases, it is in general impossible to derive sub-linear regret bounds, since usual linear bandit algorithms have a regret in $O(K\sqrt{n})$. In this paper we assume that $\theta$ is $S$-sparse, i.e., has at most $S$ non-zero components, and that the space of arms is the unit ball for the $\|\cdot\|_2$ norm. We combine ideas from Compressed Sensing and Bandit Theory and derive algorithms with regret bounds in $O(S\sqrt{n})$.
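To make the setting concrete, here is a minimal Python sketch of the problem itself, not of the paper's algorithm. It assumes the standard linear bandit model in which pulling an arm $x_t$ from the unit ball yields expected reward $\langle x_t, \theta \rangle$, so the best arm is $\theta / \|\theta\|_2$ and the expected regret after $n$ pulls is $n\|\theta\|_2 - \sum_t \langle x_t, \theta \rangle$. All function names, the values of `K`, `S`, and `n`, and the uniform-exploration baseline are illustrative assumptions.

```python
import math
import random


def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def norm2(u):
    """Euclidean (l2) norm."""
    return math.sqrt(dot(u, u))


def make_sparse_theta(K, S, rng):
    """Hypothetical helper: a K-dimensional vector with exactly S non-zero entries."""
    theta = [0.0] * K
    for i in rng.sample(range(K), S):
        theta[i] = rng.choice([-1.0, 1.0]) * rng.uniform(0.5, 1.0)
    return theta


def random_unit_arm(K, rng):
    """Draw an arm uniformly at random from the unit sphere (a naive baseline)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(K)]
    s = norm2(g)
    return [x / s for x in g]


def expected_regret(theta, arms):
    """Sum over pulls of (best expected reward - expected reward of the pulled arm).

    On the unit ball the best expected reward is ||theta||_2,
    attained by the arm theta / ||theta||_2.
    """
    best = norm2(theta)
    return sum(best - dot(x, theta) for x in arms)


rng = random.Random(0)
K, S, n = 200, 5, 50  # assumed values with K > n, as in the abstract

theta = make_sparse_theta(K, S, rng)

# Baseline 1: pull random directions, ignoring all feedback.
uniform_arms = [random_unit_arm(K, rng) for _ in range(n)]
regret_uniform = expected_regret(theta, uniform_arms)

# Baseline 2: an oracle that already knows theta pulls the best arm every time.
oracle_arm = [t / norm2(theta) for t in theta]
regret_oracle = expected_regret(theta, [oracle_arm] * n)

print(f"uniform exploration regret: {regret_uniform:.3f}")
print(f"oracle regret:              {regret_oracle:.3f}")
```

The oracle's regret is zero up to floating-point error, while blind exploration in $K > n$ dimensions pays close to $\|\theta\|_2$ per pull. Exploiting the fact that only $S$ coordinates of $\theta$ matter is what lets the regret scale with $S$ rather than $K$.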
