
Network of Bandits

Abstract

Distributing the best arm identification task across users' devices offers several practical advantages: scalability, reduced deployment costs, and privacy. We propose a distributed version of the Successive Elimination algorithm using a simple architecture in which a single server synchronizes the tasks executed on the users' devices. We show that this algorithm is optimal in terms of the number of transmitted bits and optimal up to logarithmic factors in terms of the number of pulls per player. Finally, we propose an extension of this approach to distribute the contextual bandit algorithm Bandit Forest, which can finely exploit users' data while guaranteeing privacy.
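
For readers unfamiliar with the base algorithm, the following is a minimal sketch of the standard single-player Successive Elimination procedure for best arm identification. It is not the distributed version proposed in the paper and does not model the server or the message exchange. The function name `successive_elimination`, the `pull` callback, the `max_rounds` cap, and the particular Hoeffding-style confidence radius are illustrative assumptions, and rewards are assumed to lie in [0, 1].

```python
import math
import random


def successive_elimination(pull, n_arms, delta=0.05, max_rounds=100_000):
    """Return the index of the arm believed to be best.

    pull(k) is a hypothetical callback returning a reward in [0, 1] for arm k.
    delta is the allowed probability of returning a suboptimal arm.
    """
    active = list(range(n_arms))   # arms not yet eliminated
    sums = [0.0] * n_arms          # cumulative reward per arm
    t = 0                          # pulls made so far on each active arm

    while len(active) > 1 and t < max_rounds:
        t += 1
        # Pull every remaining arm once, so all active arms share the count t.
        for k in active:
            sums[k] += pull(k)

        # Hoeffding radius with a union bound over arms and rounds
        # (one common choice; an assumption, not taken from the paper).
        radius = math.sqrt(math.log(4 * n_arms * t * t / delta) / (2 * t))

        # Keep an arm only if its mean is within 2 * radius of the best mean.
        best_mean = max(sums[k] / t for k in active)
        active = [k for k in active if best_mean - sums[k] / t < 2 * radius]

    # Active arms have equal pull counts, so comparing sums compares means.
    return max(active, key=lambda k: sums[k])


if __name__ == "__main__":
    rng = random.Random(0)
    means = [0.2, 0.5, 0.9]  # hypothetical Bernoulli arms
    winner = successive_elimination(
        lambda k: 1.0 if rng.random() < means[k] else 0.0, len(means)
    )
    print("selected arm:", winner)
```

In this sketch every surviving arm is sampled once per round, and an arm is dropped as soon as its empirical mean falls at least two confidence radii below the current leader. A distributed variant along the lines of the abstract would additionally need to decide which elimination information players send to the server and when.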
