
Multi-Agent Multi-Armed Bandits with Limited Communication

Abstract

We consider the problem where $N$ agents collaboratively interact with an instance of a stochastic $K$-armed bandit problem for $K \gg N$. The agents aim to simultaneously minimize the cumulative regret over all the agents for a total of $T$ time steps, the number of communication rounds, and the number of bits in each communication round. We present Limited Communication Collaboration - Upper Confidence Bound (LCC-UCB), a doubling-epoch based algorithm where each agent communicates only after the end of the epoch and shares the index of the best arm it knows. With our algorithm, LCC-UCB, each agent enjoys a regret of $\tilde{O}\left(\sqrt{(K/N + N)T}\right)$, communicates for $O(\log T)$ steps and broadcasts $O(\log K)$ bits in each communication step. We extend the work to sparse graphs with maximum degree $K_G$ and diameter $D$, and propose LCC-UCB-GRAPH which enjoys a regret bound of $\tilde{O}\left(D\sqrt{(K/N + K_G)DT}\right)$. Finally, we empirically show that the LCC-UCB and the LCC-UCB-GRAPH algorithms perform well and outperform strategies that communicate through a central node.
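
To make the communication pattern concrete, the following is a minimal Python sketch of a doubling-epoch scheme in the spirit of the description above. It is not the paper's algorithm: the abstract does not say how arms are assigned to agents or how the "best arm" is chosen, so the round-robin split of arms, the most-played-arm recommendation rule, the standard UCB1 index, the Bernoulli rewards, and all function names (`epoch_lengths`, `bits_per_message`, `ucb_pick`, `run_sketch`) are assumptions made for illustration only.

```python
import math
import random


def epoch_lengths(T):
    """Doubling epoch lengths 1, 2, 4, ... with the last one cut so they sum to T."""
    lengths, used, length = [], 0, 1
    while used < T:
        step = min(length, T - used)
        lengths.append(step)
        used += step
        length *= 2
    return lengths


def bits_per_message(K):
    """Bits needed to encode one arm index out of K arms."""
    return max(1, math.ceil(math.log2(K)))


def ucb_pick(active, counts, sums, t):
    """Standard UCB1 choice restricted to the agent's active arms."""
    best_arm, best_index = None, -math.inf
    for arm in sorted(active):
        if counts[arm] == 0:
            return arm  # play every active arm once before using the index
        bonus = math.sqrt(2.0 * math.log(t) / counts[arm])
        index = sums[arm] / counts[arm] + bonus
        if index > best_index:
            best_arm, best_index = arm, index
    return best_arm


def run_sketch(means, N, T, seed=0):
    """Simulate N agents on Bernoulli arms; communicate only at epoch ends."""
    K = len(means)
    if K < N:
        raise ValueError("this sketch assumes at least one arm per agent (K >= N)")
    rng = random.Random(seed)
    # Assumption: arms are split round-robin, so each agent owns about K/N arms.
    own = [list(range(i, K, N)) for i in range(N)]
    active = [set(arms) for arms in own]
    counts = [[0] * K for _ in range(N)]
    sums = [[0.0] * K for _ in range(N)]
    best_mean, regret, rounds, t = max(means), 0.0, 0, 0
    for length in epoch_lengths(T):
        for _ in range(length):
            t += 1
            for i in range(N):
                arm = ucb_pick(active[i], counts[i], sums[i], t)
                reward = 1.0 if rng.random() < means[arm] else 0.0
                counts[i][arm] += 1
                sums[i][arm] += reward
                regret += best_mean - means[arm]
        # Epoch end: each agent broadcasts a single arm index.
        # Assumption: the "best arm it knows" is its most-played active arm.
        recommended = set()
        for i in range(N):
            recommended.add(max(sorted(active[i]), key=lambda a: counts[i][a]))
        rounds += 1
        # Next epoch: own arms plus at most N recommended arms.
        active = [set(own[i]) | recommended for i in range(N)]
    return {
        "regret": regret,
        "rounds": rounds,
        "bits_per_message": bits_per_message(K),
        "active_sizes": [len(s) for s in active],
    }
```

Under these assumptions each agent only ever plays about K/N + N arms, the number of communication rounds equals the number of doubling epochs (logarithmic in T), and each message is a single arm index of about log2(K) bits, which mirrors the shape of the quantities quoted in the abstract without reproducing its analysis.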
