Federated Bandit: A Gossiping Approach

Zhaowei Zhu
Jingxuan Zhu
Ji Liu
Yang Liu
Abstract

In this paper, we study \emph{Federated Bandit}, a decentralized Multi-Armed Bandit problem with a set of $N$ agents, who can only communicate their local data with neighbors described by a connected graph $G$. Each agent makes a sequence of decisions on selecting an arm from $M$ candidates, yet the agents only have access to local and potentially biased feedback/evaluation of the true reward for each action taken. Learning only locally will lead agents to sub-optimal actions, while converging to a no-regret strategy requires a collection of distributed data. Motivated by the proposal of federated learning, we aim for a solution in which agents never share their local observations with a central entity and are allowed to share only a private copy of their own information with their neighbors. We first propose a decentralized bandit algorithm, Gossip_UCB, which couples variants of the classical gossiping algorithm and the celebrated Upper Confidence Bound (UCB) bandit algorithm. We show that Gossip_UCB successfully adapts local bandit learning into a global gossiping process for sharing information among connected agents, and achieves guaranteed regret on the order of $O(\max\{ \texttt{poly}(N,M) \log T, \texttt{poly}(N,M)\log_{\lambda_2^{-1}} N\})$ for all $N$ agents, where $\lambda_2\in(0,1)$ is the second largest eigenvalue of the expected gossip matrix, which is a function of $G$. We then propose Fed_UCB, a differentially private version of Gossip_UCB, in which the agents preserve $\epsilon$-differential privacy of their local data while achieving $O(\max \{\frac{\texttt{poly}(N,M)}{\epsilon}\log^{2.5} T, \texttt{poly}(N,M) (\log_{\lambda_2^{-1}} N + \log T) \})$ regret.
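The two ingredients named in the abstract, gossiping and UCB, can be illustrated with a minimal Python sketch. This is not the paper's Gossip_UCB: it shows only textbook randomized pairwise gossip averaging and the classical UCB1 index, and the function names `gossip_average` and `ucb_index`, the ring topology, and the sample values are assumptions chosen for illustration.

```python
import math
import random


def gossip_average(values, edges, rounds, seed=0):
    """Randomized pairwise gossip (textbook form, not the paper's variant).

    Each round one edge (i, j) of the graph is drawn uniformly at random
    and both endpoints replace their value with the mean of the pair.
    The sum of all values is preserved, so on a connected graph every
    agent's value approaches the network-wide average.
    """
    rng = random.Random(seed)
    x = list(values)
    for _ in range(rounds):
        i, j = rng.choice(edges)
        pair_mean = 0.5 * (x[i] + x[j])
        x[i] = pair_mean
        x[j] = pair_mean
    return x


def ucb_index(mean_estimate, pulls, t):
    """Classical UCB1 index: empirical mean plus an exploration bonus.

    The bonus sqrt(2 ln t / pulls) is the standard UCB1 choice; the
    paper's algorithms use their own confidence bounds, which are not
    reproduced here.
    """
    return mean_estimate + math.sqrt(2.0 * math.log(t) / pulls)


# Hypothetical example: 4 agents on a ring, each holding a biased local
# estimate of one arm's mean reward.
ring_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
local_estimates = [0.2, 0.4, 0.6, 0.8]
shared = gossip_average(local_estimates, ring_edges, rounds=200)
```

In this sketch every entry of `shared` ends up close to the network-wide mean of the local estimates, which is the kind of global quantity a single agent cannot recover from its own biased feedback. How fast the values agree depends on the graph, consistent with the role of $\lambda_2$ in the regret bounds above.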
