
Federated UCBVI: Communication-Efficient Federated Regret Minimization with Heterogeneous Agents

Abstract

In this paper, we present the Federated Upper Confidence Bound Value Iteration algorithm ($\texttt{Fed-UCBVI}$), a novel extension of the $\texttt{UCBVI}$ algorithm (Azar et al., 2017) tailored for the federated learning framework. We prove that the regret of $\texttt{Fed-UCBVI}$ scales as $\tilde{\mathcal{O}}(\sqrt{H^3 |\mathcal{S}| |\mathcal{A}| T / M})$, with a small additional term due to heterogeneity, where $|\mathcal{S}|$ is the number of states, $|\mathcal{A}|$ is the number of actions, $H$ is the episode length, $M$ is the number of agents, and $T$ is the number of episodes. Notably, in the single-agent setting, this upper bound matches the minimax lower bound up to polylogarithmic factors, while in the multi-agent scenario, $\texttt{Fed-UCBVI}$ has linear speed-up. To conduct our analysis, we introduce a new measure of heterogeneity, which may hold independent theoretical interest. Furthermore, we show that, unlike existing federated reinforcement learning approaches, $\texttt{Fed-UCBVI}$'s communication complexity only marginally increases with the number of agents.
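
To make the "linear speed-up" claim concrete, the sketch below evaluates only the leading term $\sqrt{H^3 |\mathcal{S}| |\mathcal{A}| T / M}$ of the stated bound. It is an illustration, not part of the paper: the function names `leading_regret_term` and `episodes_for_target` are hypothetical, constants and polylogarithmic factors are dropped, and the additional heterogeneity term is ignored.

```python
import math


def leading_regret_term(H, S, A, T, M=1):
    """Leading term sqrt(H^3 * S * A * T / M) of the stated regret bound.

    Assumption: constants, polylog factors and the heterogeneity
    term are omitted, so this is an order-of-magnitude proxy only.
    """
    if min(H, S, A, T, M) <= 0:
        raise ValueError("all arguments must be positive")
    return math.sqrt(H**3 * S * A * T / M)


def episodes_for_target(H, S, A, M, eps):
    """Smallest T with leading_regret_term / T <= eps.

    Solving sqrt(H^3 * S * A / (T * M)) <= eps for T gives
    T >= H^3 * S * A / (M * eps^2), which shrinks linearly in M.
    """
    if min(H, S, A, M) <= 0 or eps <= 0:
        raise ValueError("all arguments must be positive")
    return math.ceil(H**3 * S * A / (M * eps**2))


if __name__ == "__main__":
    single = leading_regret_term(H=10, S=20, A=5, T=10_000, M=1)
    four = leading_regret_term(H=10, S=20, A=5, T=10_000, M=4)
    print("ratio of leading terms (M=4 vs M=1):", four / single)
    print("episodes needed, M=1:", episodes_for_target(10, 20, 5, 1, 0.5))
    print("episodes needed, M=4:", episodes_for_target(10, 20, 5, 4, 0.5))
```

Under these simplifications, quadrupling the number of agents halves the leading term at a fixed $T$, and the number of episodes needed to reach a fixed average regret per episode falls in proportion to $1/M$, which is the sense in which the speed-up is linear.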
