Communication Efficient Parallel Reinforcement Learning

22 February 2021
Mridul Agarwal
Bhargav Ganguly
Vaneet Aggarwal
arXiv:2102.10740
Abstract

We consider the problem where $M$ agents interact with $M$ identical and independent environments with $S$ states and $A$ actions, using reinforcement learning for $T$ rounds. The agents share their data with a central server to minimize their regret. We aim to find an algorithm that allows the agents to minimize the regret with infrequent communication rounds. We provide an algorithm, which runs at each agent, and prove that the total cumulative regret of the $M$ agents is upper bounded as $\tilde{O}(DS\sqrt{MAT})$ for a Markov Decision Process with diameter $D$, number of states $S$, and number of actions $A$. The agents synchronize once their number of visits to any state-action pair exceeds a certain threshold. Using this, we obtain a bound of $O(MSA\log(MT))$ on the total number of communication rounds. Finally, we evaluate the algorithm on multiple environments and demonstrate that it performs on par with a version of the UCRL2 algorithm that communicates at every round, while requiring significantly less communication.
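The count-triggered synchronization rule lends itself to a short sketch. Below is a minimal illustration assuming a UCRL2-style doubling trigger: an agent requests a sync once its visits to some state-action pair since the last sync match the pooled count seen before it. The abstract only states that syncs occur once visits exceed a threshold, so this trigger, and all names (`Agent`, `should_sync`, `server_round`), are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

class Agent:
    """One of M agents, each in its own copy of the environment."""

    def __init__(self, n_states: int, n_actions: int):
        # Pooled visit counts broadcast at the last synchronization.
        self.global_counts = np.zeros((n_states, n_actions), dtype=np.int64)
        # Visit counts accumulated locally since the last synchronization.
        self.local_counts = np.zeros((n_states, n_actions), dtype=np.int64)

    def record_visit(self, s: int, a: int) -> None:
        self.local_counts[s, a] += 1

    def should_sync(self) -> bool:
        # Assumed doubling-style trigger: sync once the local visits to some
        # (s, a) pair reach the pooled count from the last sync. A doubling
        # threshold keeps the number of sync rounds logarithmic in T,
        # consistent with the O(MSA log(MT)) communication bound.
        return bool(np.any(self.local_counts >= np.maximum(self.global_counts, 1)))

    def sync(self, merged_counts: np.ndarray) -> None:
        # Adopt the server's pooled counts and reset the local tally.
        self.global_counts = merged_counts.copy()
        self.local_counts.fill(0)

def server_round(agents, server_counts):
    # Pool every agent's counts accumulated since the last sync,
    # then broadcast the updated totals back to all agents.
    for ag in agents:
        server_counts += ag.local_counts
    for ag in agents:
        ag.sync(server_counts)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A, M = 4, 2, 3
    agents = [Agent(S, A) for _ in range(M)]
    server = np.zeros((S, A), dtype=np.int64)
    syncs = 0
    for t in range(10_000):
        for ag in agents:
            ag.record_visit(int(rng.integers(S)), int(rng.integers(A)))
        if any(ag.should_sync() for ag in agents):
            server_round(agents, server)
            syncs += 1
    # Under the doubling trigger the sync count grows roughly
    # logarithmically in T rather than linearly.
    print(f"sync rounds after 10,000 steps: {syncs}")
```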
