Sample and Communication Efficient Fully Decentralized MARL Policy Evaluation via a New Approach: Local TD update

23 March 2024
Fnu Hairi
Zifan Zhang
Jia-Wei Liu
Abstract

In the actor-critic framework for fully decentralized multi-agent reinforcement learning (MARL), one of the key components is the MARL policy evaluation (PE) problem, where a set of $N$ agents work cooperatively to evaluate the value function of the global states for a given policy by communicating with their neighbors. In MARL-PE, a critical challenge is how to lower the sample and communication complexities, defined as the number of training samples and communication rounds needed to converge to an $\epsilon$-stationary point. To lower the communication complexity of MARL-PE, a "natural" idea is to perform multiple local TD-update steps between consecutive communication rounds, thereby reducing the communication frequency. However, the validity of the local TD-update approach remains unclear due to the potential "agent-drift" phenomenon resulting from heterogeneous rewards across agents in general. This leads to an interesting open question: can the local TD-update approach achieve low sample and communication complexities? In this paper, we make the first attempt to answer this fundamental question. We focus on the setting of MARL-PE with average reward, which is motivated by many multi-agent network optimization problems. Our theoretical and experimental results confirm that allowing multiple local TD-update steps is indeed an effective approach to lowering the sample and communication complexities of MARL-PE compared to consensus-based MARL-PE algorithms. Specifically, the number of local TD-update steps between two consecutive communication rounds can be as large as $\mathcal{O}(1/\epsilon^{1/2}\log(1/\epsilon))$ while still converging to an $\epsilon$-stationary point of MARL-PE. Moreover, we show theoretically that in order to reach the optimal sample complexity, the communication complexity of the local TD-update approach is $\mathcal{O}(1/\epsilon^{1/2}\log(1/\epsilon))$.
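To make the local TD-update idea concrete, below is a minimal sketch of decentralized policy evaluation with linear value approximation in the average-reward setting: each agent runs several local TD updates using only its own (heterogeneous) reward, then performs one consensus (neighbor-averaging) step on its parameters. This is not the authors' implementation; the feature map, toy environment, mixing matrix, step sizes, and names such as `K_LOCAL` and `env_step` are illustrative assumptions.

```python
import numpy as np

# Sketch: local TD updates + periodic consensus for decentralized MARL-PE
# (average-reward, linear function approximation). All constants and the toy
# environment below are assumptions for illustration only.

N_AGENTS, FEATURE_DIM, K_LOCAL = 4, 8, 16   # K_LOCAL local TD steps per communication round
ALPHA = 0.05                                 # TD step size

rng = np.random.default_rng(0)

# Doubly-stochastic mixing matrix over a ring communication graph (consensus weights).
W = np.zeros((N_AGENTS, N_AGENTS))
for i in range(N_AGENTS):
    W[i, i] = 0.5
    W[i, (i - 1) % N_AGENTS] = 0.25
    W[i, (i + 1) % N_AGENTS] = 0.25

theta = np.zeros((N_AGENTS, FEATURE_DIM))    # per-agent value-function parameters
mu = np.zeros(N_AGENTS)                      # per-agent average-reward estimates

def features(state):
    """Toy one-hot feature map for the shared global state (assumption)."""
    phi = np.zeros(FEATURE_DIM)
    phi[state % FEATURE_DIM] = 1.0
    return phi

def env_step(state):
    """Toy Markov chain with heterogeneous local rewards across agents (assumption)."""
    next_state = (state + rng.integers(1, 3)) % FEATURE_DIM
    local_rewards = rng.normal(loc=np.arange(N_AGENTS), scale=0.1)
    return next_state, local_rewards

state = 0
for comm_round in range(50):
    # Local phase: each agent performs K_LOCAL TD updates without communicating.
    for _ in range(K_LOCAL):
        next_state, rewards = env_step(state)
        phi, phi_next = features(state), features(next_state)
        for i in range(N_AGENTS):
            # Average-reward TD error computed from the agent's own reward only.
            delta = rewards[i] - mu[i] + phi_next @ theta[i] - phi @ theta[i]
            theta[i] += ALPHA * delta * phi
            mu[i] += ALPHA * (rewards[i] - mu[i])
        state = next_state
    # Communication phase: one neighbor-averaging (consensus) step on parameters.
    theta = W @ theta
    mu = W @ mu

print("parameter disagreement across agents:", np.linalg.norm(theta - theta.mean(axis=0)))
```

The key design point this sketch illustrates is the trade-off in the abstract: increasing `K_LOCAL` reduces how often agents communicate, but because each agent updates with its own reward, their parameters can drift apart between consensus steps (the "agent-drift" phenomenon the paper analyzes).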
