
Qatten: A General Framework for Cooperative Multiagent Reinforcement Learning

Abstract

In many real-world tasks, multiple agents must learn to coordinate with each other given their private observations and limited communication ability. Deep multiagent reinforcement learning (Deep-MARL) algorithms have shown superior performance in such challenging settings. One representative class of work is multiagent value decomposition, which decomposes the global shared multiagent Q-value $Q_{tot}$ into individual Q-values $Q^{i}$ to guide individual behaviors, e.g., VDN imposing an additive form and QMIX adopting a monotonic assumption with an implicit mixing network. However, most previous efforts impose particular assumptions on the relation between $Q_{tot}$ and $Q^{i}$ and lack theoretical grounding. Besides, they do not explicitly consider the agent-level impact of individuals on the whole system when transforming the individual $Q^{i}$s into $Q_{tot}$. In this paper, we theoretically derive a general formula of $Q_{tot}$ in terms of $Q^{i}$, based on which we can naturally implement a multi-head attention formulation to approximate $Q_{tot}$, resulting not only in a refined representation of $Q_{tot}$ with an agent-level attention mechanism, but also in a tractable maximization algorithm for decentralized policies. Extensive experiments demonstrate that our method outperforms state-of-the-art MARL methods on the widely adopted StarCraft benchmark across different scenarios, and attention analysis is further conducted with valuable insights.
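
The derived form expresses $Q_{tot}$ approximately as a state-dependent weighted sum of the individual $Q^{i}$s, with the weights produced by agent-level multi-head attention. Below is a minimal PyTorch sketch of such an attention mixer; it is not the authors' released implementation, and the class name AttentionMixer as well as the hyperparameters (n_heads, embed_dim) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionMixer(nn.Module):
    """Sketch of an agent-level multi-head attention mixer: Q_tot is formed as a
    state-dependent weighted sum of the per-agent Q^i values plus a state bias c(s).
    Names and sizes here are illustrative, not taken from the paper."""

    def __init__(self, state_dim, unit_dim, n_heads=4, embed_dim=32):
        super().__init__()
        self.n_heads = n_heads
        # Each head queries with the global state ...
        self.query = nn.ModuleList(
            [nn.Linear(state_dim, embed_dim, bias=False) for _ in range(n_heads)])
        # ... and keys are produced from per-agent features (e.g. unit observations).
        self.key = nn.ModuleList(
            [nn.Linear(unit_dim, embed_dim, bias=False) for _ in range(n_heads)])
        # State-dependent constant c(s) added to the weighted sum.
        self.bias = nn.Sequential(nn.Linear(state_dim, embed_dim), nn.ReLU(),
                                  nn.Linear(embed_dim, 1))

    def forward(self, agent_qs, state, unit_feats):
        # agent_qs:   (batch, n_agents)           per-agent Q^i values
        # state:      (batch, state_dim)          global state
        # unit_feats: (batch, n_agents, unit_dim) per-agent features
        head_totals = []
        for h in range(self.n_heads):
            q = self.query[h](state).unsqueeze(1)   # (batch, 1, embed_dim)
            k = self.key[h](unit_feats)              # (batch, n_agents, embed_dim)
            logits = (q * k).sum(dim=-1)             # (batch, n_agents)
            lam = F.softmax(logits, dim=-1)          # non-negative agent weights
            head_totals.append((lam * agent_qs).sum(dim=-1, keepdim=True))
        # Sum over heads and add the state bias c(s): shape (batch, 1).
        return torch.stack(head_totals, dim=-1).sum(dim=-1) + self.bias(state)


# Example usage with hypothetical sizes: 5 agents, 48-dim state, 10-dim unit features.
mixer = AttentionMixer(state_dim=48, unit_dim=10)
q_tot = mixer(torch.rand(2, 5), torch.rand(2, 48), torch.rand(2, 5, 10))  # -> (2, 1)
```

Because the softmax weights are non-negative, $Q_{tot}$ in this sketch is monotonic in each $Q^{i}$, so greedy decentralized action selection on the individual $Q^{i}$s remains consistent with maximizing $Q_{tot}$, which is what makes the decentralized maximization tractable.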
