Qatten: A General Framework for Cooperative Multiagent Reinforcement Learning

In many real-world tasks, multiple agents must learn to coordinate with each other given their private observations and limited communication ability. Deep multiagent reinforcement learning (Deep-MARL) algorithms have shown superior performance in such challenging settings. One representative class of work is multiagent value decomposition, which decomposes the global shared multiagent Q-value $Q_{tot}$ into individual Q-values $Q^{i}$ to guide individuals' behaviors, e.g., VDN imposing an additive form and QMIX adopting a monotonicity assumption with an implicit mixing network. However, most previous efforts impose certain assumptions on the relationship between $Q_{tot}$ and $Q^{i}$ and lack theoretical grounding. Besides, they do not explicitly consider the agent-level impact of individuals on the whole system when transforming individual $Q^{i}$s into $Q_{tot}$. In this paper, we theoretically derive a general formula of $Q_{tot}$ in terms of $Q^{i}$, based on which we naturally implement a multi-head attention structure to approximate $Q_{tot}$, resulting in not only a refined representation of $Q_{tot}$ with an agent-level attention mechanism, but also a tractable maximization algorithm for decentralized policies. Extensive experiments demonstrate that our method outperforms state-of-the-art MARL methods on the widely adopted StarCraft benchmark across different scenarios, and attention analysis is further conducted to provide valuable insights.
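To make the attention-based mixing concrete, the following is a minimal sketch of how individual Q-values could be combined into a global Q-value via state-conditioned, agent-level multi-head attention plus a state-dependent constant term. It assumes PyTorch, and the module and parameter names (AttentionMixer, n_heads, agent_feat_dim, etc.) are illustrative rather than taken from the paper; this is a sketch of the general idea, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionMixer(nn.Module):
    """Sketch of a Qatten-style mixer: Q_tot is approximated as a
    state-dependent constant c(s) plus a sum over attention heads of
    attention-weighted individual Q-values Q^i."""

    def __init__(self, n_agents, state_dim, agent_feat_dim, n_heads=4, embed_dim=32):
        super().__init__()
        self.n_heads = n_heads
        # Per-head query projection, conditioned on the global state s.
        self.query = nn.ModuleList(
            [nn.Linear(state_dim, embed_dim, bias=False) for _ in range(n_heads)]
        )
        # Per-head key projection over per-agent features (e.g., observations).
        self.key = nn.ModuleList(
            [nn.Linear(agent_feat_dim, embed_dim, bias=False) for _ in range(n_heads)]
        )
        # State-dependent constant term c(s).
        self.constant = nn.Sequential(
            nn.Linear(state_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, 1)
        )

    def forward(self, agent_qs, state, agent_feats):
        # agent_qs:    (batch, n_agents)                 individual Q-values Q^i
        # state:       (batch, state_dim)                global state s
        # agent_feats: (batch, n_agents, agent_feat_dim) per-agent features
        total = self.constant(state)                        # c(s): (batch, 1)
        for h in range(self.n_heads):
            q = self.query[h](state).unsqueeze(1)           # (batch, 1, embed)
            k = self.key[h](agent_feats)                    # (batch, n_agents, embed)
            # Agent-level attention weights lambda_{h,i}(s) over agents.
            lam = F.softmax((q * k).sum(-1), dim=-1)        # (batch, n_agents)
            total = total + (lam * agent_qs).sum(-1, keepdim=True)
        return total                                        # approximated Q_tot: (batch, 1)
```

In this sketch, the softmax keeps each head's agent weights non-negative, so the mixed value is monotone in every $Q^{i}$; greedy decentralized action selection per agent then remains consistent with maximizing the joint value, which is the property behind the tractable maximization mentioned in the abstract.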