
Learning Adversarially Robust Policies in Multi-Agent Games

Abstract

Robust decision-making in multiplayer games requires anticipating the reactions a player policy may elicit from other players. This is difficult in games with three or more players: when one player fixes a policy, it induces a subgame between the remaining players with many possible equilibrium outcomes. Predicting the worst-case outcome of a policy is thus an equilibrium selection problem, one known to be NP-hard in general. We show that worst-case coarse-correlated equilibria can be efficiently approximated in smooth games and propose a framework that uses this worst-case evaluation scheme to learn robust player policies. We further prove that the framework can be extended to handle uncertainty about the bounded rationality of other players. In experiments, our framework learns robust policies in repeated N-player matrix games and, when applied to deep multi-agent reinforcement learning, scales to complex spatiotemporal games. For example, it learns robust AI tax policies that improve welfare by up to 15%, even when taxpayers are boundedly rational.
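
To make the worst-case evaluation concrete, the sketch below is a hypothetical illustration (not the paper's implementation) of worst-case coarse-correlated equilibrium (CCE) evaluation in a small finite matrix game, posed as a linear program. Fixing player 0's mixed strategy induces a subgame among the remaining players; the worst-case CCE for player 0 is the joint action distribution that minimizes player 0's expected payoff subject to the standard CCE no-deviation constraints. The function name worst_case_cce_value and the use of scipy.optimize.linprog are assumptions made for this sketch.

# Illustrative sketch: worst-case CCE evaluation as a linear program.
# Assumes numpy and scipy; not the paper's implementation.
import itertools
import numpy as np
from scipy.optimize import linprog

def worst_case_cce_value(payoffs, pi0):
    """payoffs: list of N arrays of shape (A0, A1, ..., A_{N-1});
    payoffs[i][a] is player i's payoff at joint action profile a.
    pi0: player 0's fixed mixed strategy, shape (A0,).
    Returns player 0's worst-case expected payoff over CCEs of the
    subgame induced among players 1..N-1."""
    # Average out player 0's fixed strategy to get the induced subgame.
    sub = [np.tensordot(pi0, u, axes=(0, 0)) for u in payoffs]
    shape = sub[0].shape  # (A1, ..., A_{N-1})
    profiles = list(itertools.product(*[range(k) for k in shape]))

    # Objective: minimize player 0's expected payoff under sigma.
    c = np.array([sub[0][a] for a in profiles])

    # CCE constraints: for each remaining player i and deviation a_i',
    # E_sigma[ u_i(a_i', a_{-i}) - u_i(a) ] <= 0.
    rows = []
    for i in range(len(shape)):
        for dev in range(shape[i]):
            row = []
            for a in profiles:
                a_dev = a[:i] + (dev,) + a[i + 1:]
                row.append(sub[i + 1][a_dev] - sub[i + 1][a])
            rows.append(row)
    A_ub = np.array(rows)
    b_ub = np.zeros(len(rows))

    # sigma must be a probability distribution over profiles.
    A_eq = np.ones((1, len(profiles)))
    b_eq = np.array([1.0])

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * len(profiles), method="highs")
    return res.fun

# Example: 3 players with 2 actions each and random payoffs.
rng = np.random.default_rng(0)
payoffs = [rng.uniform(size=(2, 2, 2)) for _ in range(3)]
print(worst_case_cce_value(payoffs, pi0=np.array([0.5, 0.5])))

Because the CCE constraints are linear in the joint distribution, this evaluation is a polynomial-size linear program for finite matrix games, which is consistent with the abstract's claim that worst-case CCEs, unlike general equilibrium selection, can be approximated efficiently.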
