
Minimax-Optimal Multi-Agent Robust Reinforcement Learning

Abstract

Multi-agent robust reinforcement learning, also known as multi-player robust Markov games (RMGs), is a crucial framework for modeling competitive interactions under environmental uncertainties, with wide applications in multi-agent systems. However, existing results on sample complexity in RMGs suffer from at least one of three obstacles: a restrictive range of uncertainty levels or accuracy, the curse of multiple agents, and the barrier of long horizons, all of which cause existing results to significantly exceed the information-theoretic lower bound. To close this gap, we extend the Q-FTRL algorithm \citep{li2022minimax} to RMGs in the finite-horizon setting, assuming access to a generative model. We prove that the proposed algorithm achieves an $\varepsilon$-robust coarse correlated equilibrium (CCE) with a sample complexity (up to log factors) of $\widetilde{O}\left(H^3 S \sum_{i=1}^m A_i \min\{H, 1/R\} / \varepsilon^2\right)$, where $S$ denotes the number of states, $A_i$ is the number of actions of the $i$-th agent, $H$ is the finite horizon length, and $R$ is the uncertainty level. We also show that this sample complexity is minimax optimal by establishing a matching information-theoretic lower bound. Additionally, in the special case of two-player zero-sum RMGs, the algorithm achieves an $\varepsilon$-robust Nash equilibrium (NE) with the same sample complexity.
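To make the scaling concrete, here is a minimal sketch in Python (not from the paper; all parameter values are hypothetical) that evaluates the bound up to the constants and log factors hidden by the $\widetilde{O}$ notation, illustrating how the $\min\{H, 1/R\}$ factor switches between the small- and large-uncertainty regimes:

```python
def sample_complexity(H, S, A_list, R, eps):
    """Evaluate H^3 * S * sum_i A_i * min(H, 1/R) / eps^2, ignoring the
    constants and log factors absorbed by the O-tilde notation."""
    return H**3 * S * sum(A_list) * min(H, 1.0 / R) / eps**2

# Hypothetical parameters: horizon H, state count S, per-agent action
# counts A_i, uncertainty level R, and target accuracy eps.
print(sample_complexity(H=10, S=100, A_list=[5, 5], R=0.01, eps=0.1))  # 1/R > H: min = H
print(sample_complexity(H=10, S=100, A_list=[5, 5], R=1.0, eps=0.1))   # 1/R < H: min = 1/R
```

Note that once $R \geq 1/H$, the $\min\{H, 1/R\}$ factor equals $1/R$, so larger uncertainty levels actually reduce the stated sample requirement relative to the $H$-scaling regime.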
