In this paper, we address Reinforcement Learning (RL) among agents that are grouped into teams, with cooperation within each team but general-sum (non-zero-sum) competition across different teams. To develop an RL method that provably achieves a Nash equilibrium, we focus on a linear-quadratic structure. Moreover, to tackle the non-stationarity induced by multi-agent interactions in the finite-population setting, we consider the case where the number of agents within each team is infinite, i.e., the mean-field setting. This results in a General-Sum LQ Mean-Field Type Game (GS-MFTG). We characterize the Nash equilibrium (NE) of the GS-MFTG under a standard invertibility condition. This MFTG NE is then shown to be an $\mathcal{O}(1/M)$-NE for the finite-population game, where $M$ is a lower bound on the number of agents in each team. These structural results motivate an algorithm called Multi-player Receding-horizon Natural Policy Gradient (MRNPG), where each team minimizes its cumulative cost \emph{independently} in a receding-horizon manner. Despite the non-convexity of the problem, we establish that the resulting algorithm converges to a global NE through a novel decomposition of the problem into sub-problems using backward recursive discrete-time Hamilton-Jacobi-Isaacs (HJI) equations, in which \emph{independent natural policy gradient} is shown to exhibit linear convergence under a time-independent diagonal dominance condition. Numerical studies are included to corroborate the theoretical results.
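To make the receding-horizon independent natural policy gradient idea concrete, the sketch below applies it to a toy two-player, finite-horizon, general-sum LQ game: going backward in time, both players simultaneously run natural policy gradient updates on their linear feedback gains at the current stage, with the later-stage cost-to-go matrices held fixed. This is only an illustrative approximation of the setting described above, not the paper's MRNPG: it omits the mean-field (team) coupling, uses exact model-based gradients in place of the sampled estimates an RL method would rely on, and all matrices, step sizes, and iteration counts are arbitrary choices for the example.

# Hedged sketch (assumed toy problem, not the paper's formulation): receding-horizon
# independent natural policy gradient for a two-player general-sum LQ game.
import numpy as np

np.random.seed(0)
n, m, T = 3, 2, 5                       # state dim, control dim per player, horizon
A = 0.9 * np.eye(n) + 0.05 * np.random.randn(n, n)
B = [0.3 * np.random.randn(n, m), 0.3 * np.random.randn(n, m)]
Q = [np.eye(n), 2.0 * np.eye(n)]        # per-player state cost weights
R = [np.eye(m), np.eye(m)]              # per-player control cost weights

def stage_npg(P_next, iters=300, eta=0.1):
    """Stage sub-problem: both players take simultaneous, independent natural
    PG steps on their feedback gains K^i, with next-stage cost-to-go fixed."""
    K = [np.zeros((m, n)), np.zeros((m, n))]
    for _ in range(iters):
        # Natural gradient direction: the vanilla policy gradient (which is
        # 2*E^i*Sigma_t) right-preconditioned by Sigma_t^{-1}, so the state
        # covariance cancels and only E^i remains.
        E = [(R[i] + B[i].T @ P_next[i] @ B[i]) @ K[i]
             - B[i].T @ P_next[i] @ (A - B[1 - i] @ K[1 - i]) for i in range(2)]
        K = [K[i] - eta * E[i] for i in range(2)]   # independent updates
    return K

# Backward pass over the horizon (the receding-horizon decomposition).
P = [np.zeros((n, n)), np.zeros((n, n))]            # terminal cost-to-go
gains = [None] * T
for t in reversed(range(T)):
    K = stage_npg(P)
    Acl = A - B[0] @ K[0] - B[1] @ K[1]             # closed-loop dynamics
    P = [Q[i] + K[i].T @ R[i] @ K[i] + Acl.T @ P[i] @ Acl for i in range(2)]
    gains[t] = K

print("stage-0 gains:")
print(gains[0][0].round(3))
print(gains[0][1].round(3))

In the LQ case, preconditioning by the inverse state covariance removes the dependence of each update on the state distribution, so every stage sub-problem becomes a fixed linear iteration in the gains; this is, loosely, why a diagonal-dominance-type condition on the coupled player updates can deliver the linear convergence rate mentioned above.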
@article{zaman2025_2403.11345,
  title   = {Independent RL for Cooperative-Competitive Agents: A Mean-Field Perspective},
  author  = {Muhammad Aneeq uz Zaman and Alec Koppel and Mathieu Laurière and Tamer Başar},
  journal = {arXiv preprint arXiv:2403.11345},
  year    = {2025}
}