Policy Optimization for Continuous-time Linear-Quadratic Graphon Mean Field Games

6 June 2025
Philipp Plank
Yufei Zhang
arXiv (abs) · PDF · HTML
Main: 40 pages
Figures: 5
Bibliography: 5 pages
Appendix: 1 page
Abstract

Multi-agent reinforcement learning, despite its popularity and empirical success, faces significant scalability challenges in large-population dynamic games. Graphon mean field games (GMFGs) offer a principled framework for approximating such games while capturing heterogeneity among players. In this paper, we propose and analyze a policy optimization framework for continuous-time, finite-horizon linear-quadratic GMFGs. Exploiting the structural properties of GMFGs, we design an efficient policy parameterization in which each player's policy is represented as an affine function of their private state, with a shared slope function and player-specific intercepts. We develop a bilevel optimization algorithm that alternates between policy gradient updates for best-response computation under a fixed population distribution, and distribution updates using the resulting policies. We prove linear convergence of the policy gradient steps to best-response policies and establish global convergence of the overall algorithm to the Nash equilibrium. The analysis relies on novel landscape characterizations over infinite-dimensional policy spaces. Numerical experiments demonstrate the convergence and robustness of the proposed algorithm under varying graphon structures, noise levels, and action frequencies.
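To make the bilevel structure concrete, below is a minimal, hypothetical sketch (not the authors' implementation) of the alternating scheme for a time-discretized scalar LQ graphon game. Each player's policy is affine in its private state with a shared slope and a player-specific intercept; the inner loop runs a simple zeroth-order policy-gradient best-response update under a frozen mean-field flow, and the outer loop refreshes the graphon-weighted population aggregate from the updated policies. All constants, the min graphon, and the finite-difference gradient estimator are illustrative assumptions, and the continuous-time setting of the paper is replaced by a crude Euler-style discretization.

```python
# Hypothetical sketch of the bilevel policy optimization loop described in the abstract,
# for a time-discretized scalar LQ graphon mean field game. Constants and the graphon
# choice are illustrative assumptions, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

# --- assumed problem data: K representative players, T time steps, scalar states/actions
K, T = 20, 10
A, B, Q, R = 1.0, 1.0, 1.0, 0.1
u = (np.arange(K) + 0.5) / K          # player labels on [0, 1]
W = np.minimum.outer(u, u)            # min graphon W(u, v) = min(u, v)

def simulate(theta, c, z):
    """Average cost of the affine policy a_t = -theta[t] * x_t + c[t, :]
    under a *fixed* population aggregate z[t, :]."""
    x = rng.normal(size=K)
    cost = np.zeros(K)
    for t in range(T):
        a = -theta[t] * x + c[t]
        cost += Q * (x - z[t]) ** 2 + R * a ** 2
        x = A * x + B * a + 0.1 * rng.normal(size=K)
    return cost.mean()

def population_aggregate(theta, c, n_mc=200):
    """Distribution update: graphon-weighted mean of player states under the current policy."""
    z = np.zeros((T, K))
    for _ in range(n_mc):
        x = rng.normal(size=K)
        for t in range(T):
            z[t] += (W @ x) / K
            x = A * x + B * (-theta[t] * x + c[t]) + 0.1 * rng.normal(size=K)
    return z / n_mc

# --- bilevel loop: inner policy-gradient best response, outer mean-field update
theta = np.zeros(T)        # shared slope, one value per time step
c = np.zeros((T, K))       # player-specific intercepts
z = np.zeros((T, K))       # current guess of the mean-field flow

for outer in range(5):
    for inner in range(50):
        # Two-point zeroth-order estimate of the policy gradient, standing in for the
        # exact policy gradient analyzed in the paper.
        eps_t, eps_c = rng.normal(size=T), rng.normal(size=(T, K))
        sigma, lr = 0.05, 0.02
        g = (simulate(theta + sigma * eps_t, c + sigma * eps_c, z)
             - simulate(theta - sigma * eps_t, c - sigma * eps_c, z)) / (2 * sigma)
        theta -= lr * g * eps_t
        c -= lr * g * eps_c
    z = population_aggregate(theta, c)   # distribution update using the new policies
    print(f"outer {outer}: average cost {simulate(theta, c, z):.3f}")
```

The outer/inner split mirrors the abstract's description: best-response computation against a frozen population distribution, followed by a distribution update from the resulting policies.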

View on arXiv
@article{plank2025_2506.05894,
  title={Policy Optimization for Continuous-time Linear-Quadratic Graphon Mean Field Games},
  author={Philipp Plank and Yufei Zhang},
  journal={arXiv preprint arXiv:2506.05894},
  year={2025}
}