Near-Optimal Policy Optimization for Correlated Equilibrium in General-Sum Markov Games

Abstract
We study policy optimization algorithms for computing correlated equilibria in multi-player general-sum Markov Games. Previous results achieve an O(T^{-1/2}) convergence rate to a correlated equilibrium and an accelerated O(T^{-3/4}) convergence rate to the weaker notion of coarse correlated equilibrium. In this paper, we improve both results significantly by providing an uncoupled policy optimization algorithm that attains a near-optimal Õ(T^{-1}) convergence rate for computing a correlated equilibrium. Our algorithm is constructed by combining two main elements: (i) smooth value updates, and (ii) the optimistic follow-the-regularized-leader (OFTRL) algorithm with the log-barrier regularizer.
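As a rough illustration of ingredient (ii), a single OFTRL update over the probability simplex with the log-barrier regularizer can be sketched as below. This is a generic sketch of the update rule, not the paper's algorithm: the function name, the bisection-based solver, and the demo loss vectors are all illustrative choices.

```python
import numpy as np

def oftrl_log_barrier_step(loss_sum, prediction, eta):
    """One OFTRL step on the simplex with log-barrier regularizer
    R(x) = -sum_i log x_i (illustrative sketch, not the paper's code).

    Solves  argmin_{x in simplex}  eta * <loss_sum + prediction, x> + R(x).
    The KKT conditions give x_i = 1 / (lam + eta * g_i) for a scalar lam
    chosen so that x sums to one; we find lam by bisection.
    """
    g = np.asarray(loss_sum, dtype=float) + np.asarray(prediction, dtype=float)
    n = len(g)
    lo = -eta * g.min() + 1e-15   # x_i > 0 requires lam > -eta * min(g); sum -> inf here
    hi = lo + 2.0 * n             # here sum 1/(lam + eta*g) <= n/(2n) < 1
    for _ in range(200):
        lam = 0.5 * (lo + hi)
        s = np.sum(1.0 / (lam + eta * g))
        if s > 1.0:               # mass too large -> increase lam
            lo = lam
        else:                     # mass too small -> decrease lam
            hi = lam
    x = 1.0 / (0.5 * (lo + hi) + eta * g)
    return x / x.sum()            # renormalize for numerical safety

# Illustrative use: coordinates with smaller cumulative loss receive more mass,
# while the log barrier keeps every coordinate strictly positive.
x = oftrl_log_barrier_step(loss_sum=np.array([1.0, 0.0, 2.0]),
                           prediction=np.zeros(3), eta=10.0)
```

The optimism enters through `prediction` (a guess of the next loss, typically the most recent one); the log barrier, unlike the entropy regularizer of standard multiplicative weights, penalizes probabilities near zero harshly, which is one common route to the stability properties needed for fast rates.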