Last-Iterate Convergence Properties of Regret-Matching Algorithms in
Games
Regret matching (RM) and its variants are the most popular approaches for solving large-scale two-player zero-sum games in practice. Unlike algorithms such as optimistic gradient descent-ascent, which enjoy strong last-iterate and ergodic convergence properties in zero-sum games, virtually nothing is known about the last-iterate properties of regret-matching algorithms. Given the importance of last-iterate convergence, both for numerical optimization and for modeling real-world learning in games, we study in this paper the last-iterate convergence properties of several popular variants of RM. First, we show numerically that practical variants such as simultaneous RM, alternating RM, and simultaneous predictive RM all lack last-iterate convergence guarantees, even on a simple game. We then prove that recent variants of these algorithms based on a smoothing technique do enjoy last-iterate convergence: extragradient RM and smooth predictive RM achieve asymptotic last-iterate convergence (without a rate) as well as best-iterate convergence. Finally, we introduce restarted variants of these algorithms and show that they enjoy linear-rate last-iterate convergence.
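To make the distinction between average-iterate and last-iterate behavior concrete, here is a minimal sketch of simultaneous regret matching on a 2x2 zero-sum game. The game (Matching Pennies, with a small asymmetric initialization to break the trivial fixed point) and all parameter values are illustrative assumptions, not taken from the paper; the sketch only demonstrates the generic phenomenon the abstract describes: the time-averaged strategies approach the equilibrium while the last iterates keep cycling.

```python
import numpy as np

# Row player's payoff matrix for Matching Pennies (illustrative choice,
# not a game from the paper); the column player receives the negative.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

def rm_strategy(regrets):
    """Regret matching: play each action with probability proportional
    to its positive cumulative regret; uniform if none is positive."""
    pos = np.maximum(regrets, 0.0)
    total = pos.sum()
    if total <= 0.0:
        return np.full_like(regrets, 1.0 / len(regrets))
    return pos / total

# Cumulative regrets. The small asymmetric initialization for the row
# player is an assumption made to avoid the degenerate uniform fixed point.
Rx = np.array([1.0, 0.0])
Ry = np.zeros(2)
avg_x = np.zeros(2)
avg_y = np.zeros(2)

T = 10000
for t in range(T):
    # Simultaneous updates: both players act from current regrets.
    x = rm_strategy(Rx)
    y = rm_strategy(Ry)
    avg_x += x
    avg_y += y
    # Instantaneous regret of each pure action vs. the realized payoff.
    ux = A @ y           # row player's per-action payoffs
    uy = -(A.T @ x)      # column player's per-action payoffs (zero-sum)
    Rx += ux - x @ ux
    Ry += uy - y @ uy

avg_x /= T
avg_y /= T
# The averages approach the (0.5, 0.5) equilibrium; the last iterates
# (x, y) continue to cycle and need not converge.
print("average strategies:", avg_x, avg_y)
print("last iterates:     ", rm_strategy(Rx), rm_strategy(Ry))
```

Running this shows the averaged strategies near the mixed equilibrium, whereas printing the last iterates across runs of different lengths reveals the cycling behavior that the paper's smoothed and restarted variants are designed to eliminate.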