To establish last-iterate convergence of Counterfactual Regret Minimization (CFR) algorithms for learning a Nash equilibrium (NE) of extensive-form games (EFGs), recent studies reformulate learning an NE of the original EFG as learning the NEs of a sequence of (perturbed) regularized EFGs. Consequently, proving last-iterate convergence for the original EFG reduces to proving last-iterate convergence for the (perturbed) regularized EFGs. However, the empirical convergence rates of the algorithms in these studies are suboptimal, since they do not use Regret Matching (RM)-based CFR algorithms to solve the perturbed regularized EFGs, and such algorithms are known for their exceptionally fast empirical convergence. Moreover, because a sequence of perturbed regularized EFGs must be solved, fine-tuning hyperparameters across all such games is infeasible, which makes parameter-free algorithms highly desirable. In this paper, we prove that CFR, a classical parameter-free RM-based algorithm, achieves last-iterate convergence in learning an NE of perturbed regularized EFGs. Leveraging CFR to solve perturbed regularized EFGs, we obtain Reward Transformation CFR (RTCFR). Importantly, we extend prior work on the parameter-free property of CFR, enhancing its stability, which is crucial for the empirical convergence of RTCFR. Experiments show that RTCFR significantly outperforms existing algorithms with theoretical last-iterate convergence guarantees.
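To make the "parameter-free" property concrete, the sketch below shows the regret-matching (RM) update that CFR applies locally at each decision point: the next strategy is proportional to the positive part of the cumulative regret, so no step size needs tuning across the sequence of perturbed regularized games. This is only an illustrative Python sketch of standard RM on a single decision point, not the paper's RTCFR algorithm; the function names and toy utilities are made up for the example.

```python
# Minimal sketch of the regret-matching (RM) update underlying CFR.
# Illustrative only; not the RTCFR implementation from the paper.
import numpy as np

def regret_matching_strategy(cum_regret):
    """Map a cumulative-regret vector to a strategy over actions."""
    positive = np.maximum(cum_regret, 0.0)
    total = positive.sum()
    if total > 0.0:
        return positive / total
    # If no action has positive regret, fall back to uniform play.
    return np.full_like(cum_regret, 1.0 / len(cum_regret))

def rm_step(cum_regret, action_utilities, strategy):
    """Accumulate instantaneous regrets u(a) - <strategy, u> and
    return the updated cumulative regret and the next strategy."""
    expected = float(np.dot(strategy, action_utilities))
    cum_regret = cum_regret + (action_utilities - expected)
    return cum_regret, regret_matching_strategy(cum_regret)

# Toy usage on a 3-action decision point with made-up utilities.
regret = np.zeros(3)
strategy = np.full(3, 1.0 / 3.0)
for utilities in [np.array([1.0, 0.0, -1.0]), np.array([0.0, 2.0, 0.5])]:
    regret, strategy = rm_step(regret, utilities, strategy)
print(strategy)
```

Because the update involves no learning rate, the same rule can be reused unchanged on every perturbed regularized EFG in the sequence, which is the property the abstract highlights as essential for RTCFR.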
@article{meng2025_2308.11256,
  title   = {Efficient Last-iterate Convergence Algorithms in Solving Games},
  author  = {Linjian Meng and Youzhi Zhang and Zhenxing Ge and Shangdong Yang and Tianyu Ding and Wenbin Li and Tianpei Yang and Bo An and Yang Gao},
  journal = {arXiv preprint arXiv:2308.11256},
  year    = {2025}
}