Model-free Neural Counterfactual Regret Minimization with Bootstrap Learning

IEEE Transactions on Games (IEEE Trans. Games), 2020
Abstract

Counterfactual Regret Minimization (CFR) has achieved many fascinating results in solving large-scale Imperfect Information Games (IIGs). Neural CFR is one of the promising techniques that can effectively reduce computation and memory consumption by generalizing decision information between similar states. However, current neural CFR algorithms have to approximate the cumulative regrets with neural networks. This usually results in high-variance approximations, because regrets from different iterations can differ greatly. The problem can be even worse when importance sampling is used, which is required for model-free algorithms. In this paper, a new CFR variant, Recursive CFR, is proposed, in which the cumulative regrets are recovered by Recursive Substitute Values (RSVs) that are recursively defined and independently calculated between iterations. It is proved that the new Recursive CFR converges to a Nash equilibrium. Based on Recursive CFR, a new model-free neural CFR algorithm with bootstrap learning is proposed. Experimental results show that the new algorithm can match the state-of-the-art neural CFR algorithms but with less training overhead.
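To make the role of cumulative regrets concrete, the following is a minimal sketch of regret matching, the standard update at the heart of tabular CFR (not the paper's RSV recursion): the current strategy at an information set is obtained by normalizing the positive parts of the cumulative regrets. This is the quantity that neural CFR variants must approximate with a network, which is where the high-variance problem described above arises. The function name and example values are illustrative, not from the paper.

```python
import numpy as np

def regret_matching(cumulative_regrets):
    """Derive a strategy from cumulative regrets (the core update in CFR).

    Positive regrets are normalized into action probabilities; if no
    action has positive regret, fall back to the uniform strategy.
    """
    positive = np.maximum(cumulative_regrets, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    n = len(cumulative_regrets)
    return np.full(n, 1.0 / n)

# Illustrative example: an information set with three actions.
regrets = np.array([3.0, -1.0, 1.0])
strategy = regret_matching(regrets)
print(strategy)  # positive regrets 3 and 1 normalize to 0.75 and 0.25
```

Because the strategy depends only on the *sign-clipped sum* of per-iteration regrets, any noise in the approximated cumulative regrets propagates directly into the strategy, which motivates replacing the cumulative target with per-iteration RSVs as in Recursive CFR.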
