ResearchTrend.AI
arXiv:2403.05632
Can Large Language Models Play Games? A Case Study of A Self-Play Approach

8 March 2024
Hongyi Guo
Zhihan Liu
Yufeng Zhang
Zhaoran Wang
Abstract

Large Language Models (LLMs) harness extensive data from the Internet, storing a broad spectrum of prior knowledge. While LLMs have proven beneficial as decision-making aids, their reliability is hampered by limitations in reasoning, hallucination, and other issues. Monte-Carlo Tree Search (MCTS), on the other hand, is a heuristic search algorithm that provides reliable decision-making solutions through recursive rollouts and self-play. However, the effectiveness of MCTS relies heavily on heuristic pruning and external value functions, particularly in complex decision scenarios. This work introduces an approach that bolsters LLMs with MCTS self-play to efficiently solve deterministic turn-based zero-sum games (DTZG), such as chess and Go, without additional training. Specifically, we utilize LLMs as both action pruners and proxies for value functions. We theoretically prove that the suboptimality of the estimated value in our proposed method scales with \(\tilde{\mathcal O}\bigl(\frac{|\tilde{\mathcal A}|}{\sqrt{N}} + \epsilon_{\mathrm{pruner}} + \epsilon_{\mathrm{critic}}\bigr)\), where \(N\) is the number of simulations, \(|\tilde{\mathcal A}|\) is the cardinality of the action space pruned by the LLM, and \(\epsilon_{\mathrm{pruner}}\) and \(\epsilon_{\mathrm{critic}}\) quantify the errors incurred by adopting the LLM as action-space pruner and value-function proxy, respectively. Our experiments in chess and Go demonstrate that our method addresses challenges beyond the scope of MCTS and improves upon directly applying LLMs.
