arXiv:2206.15378

Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning

30 June 2022
Julien Perolat
Bart De Vylder
Daniel Hennes
Eugene Tarassov
Florian Strub
V. D. Boer
Paul Muller
Jerome T. Connor
Neil Burch
Thomas W. Anthony
Stephen Marcus McAleer
Romuald Elie
Sarah H. Cen
Zhe Wang
A. Gruslys
Aleksandra Malysheva
Mina Khan
Sherjil Ozair
Finbarr Timbers
Tobias Pohlen
Tom Eccles
Mark Rowland
Marc Lanctot
Jean-Baptiste Lespiau
Bilal Piot
Shayegan Omidshafiei
Edward Lockhart
Laurent Sifre
Nathalie Beauguerlange
Rémi Munos
David Silver
Satinder Singh
Demis Hassabis
K. Tuyls
Abstract

We introduce DeepNash, an autonomous agent capable of learning to play the imperfect-information game Stratego from scratch, up to a human expert level. Stratego is one of the few iconic board games that Artificial Intelligence (AI) has not yet mastered. This popular game has an enormous game tree on the order of 10^{535} nodes, i.e., 10^{175} times larger than that of Go. It has the additional complexity of requiring decision-making under imperfect information, similar to Texas hold'em poker, which has a significantly smaller game tree (on the order of 10^{164} nodes). Decisions in Stratego are made over a large number of discrete actions with no obvious link between action and outcome. Episodes are long, often with hundreds of moves before a player wins, and situations in Stratego cannot easily be broken down into manageably sized sub-problems as in poker. For these reasons, Stratego has been a grand challenge for the field of AI for decades, and existing AI methods barely reach an amateur level of play. DeepNash uses a game-theoretic, model-free deep reinforcement learning method, without search, that learns to master Stratego via self-play. The Regularised Nash Dynamics (R-NaD) algorithm, a key component of DeepNash, converges to an approximate Nash equilibrium, instead of 'cycling' around it, by directly modifying the underlying multi-agent learning dynamics. DeepNash beat existing state-of-the-art AI methods in Stratego and achieved a yearly (2022) and all-time top-3 rank on the Gravon games platform, competing with human expert players.
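The core idea behind R-NaD — damping the 'cycling' of multi-agent learning dynamics by regularising rewards toward a reference policy, then repeatedly resetting that reference — can be illustrated on a tiny matrix game. The sketch below is not the paper's implementation (DeepNash applies R-NaD with deep networks to full Stratego); it is a minimal, assumed-hyperparameter toy on matching pennies, where unregularised replicator-style dynamics orbit the mixed Nash equilibrium but the regularised iteration converges to it.

```python
import numpy as np

# Toy sketch of the R-NaD idea on matching pennies.
# All hyperparameters (eta, lr, step counts) are illustrative, not from the paper.

A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])  # payoff to player 1; player 2 receives -A

def rnad_matrix_game(eta=0.2, lr=0.05, inner_steps=2000, outer_iters=50):
    x = np.array([0.9, 0.1])            # player 1 mixed strategy
    y = np.array([0.2, 0.8])            # player 2 mixed strategy
    x_ref, y_ref = x.copy(), y.copy()   # regularisation (reference) policies
    for _ in range(outer_iters):
        for _ in range(inner_steps):
            # Regularised payoffs: game reward minus eta * log(pi / pi_ref).
            # The log term penalises drifting from the reference policy,
            # which removes the cycling of the unregularised dynamics.
            gx = A @ y - eta * np.log(x / x_ref)
            gy = -A.T @ x - eta * np.log(y / y_ref)
            # Multiplicative-weights (replicator-style) update.
            x = x * np.exp(lr * gx); x /= x.sum()
            y = y * np.exp(lr * gy); y /= y.sum()
        # Outer R-NaD step: reset the reference to the current policy and
        # solve the new regularised game; the fixed points approach Nash.
        x_ref, y_ref = x.copy(), y.copy()
    return x, y

x, y = rnad_matrix_game()
print(x, y)  # both strategies approach the Nash equilibrium (0.5, 0.5)
```

With `eta = 0` the same update spirals around the equilibrium indefinitely; the regularisation term is what makes each inner loop converge, and resetting the reference policy walks those fixed points toward the Nash equilibrium.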
