Approximate exploitability: Learning a best response in large games

20 April 2020
Finbarr Timbers
Nolan Bard
Edward Lockhart
Marc Lanctot
Martin Schmid
Neil Burch
Julian Schrittwieser
Thomas Hubert
Michael Bowling
Abstract

Researchers have demonstrated that neural networks are vulnerable to adversarial examples and subtle environment changes, both of which one can view as a form of distribution shift. To humans, the resulting errors can look like blunders, eroding trust in these agents. In prior games research, agent evaluation often focused on the in-practice game outcomes. While valuable, such evaluation typically fails to evaluate robustness to worst-case outcomes. Prior research in computer poker has examined how to assess such worst-case performance, both exactly and approximately. Unfortunately, exact computation is infeasible with larger domains, and existing approximations rely on poker-specific knowledge. We introduce ISMCTS-BR, a scalable search-based deep reinforcement learning algorithm for learning a best response to an agent, thereby approximating worst-case performance. We demonstrate the technique in several two-player zero-sum games against a variety of agents, including several AlphaZero-based agents.
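To make the quantity being approximated concrete, here is a minimal sketch of exact exploitability in a tiny two-player zero-sum normal-form game. This is not the paper's ISMCTS-BR algorithm; it only illustrates the best-response gap that ISMCTS-BR approximates when games are too large to enumerate. The payoff matrix, strategies, and function name are illustrative assumptions.

```python
# Hedged sketch: exact exploitability of a strategy profile in a small
# two-player zero-sum matrix game. In large games this exact computation
# is infeasible, which is where a learned best response comes in.
import numpy as np

def exploitability(A: np.ndarray, x: np.ndarray, y: np.ndarray) -> float:
    """Average best-response gain against the profile (x, y).

    A -- payoff matrix for the row player (column player receives -A)
    x -- row player's mixed strategy (probabilities over rows)
    y -- column player's mixed strategy (probabilities over columns)
    """
    # Best the row player could do by deviating against y.
    row_br_value = np.max(A @ y)
    # Worst the column player can hold the row player to against x.
    col_br_value = np.min(x @ A)
    # Total incentive to deviate (NashConv), halved per common convention.
    return (row_br_value - col_br_value) / 2.0

# Rock-paper-scissors: the uniform profile is a Nash equilibrium
# (exploitability ~0), while a biased profile can be exploited.
rps = np.array([[ 0, -1,  1],
                [ 1,  0, -1],
                [-1,  1,  0]], dtype=float)
uniform = np.ones(3) / 3
biased = np.array([0.8, 0.1, 0.1])
print(exploitability(rps, uniform, uniform))  # ~0.0
print(exploitability(rps, biased, biased))    # 0.7
```

In large imperfect-information games, the best-response values above cannot be computed by enumeration; the paper's contribution is to approximate the best response itself with search-based deep reinforcement learning, yielding an estimate of worst-case performance.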
