Model-Based Reinforcement Learning for Offline Zero-Sum Markov Games

8 June 2022
Yuling Yan, Gen Li, Yuxin Chen, Jianqing Fan
Abstract

This paper makes progress towards learning Nash equilibria in two-player zero-sum Markov games from offline data. Specifically, consider a $\gamma$-discounted infinite-horizon Markov game with $S$ states, where the max-player has $A$ actions and the min-player has $B$ actions. We propose a pessimistic model-based algorithm with Bernstein-style lower confidence bounds, called VI-LCB-Game, that provably finds an $\varepsilon$-approximate Nash equilibrium with a sample complexity no larger than $\frac{C_{\mathsf{clipped}}^{\star} S(A+B)}{(1-\gamma)^{3}\varepsilon^{2}}$ (up to some log factor). Here, $C_{\mathsf{clipped}}^{\star}$ is some unilateral clipped concentrability coefficient that reflects the coverage and distribution shift of the available data (vis-à-vis the target data), and the target accuracy $\varepsilon$ can be any value within $\big(0, \frac{1}{1-\gamma}\big]$. Our sample complexity bound strengthens prior art by a factor of $\min\{A,B\}$, achieving minimax optimality for the entire $\varepsilon$-range. An appealing feature of our result lies in its algorithmic simplicity, which reveals that neither variance reduction nor sample splitting is necessary to achieve sample optimality.
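To make the pessimism principle concrete, the sketch below is a simplified, illustrative implementation (not the authors' code) of value iteration with a Bernstein-style lower confidence bound for an offline zero-sum Markov game: each iteration penalizes the empirical Q-estimates by a variance-aware bonus and then solves the resulting per-state zero-sum matrix game. The array names (`P_hat`, `N`, `r`), the constant `c_b`, and the iteration count are hypothetical placeholders, and details such as the bonus constants, clipping range, and policy extraction differ from the paper's VI-LCB-Game.

```python
import numpy as np
from scipy.optimize import linprog


def solve_matrix_game(M):
    """Value of the zero-sum matrix game max_x min_y x^T M y, solved as an LP."""
    A, B = M.shape
    # Decision variables: mixed strategy x (A entries) and the game value v.
    # Maximize v  <=>  minimize -v.
    c = np.zeros(A + 1)
    c[-1] = -1.0
    # For every min-player action j:  v - sum_i x_i M[i, j] <= 0.
    A_ub = np.hstack([-M.T, np.ones((B, 1))])
    b_ub = np.zeros(B)
    # Probability simplex: sum_i x_i = 1, x_i >= 0; v is a free variable.
    A_eq = np.hstack([np.ones((1, A)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, None)] * A + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1]


def pessimistic_vi(P_hat, N, r, gamma, iters=200, c_b=1.0):
    """Value iteration with a Bernstein-style lower confidence bound.

    P_hat: empirical transition kernel, shape (S, A, B, S)
    N:     visit counts for each (state, a, b) triple, shape (S, A, B)
    r:     rewards in [0, 1], shape (S, A, B)
    Returns a pessimistic estimate of the max-player's value function.
    """
    S, A, B, _ = P_hat.shape
    V = np.zeros(S)
    for _ in range(iters):
        EV = P_hat @ V                                     # E[V(s')], shape (S, A, B)
        var = np.maximum(P_hat @ (V ** 2) - EV ** 2, 0.0)  # empirical variance of V(s')
        n = np.maximum(N, 1)
        # Bernstein-style penalty: variance term plus a lower-order 1/n term.
        bonus = c_b * (np.sqrt(var / n) + 1.0 / ((1.0 - gamma) * n))
        Q_lcb = np.clip(r + gamma * EV - bonus, 0.0, 1.0 / (1.0 - gamma))
        # Pessimistic Bellman update: solve the per-state matrix game on Q_lcb.
        V = np.array([solve_matrix_game(Q_lcb[s]) for s in range(S)])
    return V
```

Under full coverage the bonus vanishes as counts grow and the update reduces to standard value iteration for zero-sum games; the penalty only shrinks estimates at poorly covered (state, action, action) triples, which is the role pessimism plays in the offline setting.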
