A Finite-Sample Analysis of Payoff-Based Independent Learning in Zero-Sum Stochastic Games

3 March 2023
Zaiwei Chen
K. Zhang
Eric Mazumdar
Asuman Ozdaglar
Adam Wierman
arXiv:2303.03100
Abstract

We study two-player zero-sum stochastic games, and propose a form of independent learning dynamics called Doubly Smoothed Best-Response dynamics, which integrates a discrete and doubly smoothed variant of the best-response dynamics into temporal-difference (TD)-learning and minimax value iteration. The resulting dynamics are payoff-based, convergent, rational, and symmetric among players. Our main results provide finite-sample guarantees. In particular, we prove the first-known $\tilde{\mathcal{O}}(1/\epsilon^2)$ sample complexity bound for payoff-based independent learning dynamics, up to a smoothing bias. In the special case where the stochastic game has only one state (i.e., matrix games), we provide a sharper $\tilde{\mathcal{O}}(1/\epsilon)$ sample complexity. Our analysis uses a novel coupled Lyapunov drift approach to capture the evolution of multiple sets of coupled and stochastic iterates, which might be of independent interest.
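The abstract describes the learning rule only at a high level. The sketch below illustrates, under stated assumptions, what payoff-based independent learning with smoothed best responses can look like in the single-state (matrix game) special case. It is not the authors' algorithm: the function names (`softmax`, `sample`, `smoothed_br_matrix_game`), the constant step sizes `alpha` and `beta`, and the temperature `tau` are hypothetical choices made for illustration. Each player observes only its own action and realized payoff, updates a local estimate of the value of the action it just played (a TD-style update), and then moves its mixed strategy a small step toward the softmax (smoothed) best response to those estimates. Reading "doubly smoothed" as softmax smoothing plus averaging of the policy over time is an assumption of this sketch.

```python
import math
import random


def softmax(q, tau):
    """Smoothed best response: softmax of the estimates q at temperature tau."""
    m = max(q)  # subtract the max for numerical stability
    w = [math.exp((x - m) / tau) for x in q]
    s = sum(w)
    return [x / s for x in w]


def sample(probs, rng):
    """Draw an action index from a probability vector."""
    u = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if u < acc:
            return i
    return len(probs) - 1  # guard against floating-point round-off


def smoothed_br_matrix_game(A, steps=20000, tau=0.5, alpha=0.05, beta=0.005, seed=0):
    """Hypothetical payoff-based independent learning in a zero-sum matrix game.

    Player 1 receives A[i][j] and player 2 receives -A[i][j]. Neither player
    sees A or the opponent's action; each uses only its own realized payoff.
    """
    rng = random.Random(seed)
    n, m = len(A), len(A[0])
    q1, q2 = [0.0] * n, [0.0] * m            # local action-value estimates
    pi1, pi2 = [1.0 / n] * n, [1.0 / m] * m  # mixed strategies (start uniform)
    for _ in range(steps):
        a1, a2 = sample(pi1, rng), sample(pi2, rng)
        r1 = A[a1][a2]
        r2 = -r1  # zero-sum payoff for player 2
        # TD-style update of the played action's estimate only (payoff-based)
        q1[a1] += alpha * (r1 - q1[a1])
        q2[a2] += alpha * (r2 - q2[a2])
        # Each policy moves a small step toward its softmax best response
        br1, br2 = softmax(q1, tau), softmax(q2, tau)
        pi1 = [(1 - beta) * p + beta * b for p, b in zip(pi1, br1)]
        pi2 = [(1 - beta) * p + beta * b for p, b in zip(pi2, br2)]
    return pi1, pi2


if __name__ == "__main__":
    # Matching pennies: the unique Nash equilibrium is uniform play for both players.
    A = [[1.0, -1.0], [-1.0, 1.0]]
    pi1, pi2 = smoothed_br_matrix_game(A)
    print(pi1, pi2)
```

With a fixed temperature, dynamics of this kind target a softmax-regularized equilibrium rather than an exact Nash equilibrium, which is one way to read the "smoothing bias" mentioned in the abstract. For matching pennies the two coincide by symmetry, so the returned strategies should hover near uniform play, up to the noise left by the constant step sizes.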
