Stochastic Games with Limited Public Memory

5 May 2025

Main: 20 pages

Bibliography: 3 pages

Abstract

We study the memory resources required for near-optimal play in two-player zero-sum stochastic games with the long-run average payoff. Although optimal strategies may not exist in such games, near-optimal strategies always do.

Mertens and Neyman (1981) proved that in any stochastic game, for any $\varepsilon>0$, there exist uniform $\varepsilon$-optimal memory-based strategies -- i.e., strategies that are $\varepsilon$-optimal in all sufficiently long $n$-stage games -- that use at most $O(n)$ memory states within the first $n$ stages. We improve this bound on the number of memory states by proving that in any stochastic game, for any $\varepsilon>0$, there exist uniform $\varepsilon$-optimal memory-based strategies that use at most $O(\log n)$ memory states in the first $n$ stages. Moreover, we establish the existence of uniform $\varepsilon$-optimal memory-based strategies whose memory updating and action selection are time-independent and such that, with probability close to 1, for all $n$, the number of memory states used up to stage $n$ is at most $O(\log n)$.

This result cannot be extended to strategies with bounded public memory -- even if time-dependent memory updating and action selection are allowed. This impossibility is illustrated in the Big Match -- a well-known stochastic game where the stage payoffs to Player 1 are 0 or 1. Although for any $\varepsilon > 0$, there exist strategies of Player 1 that guarantee a payoff exceeding $1/2 - \varepsilon$ in all sufficiently long $n$-stage games, we show that any strategy of Player 1 that uses a finite public memory fails to guarantee a payoff greater than $\varepsilon$ in any sufficiently long $n$-stage game.
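As a rough illustration of the Big Match payoff mechanics mentioned above, here is a minimal sketch in Python. It assumes one common textbook presentation of the game (Player 1 either continues or absorbs; once Player 1 absorbs, that stage's payoff is repeated forever); the abstract itself does not spell out the payoff matrix. The names `PAYOFF` and `average_payoff` are hypothetical, and the sketch does not implement the memory-based strategies studied in the paper.

```python
from typing import Optional, Sequence

# Stage payoffs to Player 1 in one common presentation of the Big Match
# (an assumption of this sketch, not taken from the abstract).
# Player 1 plays "C" (continue) or "A" (absorb); Player 2 plays "L" or "R".
PAYOFF = {("C", "L"): 1, ("C", "R"): 0, ("A", "L"): 0, ("A", "R"): 1}


def average_payoff(p1: Sequence[str], p2: Sequence[str]) -> float:
    """Average stage payoff to Player 1 over n = len(p1) stages.

    Once Player 1 plays "A", the game is absorbed: the payoff of that
    stage is repeated in every remaining stage, whatever is played later.
    """
    if len(p1) != len(p2) or len(p1) == 0:
        raise ValueError("need two non-empty action sequences of equal length")
    total = 0
    absorbed: Optional[int] = None  # frozen payoff after absorption, if any
    for a1, a2 in zip(p1, p2):
        if absorbed is None:
            stage = PAYOFF[(a1, a2)]
            if a1 == "A":
                absorbed = stage
        else:
            stage = absorbed
        total += stage
    return total / len(p1)
```

For example, if Player 1 never absorbs and Player 2 always plays "R", the average payoff is 0; if Player 1 absorbs at a stage where Player 2 plays "L", every later stage also pays 0. This is the tension that makes the timing of absorption, and hence the memory used to decide it, matter.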

@article{hansen2025_2505.02623,
  title={Stochastic Games with Limited Public Memory},
  author={Kristoffer Arnsfelt Hansen and Rasmus Ibsen-Jensen and Abraham Neyman},
  journal={arXiv preprint arXiv:2505.02623},
  year={2025}
}