Learning Zero-sum Stochastic Games with Posterior Sampling
International Conference on Artificial Intelligence and Statistics (AISTATS), 2021
Abstract
In this paper, we propose Posterior Sampling Reinforcement Learning for Zero-sum Stochastic Games (PSRL-ZSG), the first online learning algorithm that achieves a Bayesian regret bound of $O(HS\sqrt{AT})$ in infinite-horizon zero-sum stochastic games with the average-reward criterion. Here $H$ is an upper bound on the span of the bias function, $S$ is the number of states, $A$ is the number of joint actions, and $T$ is the horizon. We consider the online setting where the opponent cannot be controlled and may follow any arbitrary time-adaptive, history-dependent strategy. This improves on the best existing regret bound of $O(\sqrt[3]{DS^2AT^2})$ (with $D$ the diameter) by Wei et al. (2017) under the same assumption, and matches the theoretical lower bound in $A$ and $T$.
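To make the improvement concrete, the sketch below compares only the orders of growth of the two bounds, $HS\sqrt{AT}$ and $\sqrt[3]{DS^2AT^2}$, with constants and logarithmic factors dropped. It is an illustration, not part of the paper: the function names are hypothetical, and the values of $H$, $D$, $S$ and $A$ are arbitrary assumptions. $H$ and $D$ are set equal purely for illustration. Because the ratio of the older bound to the new one scales as $T^{1/6}$, the new bound eventually dominates for any fixed $H$, $D$, $S$, $A$.

```python
import math


def psrl_zsg_bound(H, S, A, T):
    """Order of the PSRL-ZSG Bayesian regret bound: H * S * sqrt(A * T).

    Constants and log factors are dropped (illustrative only).
    """
    return H * S * math.sqrt(A * T)


def prior_bound(D, S, A, T):
    """Order of the earlier bound: cube root of D * S^2 * A * T^2.

    Constants and log factors are dropped (illustrative only).
    """
    return (D * S**2 * A * T**2) ** (1.0 / 3.0)


if __name__ == "__main__":
    # Hypothetical problem sizes; H and D are set equal only for illustration.
    H = D = 10
    S, A = 20, 16
    for T in (10**6, 10**8, 10**10):
        new = psrl_zsg_bound(H, S, A, T)
        old = prior_bound(D, S, A, T)
        # ratio > 1 means the PSRL-ZSG bound is the smaller of the two
        print(f"T={T:.0e}  PSRL-ZSG~{new:.3g}  prior~{old:.3g}  ratio={old / new:.3g}")
```

With these assumed values the older bound is numerically smaller at $T = 10^6$ (ratio $0.5$), and the two cross near $T \approx 6.4 \times 10^7$. Beyond that point the $\sqrt{T}$ rate wins, which matches the asymptotic improvement claimed above.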
