Dynamics of Softmax Q-Learning in Two-Player Two-Action Games

Abstract

We study the dynamics of Q-learning in two-player two-action games with a Boltzmann (softmax) exploration mechanism. For any non-zero exploration rate the dynamics are dissipative, which guarantees that agent strategies converge to rest points that generally differ from the game's Nash equilibria (NE). We provide a comprehensive characterization of the rest-point structure for different games and examine the sensitivity of this structure to the noise introduced by exploration. Our results indicate that for a class of games with multiple NE, the asymptotic behavior of the learning dynamics can change drastically at a critical exploration rate. A somewhat counterintuitive manifestation of this behavior is that increasing the noise may lead the agents to select a higher-payoff solution.
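The setting described in the abstract can be illustrated with a minimal simulation. The sketch below is an assumption-laden toy version, not the paper's model: it uses discrete-time independent Q-learners with Boltzmann (softmax) action selection in an example 2x2 game (here the Prisoner's Dilemma, with illustrative payoffs); the function names, payoff matrices, and hyperparameters are all hypothetical choices for demonstration.

```python
import math
import random

def boltzmann(q, temp):
    """Softmax action probabilities for Q-values q at exploration temperature temp."""
    m = max(q)  # subtract max for numerical stability
    exps = [math.exp((v - m) / temp) for v in q]
    s = sum(exps)
    return [e / s for e in exps]

def run(payoff_a, payoff_b, temp=0.5, alpha=0.1, steps=20000, seed=0):
    """Two independent Q-learners with Boltzmann exploration in a 2x2 game.

    payoff_a[i][j], payoff_b[i][j]: payoffs to agents A and B when
    A plays action i and B plays action j. Returns the agents'
    final mixed strategies (softmax of their Q-values).
    """
    rng = random.Random(seed)
    qa, qb = [0.0, 0.0], [0.0, 0.0]
    for _ in range(steps):
        pa, pb = boltzmann(qa, temp), boltzmann(qb, temp)
        a = 0 if rng.random() < pa[0] else 1
        b = 0 if rng.random() < pb[0] else 1
        # exponential-average update toward the realized payoff
        qa[a] += alpha * (payoff_a[a][b] - qa[a])
        qb[b] += alpha * (payoff_b[a][b] - qb[b])
    return boltzmann(qa, temp), boltzmann(qb, temp)

# Illustrative Prisoner's Dilemma: action 0 = cooperate, action 1 = defect.
pd_a = [[3, 0], [5, 1]]
pd_b = [[3, 5], [0, 1]]
pa, pb = run(pd_a, pd_b, temp=0.5)
```

Because defection strictly dominates, both agents end up mostly defecting, but at non-zero temperature the rest point keeps a non-vanishing probability of cooperation, i.e. it sits away from the pure NE; sweeping `temp` exposes how the rest-point structure deforms with exploration noise.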
