Q-based Equilibria

25 April 2023

Olivier Compte


Main:32 Pages

21 Figures

Bibliography:3 Pages

16 Tables

Appendix:6 Pages

Abstract

In dynamic environments, Q-learning is an adaptive rule that provides an estimate (a Q-value) of the continuation value associated with each alternative. A naive policy consists of always choosing the alternative with the highest Q-value. We consider a family of Q-based policy rules that may systematically favor some alternatives over others, for example rules that incorporate a leniency bias that favors cooperation. In the spirit of Compte and Postlewaite [2018], we look for equilibrium biases (or Qb-equilibria) within this family of Q-based rules. We examine classic games under various monitoring technologies.
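To make the distinction between the naive policy and a biased Q-based rule concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's specification: it assumes a stateless form of the standard Q-learning update, and it models a bias as a fixed additive bonus on an alternative's Q-value. The names `q_update`, `biased_choice`, `alpha`, `discount`, and `bias` are hypothetical and do not come from the abstract.

```python
from typing import Dict


def q_update(q: Dict[str, float], action: str, reward: float,
             alpha: float = 0.1, discount: float = 0.9) -> Dict[str, float]:
    """Return a copy of q with the chosen action's Q-value updated.

    Assumption: stateless Q-learning, where the target is the realized
    reward plus the discounted highest current Q-value.
    """
    target = reward + discount * max(q.values())
    new_q = dict(q)
    new_q[action] = (1 - alpha) * q[action] + alpha * target
    return new_q


def biased_choice(q: Dict[str, float], bias: Dict[str, float]) -> str:
    """Pick the alternative maximizing Q-value plus an additive bias.

    Assumption: the bias is a fixed bonus per alternative. An empty bias
    reduces to the naive policy (highest Q-value wins).
    """
    return max(q, key=lambda a: q[a] + bias.get(a, 0.0))


# Two alternatives: cooperate ("C") and defect ("D").
q_values = {"C": 1.0, "D": 1.2}
naive = biased_choice(q_values, {})             # naive policy, no bias
lenient = biased_choice(q_values, {"C": 0.5})   # bias favoring cooperation
updated = q_update(q_values, "C", reward=2.0, alpha=0.5, discount=0.5)
```

With these illustrative numbers, the naive policy picks "D" because its Q-value is higher, while a bonus of 0.5 on "C" tips the choice toward cooperation. In an equilibrium analysis of the kind the abstract describes, the bias itself would be the object each player chooses; this sketch only shows how a given bias alters behavior.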
