
Sample Complexity of Identifying the Nonredundancy of Nontransitive Games in Dueling Bandits

Abstract

The dueling bandit is a variant of the multi-armed bandit in which a binary relation is learned through pairwise comparisons. Most work on dueling bandits has targeted transitive relations, that is, totally or partially ordered sets, or has at least assumed the existence of a champion such as a Condorcet winner or a Copeland winner. This work develops an analysis of dueling bandits for non-transitive relations. Jan-ken (a.k.a. rock-paper-scissors) is a typical example of a non-transitive relation. It is known that a rational player chooses one of the three items uniformly at random, which is a Nash equilibrium in game theory. Interestingly, any variant of Jan-ken with four items (e.g., rock, paper, scissors, and well) contains at least one useless item, which is never selected by a rational player. This work investigates a dueling bandit problem of identifying whether all $n$ items are indispensable in a given win-lose relation. We provide upper and lower bounds on the sample complexity of this identification problem in terms of the determinant of $A$ and a solution of $\mathbf{x}^{\top} A = \mathbf{0}^{\top}$, where $A$ is an $n \times n$ pay-off matrix that every duel follows.
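
The two Jan-ken facts above can be checked by hand on small matrices. The following is a minimal sketch, not the paper's identification algorithm. It assumes a deterministic skew-symmetric encoding (entry +1 for a win, -1 for a loss, 0 for a tie), which the abstract does not specify, and the helper names `left_product` and `weakly_dominates` are hypothetical.

```python
from fractions import Fraction

# Assumed encoding (not taken from the paper): A[i][j] = +1 if item i beats
# item j, -1 if it loses, 0 on a tie, so A is skew-symmetric.

# Item order: rock, paper, scissors
RPS = [
    [0, -1, 1],    # rock loses to paper, beats scissors
    [1, 0, -1],    # paper beats rock, loses to scissors
    [-1, 1, 0],    # scissors loses to rock, beats paper
]

# Item order: rock, paper, scissors, well
RPSW = [
    [0, -1, 1, -1],   # rock loses to paper and well, beats scissors
    [1, 0, -1, 1],    # paper beats rock and well, loses to scissors
    [-1, 1, 0, -1],   # scissors beats paper, loses to rock and well
    [1, -1, 1, 0],    # well beats rock and scissors, loses to paper
]


def left_product(x, A):
    """Return the row vector x^T A as a list."""
    n = len(A)
    return [sum(x[i] * A[i][j] for i in range(n)) for j in range(n)]


def weakly_dominates(A, i, k):
    """True if item i does at least as well as item k against every
    opponent and strictly better against at least one."""
    pairs = list(zip(A[i], A[k]))
    return all(a >= b for a, b in pairs) and any(a > b for a, b in pairs)


# Three items: the uniform strategy solves x^T A = 0^T.
uniform3 = [Fraction(1, 3)] * 3
rps_payoffs = left_product(uniform3, RPS)

# Four items: well weakly dominates rock, so rock is the useless item here.
well_dominates_rock = weakly_dominates(RPSW, 3, 0)

# Mixing uniformly over paper, scissors, and well never loses in expectation
# and strictly gains against an opponent who plays rock.
no_rock = [Fraction(0), Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)]
rpsw_payoffs = left_product(no_rock, RPSW)
```

In this encoding the three-item game admits a fully mixed solution of $\mathbf{x}^{\top} A = \mathbf{0}^{\top}$, while in the four-item game the strategy that drops rock already guarantees a non-negative expected payoff against every item.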

@article{lu2025_2505.05014,
  title={Sample Complexity of Identifying the Nonredundancy of Nontransitive Games in Dueling Bandits},
  author={Shang Lu and Shuji Kijima},
  journal={arXiv preprint arXiv:2505.05014},
  year={2025}
}