Sample Complexity of Identifying the Nonredundancy of Nontransitive Games in Dueling Bandits

8 May 2025

Abstract

Dueling bandit is a variant of the Multi-armed bandit to learn the binary relation by comparisons. Most work on the dueling bandit has targeted transitive relations, that is, totally/partially ordered sets, or assumed at least the existence of a champion such as Condorcet winner and Copeland winner. This work develops an analysis of dueling bandits for non-transitive relations. Jan-ken (a.k.a. rock-paper-scissors) is a typical example of a non-transitive relation. It is known that a rational player chooses one of three items uniformly at random, which is known to be Nash equilibrium in game theory. Interestingly, any variant of Jan-ken with four items (e.g., rock, paper, scissors, and well) contains at least one useless item, which is never selected by a rational player. This work investigates a dueling bandit problem to identify whether all $n$ items are indispensable in a given win-lose relation. Then, we provide upper and lower bounds of the sample complexity of the identification problem in terms of the determinant of $A$ and a solution of $\mathbf{x}^{\top} A = \mathbf{0}^{\top}$ where $A$ is an $n \times n$ pay-off matrix that every duel follows.

View on arXiv

@article{lu2025_2505.05014,
  title={ Sample Complexity of Identifying the Nonredundancy of Nontransitive Games in Dueling Bandits },
  author={ Shang Lu and Shuji Kijima },
  journal={arXiv preprint arXiv:2505.05014},
  year={ 2025 }
}

Comments on this paper