Sample Complexity of Identifying the Nonredundancy of Nontransitive Games in Dueling Bandits

Dueling bandit is a variant of the Multi-armed bandit to learn the binary relation by comparisons. Most work on the dueling bandit has targeted transitive relations, that is, totally/partially ordered sets, or assumed at least the existence of a champion such as Condorcet winner and Copeland winner. This work develops an analysis of dueling bandits for non-transitive relations. Jan-ken (a.k.a. rock-paper-scissors) is a typical example of a non-transitive relation. It is known that a rational player chooses one of three items uniformly at random, which is known to be Nash equilibrium in game theory. Interestingly, any variant of Jan-ken with four items (e.g., rock, paper, scissors, and well) contains at least one useless item, which is never selected by a rational player. This work investigates a dueling bandit problem to identify whether all items are indispensable in a given win-lose relation. Then, we provide upper and lower bounds of the sample complexity of the identification problem in terms of the determinant of and a solution of where is an pay-off matrix that every duel follows.
View on arXiv@article{lu2025_2505.05014, title={ Sample Complexity of Identifying the Nonredundancy of Nontransitive Games in Dueling Bandits }, author={ Shang Lu and Shuji Kijima }, journal={arXiv preprint arXiv:2505.05014}, year={ 2025 } }