arXiv: 2303.08816

Borda Regret Minimization for Generalized Linear Dueling Bandits

15 March 2023
Yue Wu
Tao Jin
Hao Lou
Farzad Farnoud
Quanquan Gu
Abstract

Dueling bandits are widely used to model preferential feedback prevalent in many applications such as recommendation systems and ranking. In this paper, we study the Borda regret minimization problem for dueling bandits, which aims to identify the item with the highest Borda score while minimizing the cumulative regret. We propose a rich class of generalized linear dueling bandit models, which cover many existing models. We first prove a regret lower bound of order $\Omega(d^{2/3} T^{2/3})$ for the Borda regret minimization problem, where $d$ is the dimension of contextual vectors and $T$ is the time horizon. To attain this lower bound, we propose an explore-then-commit type algorithm for the stochastic setting, which has a nearly matching regret upper bound $\tilde{O}(d^{2/3} T^{2/3})$. We also propose an EXP3-type algorithm for the adversarial linear setting, where the underlying model parameter can change at each round. Our algorithm achieves an $\tilde{O}(d^{2/3} T^{2/3})$ regret, which is also optimal. Empirical evaluations on both synthetic data and a simulated real-world environment are conducted to corroborate our theoretical analysis.
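
For readers new to the terminology, the following is a minimal sketch of what a Borda score and a Borda regret could look like for a small, known preference matrix. It is not taken from the paper. The normalization (averaging over the other K - 1 items), the per-round regret form 2B(i*) - B(i) - B(j), the function names `borda_scores` and `borda_regret`, and the example matrix are all assumptions made for illustration; the paper's exact definitions may differ.

```python
def borda_scores(P):
    """Return the Borda score of each item.

    P[i][j] is the probability that item i beats item j.
    Assumed definition: average win probability against the other K - 1 items.
    """
    K = len(P)
    return [
        sum(P[i][j] for j in range(K) if j != i) / (K - 1)
        for i in range(K)
    ]


def borda_regret(P, pairs):
    """Cumulative Borda regret of a sequence of compared pairs (i, j).

    Assumed per-round regret: 2 * B(best) - B(i) - B(j).
    """
    B = borda_scores(P)
    best = max(B)
    return sum(2 * best - B[i] - B[j] for i, j in pairs)


# Hypothetical 3-item preference matrix with P[i][j] + P[j][i] = 1.
P = [
    [0.5, 0.6, 0.7],
    [0.4, 0.5, 0.8],
    [0.3, 0.2, 0.5],
]

scores = borda_scores(P)
winner = max(range(len(scores)), key=lambda i: scores[i])
regret = borda_regret(P, [(0, 1), (0, 0), (2, 1)])
```

Under these assumptions, a round that compares the Borda winner with itself incurs zero regret, while any round involving a lower-scoring item adds a positive amount.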
