arXiv:2111.03917
Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary Dueling Bandits

6 November 2021
Aadirupa Saha
Shubham Gupta
Abstract

We study the problem of \emph{dynamic regret minimization} in $K$-armed Dueling Bandits under non-stationary or time-varying preferences. This is an online learning setup where the agent chooses a pair of items at each round and observes only a relative binary `win-loss' feedback for this pair, sampled from an underlying preference matrix at that round. We first study the problem of static-regret minimization for adversarial preference sequences and design an efficient algorithm with $O(\sqrt{KT})$ high-probability regret. We then use similar algorithmic ideas to propose an efficient and provably optimal algorithm for dynamic-regret minimization under two notions of non-stationarity. In particular, we establish $\tilde{O}(\sqrt{SKT})$ and $\tilde{O}(V_T^{1/3}K^{1/3}T^{2/3})$ dynamic-regret guarantees, where $S$ is the total number of `effective switches' in the underlying preference relations and $V_T$ is a measure of `continuous-variation' non-stationarity. The complexity of these problems had not been studied prior to this work, despite the prevalence of non-stationary environments in real-world systems. We justify the optimality of our algorithms by proving matching lower bounds under both of the above notions of non-stationarity. Finally, we corroborate our results with extensive simulations and compare the efficacy of our algorithms against state-of-the-art baselines.
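To make the feedback model concrete, below is a minimal sketch of the non-stationary dueling-bandit interaction loop the abstract describes: at each round a pair of arms is drawn and a single binary win-loss sample is observed from that round's preference matrix. This is not the paper's algorithm; the switching preference sequence (one switch, i.e. $S = 1$), the matrix values, and the uniform pair selection are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

K, T = 5, 1000  # number of arms, horizon

def preference_matrix(t):
    # Hypothetical non-stationary preference sequence for illustration:
    # the dominant arm switches once, halfway through the horizon
    # (one "effective switch" in the paper's notation).
    P = np.full((K, K), 0.5)
    strong = 0 if t < T // 2 else 1  # currently dominant arm
    P[strong, :] = 0.9   # the strong arm beats every other arm w.p. 0.9
    P[:, strong] = 0.1   # and loses to them w.p. 0.1
    np.fill_diagonal(P, 0.5)
    return P

total_wins = np.zeros(K)

for t in range(T):
    # A real algorithm would pick the pair adaptively; uniform random
    # selection here just exercises the feedback protocol.
    i, j = rng.choice(K, size=2, replace=False)
    P = preference_matrix(t)
    # Binary `win-loss' feedback: arm i beats arm j w.p. P[i, j].
    i_wins = rng.random() < P[i, j]
    total_wins[i if i_wins else j] += 1

print("empirical win counts per arm:", total_wins)
```

Running this, the win counts concentrate on arm 0 before the switch and arm 1 after it; a dynamic-regret algorithm must detect such changes from the pairwise feedback alone, since no absolute rewards are ever observed.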

View on arXiv