DP-Dueling: Learning from Preference Feedback without Compromising User Privacy

22 March 2024
Aadirupa Saha
Hilal Asi
Abstract

We consider the well-studied dueling bandit problem, where a learner aims to identify near-optimal actions using pairwise comparisons, under the constraint of differential privacy. We study a general class of utility-based preference matrices for large (potentially unbounded) decision spaces and give the first differentially private dueling bandit algorithm for active learning with user preferences. Our proposed algorithms are computationally efficient with near-optimal performance, both in terms of the private and the non-private regret bound. More precisely, we show that when the decision space is of finite size $K$, our proposed algorithm yields the order-optimal regret bound $O\big(\sum_{i=2}^{K} \log\frac{KT}{\Delta_i} + \frac{K}{\epsilon}\big)$ for pure $\epsilon$-DP, where $\Delta_i$ denotes the suboptimality gap of the $i$-th arm. We also present a matching lower bound analysis which proves the optimality of our algorithms. Finally, we extend our results to any general decision space in $d$ dimensions with potentially infinite arms and design an $\epsilon$-DP algorithm with regret $\tilde{O}\left(\frac{d^6}{\kappa\epsilon} + \frac{d\sqrt{T}}{\kappa}\right)$, providing privacy for free when $T \gg d$.
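To make the setting concrete, the sketch below shows a toy $\epsilon$-DP dueling bandit in the finite-arm case. It is an illustrative successive-elimination scheme, not the paper's algorithm: each surviving arm duels a running champion, and the aggregated win count is privatized with Laplace noise of scale $1/\epsilon$ before any elimination decision (a single user's comparison changes the count by at most 1, so the Laplace mechanism gives $\epsilon$-DP per count). The preference matrix `pref`, the phase schedule, and the confidence margin are all assumptions made for this sketch.

```python
import numpy as np

def private_dueling_bandit(pref, horizon, epsilon, seed=None):
    """Toy epsilon-DP dueling bandit via successive elimination.

    pref[i, j] is the probability that arm i wins a duel against arm j.
    In each phase every surviving arm duels the current champion; the
    aggregate win count is privatized with Laplace(1/epsilon) noise
    before any elimination decision (sensitivity 1 -> epsilon-DP).
    """
    rng = np.random.default_rng(seed)
    K = pref.shape[0]
    active = set(range(K))
    champ = 0
    t, phase = 0, 1
    while t < horizon and len(active) > 1:
        n = 2 ** phase  # duels per challenger this phase (doubling schedule)
        # Confidence margin: sampling deviation plus a Laplace-noise term.
        margin = np.sqrt(n * np.log(horizon)) + np.log(horizon) / epsilon
        for arm in sorted(active - {champ}):
            wins = rng.binomial(n, pref[arm, champ])
            noisy = wins + rng.laplace(scale=1.0 / epsilon)
            t += n
            if noisy < n / 2 - margin:    # champion clearly better: drop arm
                active.discard(arm)
            elif noisy > n / 2 + margin:  # challenger clearly better: promote
                active.discard(champ)
                champ = arm
        phase += 1
    return champ
```

With a strongly dominant arm, the noisy win rates of suboptimal arms fall below $1/2$ by more than the margin once phases grow long enough, and the champion converges to the Condorcet winner with high probability.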
