Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences

14 February 2022

Papers citing "Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences"

Beyond Numeric Awards: In-Context Dueling Bandits with LLM Agents | Fanzeng Xia, Hao Liu, Yisong Yue, Tongxin Li | 61 1 0 | 03 Jan 2025
Optimal Design for Human Feedback | Subhojyoti Mukherjee, Anusha Lalitha, Kousha Kalantari, Aniket Deshmukh, Ge Liu, Yifei Ma, B. Kveton | 36 0 0 | 22 Apr 2024
DP-Dueling: Learning from Preference Feedback without Compromising User Privacy | Aadirupa Saha, Hilal Asi | 36 1 0 | 22 Mar 2024
One Arrow, Two Kills: An Unified Framework for Achieving Optimal Regret Guarantees in Sleeping Bandits | Pierre Gaillard, Aadirupa Saha, Soham Dan | 16 3 0 | 26 Oct 2022
ANACONDA: An Improved Dynamic Regret Algorithm for Adaptive Non-Stationary Dueling Bandits | Thomas Kleine Buening, Aadirupa Saha | 38 6 0 | 25 Oct 2022