2111.12306
Cited By
Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability
24 November 2021
Aadirupa Saha
A. Krishnamurthy
Papers citing "Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability"
32 / 32 papers shown
Title
Active Human Feedback Collection via Neural Contextual Dueling Bandits
Arun Verma
Xiaoqiang Lin
Zhongxiang Dai
Daniela Rus
Bryan Kian Hsiang Low
42
0
0
16 Apr 2025
Online Clustering of Dueling Bandits
Zhiyong Wang
Jiahang Sun
Mingze Kong
Jize Xie
Qinghua Hu
J. C. Lui
Zhongxiang Dai
85
0
0
04 Feb 2025
DOPL: Direct Online Preference Learning for Restless Bandits with Preference Feedback
Guojun Xiong
Ujwal Dinesha
Debajoy Mukherjee
Jian Li
Srinivas Shakkottai
52
2
0
07 Oct 2024
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
Zhaolin Gao
Wenhao Zhan
Jonathan D. Chang
Gokul Swamy
Kianté Brantley
Jason D. Lee
Wen Sun
OffRL
81
3
0
06 Oct 2024
Conversational Dueling Bandits in Generalized Linear Models
Shuhua Yang
Hui Yuan
Xiaoying Zhang
Mengdi Wang
Hong Zhang
Huazheng Wang
41
1
0
26 Jul 2024
Neural Dueling Bandits: Preference-Based Optimization with Human Feedback
Arun Verma
Zhongxiang Dai
Xiaoqiang Lin
Patrick Jaillet
K. H. Low
39
5
0
24 Jul 2024
Bandits with Preference Feedback: A Stackelberg Game Perspective
Barna Pásztor
Parnian Kassraie
Andreas Krause
42
2
0
24 Jun 2024
Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis
Qining Zhang
Honghao Wei
Lei Ying
OffRL
67
1
0
11 Jun 2024
Self-Play with Adversarial Critic: Provable and Scalable Offline Alignment for Language Models
Xiang Ji
Sanjeev Kulkarni
Mengdi Wang
Tengyang Xie
OffRL
45
4
0
06 Jun 2024
The Power of Active Multi-Task Learning in Reinforcement Learning from Human Feedback
Ruitao Chen
Liwei Wang
75
1
0
18 May 2024
Active Preference Learning for Ordering Items In- and Out-of-sample
Herman Bergström
Emil Carlsson
Devdatt Dubhashi
Fredrik D. Johansson
49
0
0
05 May 2024
Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback
Qiwei Di
Jiafan He
Quanquan Gu
31
1
0
16 Apr 2024
Feel-Good Thompson Sampling for Contextual Dueling Bandits
Xuheng Li
Heyang Zhao
Quanquan Gu
44
9
0
09 Apr 2024
DP-Dueling: Learning from Preference Feedback without Compromising User Privacy
Aadirupa Saha
Hilal Asi
36
1
0
22 Mar 2024
MENTOR: Guiding Hierarchical Reinforcement Learning with Human Feedback and Dynamic Distance Constraint
Xinglin Zhou
Yifu Yuan
Shaofu Yang
Jianye Hao
39
1
0
22 Feb 2024
Reinforcement Learning from Human Feedback with Active Queries
Kaixuan Ji
Jiafan He
Quanquan Gu
26
17
0
14 Feb 2024
Principled Preferential Bayesian Optimization
Wenjie Xu
Wenbin Wang
Yuning Jiang
B. Svetozarevic
Colin N. Jones
32
6
0
08 Feb 2024
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
Banghua Zhu
Michael I. Jordan
Jiantao Jiao
36
25
0
29 Jan 2024
A Minimaximalist Approach to Reinforcement Learning from Human Feedback
Gokul Swamy
Christoph Dann
Rahul Kidambi
Zhiwei Steven Wu
Alekh Agarwal
OffRL
51
96
0
08 Jan 2024
Think Before You Duel: Understanding Complexities of Preference Learning under Constrained Resources
Rohan Deb
Aadirupa Saha
38
0
0
28 Dec 2023
Faster Convergence with Multiway Preferences
Aadirupa Saha
Vitaly Feldman
Tomer Koren
Yishay Mansour
28
1
0
19 Dec 2023
Sample Complexity of Preference-Based Nonparametric Off-Policy Evaluation with Deep Networks
Zihao Li
Xiang Ji
Minshuo Chen
Mengdi Wang
OffRL
46
0
0
16 Oct 2023
Identifying Copeland Winners in Dueling Bandits with Indifferences
Viktor Bengs
Björn Haddenhorst
Eyke Hüllermeier
53
0
0
01 Oct 2023
Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism
Zihao Li
Zhuoran Yang
Mengdi Wang
OffRL
39
55
0
29 May 2023
Principled Reinforcement Learning with Human Feedback from Pairwise or K-wise Comparisons
Banghua Zhu
Jiantao Jiao
Michael I. Jordan
OffRL
42
184
0
26 Jan 2023
One Arrow, Two Kills: An Unified Framework for Achieving Optimal Regret Guarantees in Sleeping Bandits
Pierre Gaillard
Aadirupa Saha
Soham Dan
30
3
0
26 Oct 2022
ANACONDA: An Improved Dynamic Regret Algorithm for Adaptive Non-Stationary Dueling Bandits
Thomas Kleine Buening
Aadirupa Saha
48
6
0
25 Oct 2022
Dueling Convex Optimization with General Preferences
Aadirupa Saha
Tomer Koren
Yishay Mansour
30
2
0
27 Sep 2022
Flexible and Efficient Contextual Bandits with Heterogeneous Treatment Effect Oracles
Aldo G. Carranza
Sanath Kumar Krishnamurthy
Susan Athey
21
1
0
30 Mar 2022
Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences
Aadirupa Saha
Pierre Gaillard
38
8
0
14 Feb 2022
Dueling RL: Reinforcement Learning with Trajectory Preferences
Aldo Pacchiano
Aadirupa Saha
Jonathan Lee
35
82
0
08 Nov 2021
Optimal Dynamic Regret in Exp-Concave Online Learning
Dheeraj Baby
Yu Wang
50
44
0
23 Apr 2021