1502.06362
Cited By
v1
v2 (latest)
Contextual Dueling Bandits
23 February 2015
Miroslav Dudík
Katja Hofmann
Robert Schapire
Aleksandrs Slivkins
M. Zoghi
Papers citing
"Contextual Dueling Bandits"
50 / 96 papers shown
Offline Clustering of Preference Learning with Active-data Augmentation
Jingyuan Liu
Fatemeh Ghaffari
Xuchuang Wang
Xutong Liu
Mohammad Hajiesmaili
Carlee Joe-Wong
OffRL
280
0
0
30 Oct 2025
Greedy Sampling Is Provably Efficient for RLHF
Di Wu
Chengshuai Shi
Jing Yang
Cong Shen
148
2
0
28 Oct 2025
Preference-based Reinforcement Learning beyond Pairwise Comparisons: Benefits of Multiple Options
Joongkyu Lee
Seouh-won Yi
Min-hwan Oh
OffRL
232
0
0
21 Oct 2025
Online Mixture of Experts: No-Regret Learning for Optimal Collective Decision-Making
Larkin Liu
Jalal Etesami
206
0
0
19 Oct 2025
A-IPO: Adaptive Intent-driven Preference Optimization
Wenqing Wang
Muhammad Asif Ali
Ali Shoker
Ruohan Yang
Junyang Chen
Ying Sha
Huan Wang
144
1
0
11 Oct 2025
Recycling History: Efficient Recommendations from Contextual Dueling Bandits
Suryanarayana Sankagiri
Jalal Etesami
Pouria Fatemi
Matthias Grossglauser
161
0
0
26 Aug 2025
CAPO: Towards Enhancing LLM Reasoning through Generative Credit Assignment
Guofu Xie
Yunsheng Shi
Hongtao Tian
Ting Yao
Xiao Zhang
OffRL
LRM
599
2
0
04 Aug 2025
Adaptive Batch-Wise Sample Scheduling for Direct Preference Optimization
Zixuan Huang
Yikun Ban
Lean Fu
Xiaojie Li
Zhongxiang Dai
Jianxin Li
Deqing Wang
434
2
0
08 Jun 2025
Neural Variance-aware Dueling Bandits with Deep Representation and Shallow Exploration
Youngmin Oh
J. Park
Taejin Paik
Jaemin Park
273
1
0
02 Jun 2025
Bayesian Optimization from Human Feedback: Near-Optimal Regret Bounds
Aya Kayal
Sattar Vakili
Laura Toni
Da-shan Shiu
A. Bernacchia
215
1
0
29 May 2025
Proximal Point Nash Learning from Human Feedback
D. Tiapkin
Daniele Calandriello
Denis Belomestny
Eric Moulines
Alexey Naumov
Kashif Rasul
Michal Valko
Pierre Ménard
274
4
0
26 May 2025
Sample Complexity of Identifying the Nonredundancy of Nontransitive Games in Dueling Bandits
Shang Lu
Shuji Kijima
328
0
0
08 May 2025
Toward Efficient Exploration by Large Language Model Agents
Dilip Arumugam
Thomas L. Griffiths
LLMAG
473
12
0
29 Apr 2025
Addressing Concept Mislabeling in Concept Bottleneck Models Through Preference Optimization
Emiliano Penaloza
Tianyue H. Zhan
Laurent Charlin
Mateo Espinosa Zarlenga
711
4
0
25 Apr 2025
Reinforcement Learning from Multi-level and Episodic Human Feedback
Conference on Learning for Dynamics & Control (L4DC), 2025
Muhammad Qasim Elahi
Somtochukwu Oguchienti
Maheed H. Ahmed
Mahsa Ghasemi
OffRL
600
0
0
20 Apr 2025
VARP: Reinforcement Learning from Vision-Language Model Feedback with Agent Regularized Preferences
Anukriti Singh
Amisha Bhaskar
Peihong Yu
Souradip Chakraborty
Ruthwik Dasyam
Amrit Singh Bedi
Erfaun Noorani
387
7
0
18 Mar 2025
Cost-Aware Optimal Pairwise Pure Exploration
International Conference on Artificial Intelligence and Statistics (AISTATS), 2025
Di Wu
Chengshuai Shi
Ruida Zhou
Cong Shen
313
0
0
10 Mar 2025
Towards a Sharp Analysis of Offline Policy Learning for f-Divergence-Regularized Contextual Bandits
Qingyue Zhao
Kaixuan Ji
Heyang Zhao
Tong Zhang
Q. Gu
OffRL
495
0
0
09 Feb 2025
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Heyang Zhao
Chenlu Ye
Quanquan Gu
Tong Zhang
OffRL
699
18
0
07 Nov 2024
Sample-Efficient Alignment for LLMs
Zichen Liu
Changyu Chen
Chao Du
Wee Sun Lee
Min Lin
315
14
0
03 Nov 2024
Adaptive Segment-level Reward: Bridging the Gap Between Action and Reward Space in Alignment
Yongbin Li
Shaopan Xiong
Gengru Chen
Xiaoyang Li
Yijia Luo
Xingyao Zhang
Yanwen Huang
Xingyuan Bu
Yingshui Tan
148
0
0
23 Oct 2024
Optimal Design for Reward Modeling in RLHF
Antoine Scheid
Etienne Boursier
Alain Durmus
Michael I. Jordan
Pierre Ménard
Eric Moulines
Michal Valko
OffRL
558
21
0
22 Oct 2024
Accelerated Preference Optimization for Large Language Model Alignment
Jiafan He
Huizhuo Yuan
Q. Gu
248
6
0
08 Oct 2024
DOPL: Direct Online Preference Learning for Restless Bandits with Preference Feedback
International Conference on Learning Representations (ICLR), 2024
Efstathia Soufleri
Ujwal Dinesha
Debajoy Mukherjee
Jian Li
Srinivas Shakkottai
375
2
0
07 Oct 2024
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
International Conference on Learning Representations (ICLR), 2024
Zhaolin Gao
Wenhao Zhan
Jonathan D. Chang
Gokul Swamy
Kianté Brantley
Jason D. Lee
Wen Sun
OffRL
515
21
0
06 Oct 2024
Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment
Yifan Zhang
Ge Zhang
Yue Wu
Kangping Xu
Quanquan Gu
537
3
0
03 Oct 2024
FedPT: Federated Proxy-Tuning of Large Language Models on Resource-Constrained Edge Devices
Zhidong Gao
Yu Zhang
Zhenxiao Zhang
Yanmin Gong
Yuanxiong Guo
208
3
0
01 Oct 2024
Inverse-Q*: Token Level Reinforcement Learning for Aligning Large Language Models Without Preference Data
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Han Xia
Songyang Gao
Qiming Ge
Zhiheng Xi
Qi Zhang
Xuanjing Huang
283
12
0
27 Aug 2024
Biased Dueling Bandits with Stochastic Delayed Feedback
Bongsoo Yi
Yue Kang
Yao Li
453
3
0
26 Aug 2024
Conversational Dueling Bandits in Generalized Linear Models
Shuhua Yang
Hui Yuan
Xiaoying Zhang
Mengdi Wang
Kuanqi Cai
Huazheng Wang
207
4
0
26 Jul 2024
Bandits with Preference Feedback: A Stackelberg Game Perspective
Barna Pásztor
Parnian Kassraie
Andreas Krause
448
6
0
24 Jun 2024
Adversarial Multi-dueling Bandits
Pratik Gajane
242
1
0
18 Jun 2024
Online Bandit Learning with Offline Preference Data for Improved RLHF
Akhil Agnihotri
Rahul Jain
Deepak Ramachandran
Zheng Wen
OffRL
804
4
0
13 Jun 2024
Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis
Qining Zhang
Honghao Wei
Lei Ying
OffRL
464
3
0
11 Jun 2024
Self-Play with Adversarial Critic: Provable and Scalable Offline Alignment for Language Models
Xiang Ji
Sanjeev Kulkarni
Mengdi Wang
Tengyang Xie
OffRL
387
12
0
06 Jun 2024
Active Preference Learning for Ordering Items In- and Out-of-sample
Neural Information Processing Systems (NeurIPS), 2024
Herman Bergström
Emil Carlsson
Devdatt Dubhashi
Fredrik D. Johansson
301
6
0
05 May 2024
Self-Play Preference Optimization for Language Model Alignment
Yue Wu
Zhiqing Sun
Huizhuo Yuan
Kaixuan Ji
Yiming Yang
Quanquan Gu
680
229
0
01 May 2024
REBEL: Reinforcement Learning via Regressing Relative Rewards
Zhaolin Gao
Jonathan D. Chang
Wenhao Zhan
Owen Oertell
Gokul Swamy
Kianté Brantley
Thorsten Joachims
J. Andrew Bagnell
Jason D. Lee
Wen Sun
OffRL
456
69
0
25 Apr 2024
Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback
Qiwei Di
Jiafan He
Quanquan Gu
513
5
0
16 Apr 2024
Dataset Reset Policy Optimization for RLHF
Jonathan D. Chang
Wenhao Zhan
Owen Oertell
Kianté Brantley
Dipendra Kumar Misra
Jason D. Lee
Wen Sun
OffRL
542
36
0
12 Apr 2024
Feel-Good Thompson Sampling for Contextual Dueling Bandits
Xuheng Li
Heyang Zhao
Quanquan Gu
260
17
0
09 Apr 2024
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Corby Rosset
Ching-An Cheng
Arindam Mitra
Michael Santacroce
Ahmed Hassan Awadallah
Tengyang Xie
573
171
0
04 Apr 2024
DP-Dueling: Learning from Preference Feedback without Compromising User Privacy
Aadirupa Saha
Hilal Asi
347
1
0
22 Mar 2024
Provable Multi-Party Reinforcement Learning with Diverse Human Feedback
Huiying Zhong
Zhun Deng
Weijie J. Su
Zhiwei Steven Wu
Linjun Zhang
268
27
0
08 Mar 2024
Reinforcement Learning from Human Feedback with Active Queries
Kaixuan Ji
Jiafan He
Quanquan Gu
524
38
0
14 Feb 2024
Online Iterative Reinforcement Learning from Human Feedback with General Preference Model
Neural Information Processing Systems (NeurIPS), 2024
Chen Ye
Wei Xiong
Yuheng Zhang
Nan Jiang
Tong Zhang
OffRL
337
34
0
11 Feb 2024
Principled Preferential Bayesian Optimization
Wenjie Xu
Wenbin Wang
Yuning Jiang
B. Svetozarevic
Colin N. Jones
331
14
0
08 Feb 2024
Efficient Exploration for LLMs
Vikranth Dwaracherla
S. Asghari
Botao Hao
Benjamin Van Roy
LLMAG
516
43
0
01 Feb 2024
A Minimaximalist Approach to Reinforcement Learning from Human Feedback
International Conference on Machine Learning (ICML), 2024
Gokul Swamy
Christoph Dann
Rahul Kidambi
Zhiwei Steven Wu
Alekh Agarwal
OffRL
642
144
0
08 Jan 2024
Think Before You Duel: Understanding Complexities of Preference Learning under Constrained Resources
Rohan Deb
Aadirupa Saha
248
0
0
28 Dec 2023