2407.17112
Cited By
Neural Dueling Bandits: Preference-Based Optimization with Human Feedback
24 July 2024
Arun Verma
Zhongxiang Dai
Xiaoqiang Lin
P. Jaillet
K. H. Low
Papers citing
"Neural Dueling Bandits: Preference-Based Optimization with Human Feedback"
7 / 7 papers shown
Title
Neural Logistic Bandits
Seoungbin Bae
Dabeen Lee
41
0
0
04 May 2025
Quantum Lipschitz Bandits
Bongsoo Yi
Yue Kang
Yao Li
25
1
0
03 Apr 2025
Online Clustering of Dueling Bandits
Zhiyong Wang
Jiahang Sun
Mingze Kong
Jize Xie
Qinghua Hu
J. C. Lui
Zhongxiang Dai
80
0
0
04 Feb 2025
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Heyang Zhao
Chenlu Ye
Quanquan Gu
Tong Zhang
OffRL
57
3
0
07 Nov 2024
Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback
Qiwei Di
Jiafan He
Quanquan Gu
21
1
0
16 Apr 2024
Sample Efficient Preference Alignment in LLMs via Active Exploration
Viraj Mehta
Vikramjeet Das
Ojash Neopane
Yijia Dai
Ilija Bogunovic
Stefano Ermon
Jeff Schneider
Willie Neiswanger
OffRL
25
12
0
01 Dec 2023
Teaching language models to support answers with verified quotes
Jacob Menick
Maja Trebacz
Vladimir Mikulik
John Aslanides
Francis Song
...
Mia Glaese
Susannah Young
Lucy Campbell-Gillingham
G. Irving
Nat McAleese
ELM
RALM
226
255
0
21 Mar 2022