arXiv:2404.10776
Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback
16 April 2024
Qiwei Di
Jiafan He
Quanquan Gu
Papers citing
"Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback"
6 / 6 papers shown
A Model Selection Approach for Corruption Robust Reinforcement Learning
Chen-Yu Wei
Christoph Dann
Julian Zimmert
77
44
0
31 Dec 2024
Learning from Imperfect Human Feedback: a Tale from Corruption-Robust Dueling
Yuwei Cheng
Fan Yao
Xuefeng Liu
Haifeng Xu
22
1
0
18 May 2024
Borda Regret Minimization for Generalized Linear Dueling Bandits
Yue Wu
Tao Jin
Hao Lou
Farzad Farnoud
Quanquan Gu
16
11
0
15 Mar 2023
Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial Corruptions
Jiafan He
Dongruo Zhou
Tong Zhang
Quanquan Gu
61
46
0
13 May 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
301
11,730
0
04 Mar 2022
Instance-Wise Minimax-Optimal Algorithms for Logistic Bandits
Marc Abeille
Louis Faury
Clément Calauzènes
96
37
0
23 Oct 2020