2307.11288
Cited By
Kernelized Offline Contextual Dueling Bandits
21 July 2023
Viraj Mehta
Ojash Neopane
Vikramjeet Das
Sen Lin
J. Schneider
W. Neiswanger
OffRL
Papers citing "Kernelized Offline Contextual Dueling Bandits"
6 / 6 papers shown
Bandits with Preference Feedback: A Stackelberg Game Perspective
Barna Pásztor
Parnian Kassraie
Andreas Krause
20
2
0
24 Jun 2024
Self-Play with Adversarial Critic: Provable and Scalable Offline Alignment for Language Models
Xiang Ji
Sanjeev Kulkarni
Mengdi Wang
Tengyang Xie
OffRL
35
4
0
06 Jun 2024
Principled Preferential Bayesian Optimization
Wenjie Xu
Wenbin Wang
Yuning Jiang
B. Svetozarevic
Colin N. Jones
14
6
0
08 Feb 2024
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,730
0
04 Mar 2022
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
275
1,561
0
18 Sep 2019
Improving a Neural Semantic Parser by Counterfactual Learning from Human Bandit Feedback
Carolin (Haas) Lawrence
Stefan Riezler
OffRL
171
56
0
03 May 2018