2302.03319
Cited By
Leveraging Demonstrations to Improve Online Learning: Quality Matters
7 February 2023
Botao Hao
Rahul Jain
Tor Lattimore
Benjamin Van Roy
Zheng Wen
Papers citing
"Leveraging Demonstrations to Improve Online Learning: Quality Matters"
7 / 7 papers shown
Title
Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning
Kai Ye
Hongyi Zhou
Jin Zhu
Francesco Quinzan
C. Shi
25
1
0
03 Apr 2025
Online Bandit Learning with Offline Preference Data for Improved RLHF
Akhil Agnihotri
Rahul Jain
Deepak Ramachandran
Zheng Wen
OffRL
37
2
0
13 Jun 2024
Sequential Best-Arm Identification with Application to Brain-Computer Interface
Xiaoping Zhou
Botao Hao
Jian Kang
Tor Lattimore
Lexin Li
27
2
0
17 May 2023
Artificial Replay: A Meta-Algorithm for Harnessing Historical Data in Bandits
Siddhartha Banerjee
Sean R. Sinclair
Milind Tambe
Lily Xu
C. Yu
AI4TS
29
6
0
30 Sep 2022
Regret Bounds for Information-Directed Reinforcement Learning
Botao Hao
Tor Lattimore
OffRL
39
17
0
09 Jun 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
313
11,915
0
04 Mar 2022
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
280
1,587
0
18 Sep 2019