Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2402.08114
Cited By
Active Preference Learning for Large Language Models
12 February 2024
William Muldrew
Peter Hayes
Mingtian Zhang
David Barber
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Active Preference Learning for Large Language Models"
5 / 5 papers shown
Title
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Yiping Wang
Qing Yang
Zhiyuan Zeng
Liliang Ren
L. Liu
...
Jianfeng Gao
Weizhu Chen
S. Wang
Simon S. Du
Yelong Shen
OffRL
ReLM
LRM
108
2
0
29 Apr 2025
Bootstrapping Language Models with DPO Implicit Rewards
Changyu Chen
Zichen Liu
Chao Du
Tianyu Pang
Qian Liu
Arunesh Sinha
Pradeep Varakantham
Min-Bin Lin
SyDa
ALM
62
23
0
14 Jun 2024
Self-Rewarding Language Models
Weizhe Yuan
Richard Yuanzhe Pang
Kyunghyun Cho
Xian Li
Sainbayar Sukhbaatar
Jing Xu
Jason Weston
ReLM
SyDa
ALM
LRM
230
294
0
18 Jan 2024
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,730
0
04 Mar 2022
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
275
1,561
0
18 Sep 2019
1