arXiv: 2406.07971
It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF
12 June 2024
Taiming Lu, Lingfeng Shen, Xinyu Yang, Weiting Tan, Beidi Chen, Huaxiu Yao
Papers citing "It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF" (7 / 7 papers shown)

Visualising Policy-Reward Interplay to Inform Zeroth-Order Preference Optimisation of Large Language Models
Alessio Galatolo, Zhenbang Dai, Katie Winkle, Meriem Beloucif
05 Mar 2025 · 45 · 0 · 0

Sycophancy in Large Language Models: Causes and Mitigations
Lars Malmqvist
22 Nov 2024 · 56 · 6 · 0

Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese, Nat McAleese, Maja Trebacz, John Aslanides, Vlad Firoiu, ..., John F. J. Mellor, Demis Hassabis, Koray Kavukcuoglu, Lisa Anne Hendricks, G. Irving [ALM, AAML]
28 Sep 2022 · 225 · 495 · 0

Teaching language models to support answers with verified quotes
Jacob Menick, Maja Trebacz, Vladimir Mikulik, John Aslanides, Francis Song, ..., Mia Glaese, Susannah Young, Lucy Campbell-Gillingham, G. Irving, Nat McAleese [ELM, RALM]
21 Mar 2022 · 220 · 255 · 0

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe [OSLM, ALM]
04 Mar 2022 · 301 · 11,730 · 0

Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler, Nisan Stiennon, Jeff Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, G. Irving [ALM]
18 Sep 2019 · 275 · 1,561 · 0

Dialogue Learning With Human-In-The-Loop
Jiwei Li, Alexander H. Miller, S. Chopra, Marc'Aurelio Ranzato, Jason Weston [OffRL]
29 Nov 2016 · 216 · 132 · 0