Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2308.06385
Cited By
ZYN: Zero-Shot Reward Models with Yes-No Questions for RLAIF
11 August 2023
Víctor Gallego
SyDa
Re-assign community
ArXiv
PDF
HTML
Papers citing
"ZYN: Zero-Shot Reward Models with Yes-No Questions for RLAIF"
8 / 8 papers shown
Title
Improving Contrastive Learning of Sentence Embeddings from AI Feedback
M. Abouheaf
W. Gueaieb
Md. Suruz Miah
D. Spinello
Xipeng Qiu
42
37
0
03 May 2023
Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese
Nat McAleese
Maja Trkebacz
John Aslanides
Vlad Firoiu
...
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALM
AAML
225
495
0
28 Sep 2022
Defining and Characterizing Reward Hacking
Joar Skalse
Nikolaus H. R. Howe
Dmitrii Krasheninnikov
David M. Krueger
57
53
0
27 Sep 2022
Personalizing Text-to-Image Generation via Aesthetic Gradients
Víctor Gallego
36
16
0
25 Sep 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
203
1,651
0
15 Oct 2021
Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models
Alex Tamkin
Miles Brundage
Jack Clark
Deep Ganguli
AILaw
ELM
192
248
0
04 Feb 2021
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
273
1,561
0
18 Sep 2019
1