ZYN: Zero-Shot Reward Models with Yes-No Questions for RLAIF

11 August 2023

Papers citing "ZYN: Zero-Shot Reward Models with Yes-No Questions for RLAIF"

8 / 8 papers shown

Title
Improving Contrastive Learning of Sentence Embeddings from AI Feedback M. Abouheaf W. Gueaieb Md. Suruz Miah D. Spinello Xipeng Qiu 39 37 0 03 May 2023
Improving alignment of dialogue agents via targeted human judgements Amelia Glaese Nat McAleese Maja Trkebacz John Aslanides Vlad Firoiu ... John F. J. Mellor Demis Hassabis Koray Kavukcuoglu Lisa Anne Hendricks G. Irving ALM AAML 222 495 0 28 Sep 2022
Defining and Characterizing Reward Hacking Joar Skalse Nikolaus H. R. Howe Dmitrii Krasheninnikov David M. Krueger 57 53 0 27 Sep 2022
Personalizing Text-to-Image Generation via Aesthetic Gradients Víctor Gallego 34 16 0 25 Sep 2022
Training language models to follow instructions with human feedback Long Ouyang Jeff Wu Xu Jiang Diogo Almeida Carroll L. Wainwright ... Amanda Askell Peter Welinder Paul Christiano Jan Leike Ryan J. Lowe OSLM ALM 301 11,730 0 04 Mar 2022
Multitask Prompted Training Enables Zero-Shot Task Generalization Victor Sanh Albert Webson Colin Raffel Stephen H. Bach Lintang Sutawika ... T. Bers Stella Biderman Leo Gao Thomas Wolf Alexander M. Rush LRM 203 1,651 0 15 Oct 2021
Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models Alex Tamkin Miles Brundage Jack Clark Deep Ganguli AILaw ELM 192 248 0 04 Feb 2021
Fine-Tuning Language Models from Human Preferences Daniel M. Ziegler Nisan Stiennon Jeff Wu Tom B. Brown Alec Radford Dario Amodei Paul Christiano G. Irving ALM 273 1,561 0 18 Sep 2019