2204.05862
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
12 April 2022
Yuntao Bai
Andy Jones
Kamal Ndousse
Amanda Askell
Anna Chen
Nova Dassarma
Dawn Drain
Stanislav Fort
Deep Ganguli
T. Henighan
Nicholas Joseph
Saurav Kadavath
John Kernion
Tom Conerly
S. E. Showk
Nelson Elhage
Zac Hatfield-Dodds
Danny Hernandez
Tristan Hume
Scott Johnston
Shauna Kravec
Liane Lovitt
Neel Nanda
Catherine Olsson
Dario Amodei
Tom B. Brown
Jack Clark
Sam McCandlish
C. Olah
Benjamin Mann
Jared Kaplan
Papers citing "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"
50 / 1,795 papers shown
CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society
G. Li
Hasan Hammoud
Hani Itani
Dmitrii Khizbullin
Bernard Ghanem
SyDa
ALM
19
397
0
31 Mar 2023
Self-Refine: Iterative Refinement with Self-Feedback
Aman Madaan
Niket Tandon
Prakhar Gupta
Skyler Hallinan
Luyu Gao
...
Bodhisattwa Prasad Majumder
Katherine Hermann
Sean Welleck
Amir Yazdanbakhsh
Peter Clark
ReLM
LRM
DiffM
47
1,417
0
30 Mar 2023
Improving Code Generation by Training with Natural Language Feedback
Angelica Chen
Jérémy Scheurer
Tomasz Korbak
Jon Ander Campos
Jun Shern Chan
Samuel R. Bowman
Kyunghyun Cho
Ethan Perez
SyDa
ALM
AI4CE
26
76
0
28 Mar 2023
Foundation Models and Fair Use
Peter Henderson
Xuechen Li
Dan Jurafsky
Tatsunori Hashimoto
Mark A. Lemley
Percy Liang
12
119
0
28 Mar 2023
SmartBook: AI-Assisted Situation Report Generation for Intelligence Analysts
R. Reddy
Daniel Lee
Yi Ren Fung
Khanh Duy Nguyen
Qi Zeng
Manling Li
Ziqi Wang
Clare R. Voss
Heng Ji
10
5
0
25 Mar 2023
Fundamentals of Generative Large Language Models and Perspectives in Cyber-Defense
Andrei Kucharavy
Z. Schillaci
Loic Maréchal
Maxime Wursch
Ljiljana Dolamic
Remi Sabonnadiere
Dimitri Percia David
Alain Mermoud
Vincent Lenders
ELM
AI4CE
22
31
0
21 Mar 2023
Capabilities of GPT-4 on Medical Challenge Problems
Harsha Nori
Nicholas King
S. McKinney
Dean Carignan
Eric Horvitz
LM&MA
ELM
AI4MH
34
760
0
20 Mar 2023
What does it take to catch a Chinchilla? Verifying Rules on Large-Scale Neural Network Training via Compute Monitoring
Yonadav Shavit
16
21
0
20 Mar 2023
Large Language Model Instruction Following: A Survey of Progresses and Challenges
Renze Lou
Kai Zhang
Wenpeng Yin
ALM
LRM
27
19
0
18 Mar 2023
GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models
Tyna Eloundou
Sam Manning
Pamela Mishkin
Daniel Rock
ELM
11
374
0
17 Mar 2023
HIVE: Harnessing Human Feedback for Instructional Visual Editing
Shu Zhen Zhang
Xinyi Yang
Yihao Feng
Can Qin
Chia-Chih Chen
...
Haiquan Wang
Silvio Savarese
Stefano Ermon
Caiming Xiong
Ran Xu
10
103
0
16 Mar 2023
Artificial Influence: An Analysis Of AI-Driven Persuasion
Matthew Burtell
T. Woodside
19
34
0
15 Mar 2023
Eliciting Latent Predictions from Transformers with the Tuned Lens
Nora Belrose
Zach Furman
Logan Smith
Danny Halawi
Igor V. Ostrovsky
Lev McKinney
Stella Biderman
Jacob Steinhardt
11
192
0
14 Mar 2023
Exploring ChatGPT's Ability to Rank Content: A Preliminary Study on Consistency with Human Preferences
Yunjie Ji
Yan Gong
Yiping Peng
Chao Ni
Peiyan Sun
Dongyu Pan
Baochang Ma
Xiangang Li
ELM
ALM
AI4MH
17
37
0
14 Mar 2023
Vision-Language Models as Success Detectors
Yuqing Du
Ksenia Konyushkova
Misha Denil
A. Raju
Jessica Landon
Felix Hill
Nando de Freitas
Serkan Cabi
MLLM
LRM
84
77
0
13 Mar 2023
Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback
Hannah Rose Kirk
Bertie Vidgen
Paul Röttger
Scott A. Hale
17
99
0
09 Mar 2023
disco: a toolkit for Distributional Control of Generative Models
Germán Kruszewski
Jos Rozen
Marc Dymetman
19
4
0
08 Mar 2023
Automatically Auditing Large Language Models via Discrete Optimization
Erik Jones
Anca Dragan
Aditi Raghunathan
Jacob Steinhardt
27
157
0
08 Mar 2023
Foundation Models for Decision Making: Problems, Methods, and Opportunities
Sherry Yang
Ofir Nachum
Yilun Du
Jason W. Wei
Pieter Abbeel
Dale Schuurmans
LM&Ro
OffRL
LRM
AI4CE
90
152
0
07 Mar 2023
Zeroth-Order Optimization Meets Human Feedback: Provable Learning via Ranking Oracles
Zhiwei Tang
Dmitry Rybin
Tsung-Hui Chang
ALM
DiffM
31
25
0
07 Mar 2023
Perspectives on the Social Impacts of Reinforcement Learning with Human Feedback
Gabrielle K. Liu
OffRL
13
21
0
06 Mar 2023
R-U-SURE? Uncertainty-Aware Code Suggestions By Maximizing Utility Across Random User Intents
Daniel D. Johnson
Daniel Tarlow
Christian J. Walder
21
6
0
01 Mar 2023
Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following
Seonghyeon Ye
Hyeonbin Hwang
Sohee Yang
Hyeongu Yun
Yireun Kim
Minjoon Seo
LRM
22
34
0
28 Feb 2023
Goal Driven Discovery of Distributional Differences via Language Descriptions
Ruiqi Zhong
Peter Zhang
Steve Li
Jinwoo Ahn
Dan Klein
Jacob Steinhardt
25
48
0
28 Feb 2023
Reward Design with Language Models
Minae Kwon
Sang Michael Xie
Kalesha Bullard
Dorsa Sadigh
LM&Ro
25
198
0
27 Feb 2023
Safety without alignment
András Kornai
M. Bukatin
Zsolt Zombori
LLMSV
11
0
0
27 Feb 2023
Aligning Text-to-Image Models using Human Feedback
Kimin Lee
Hao Liu
Moonkyung Ryu
Olivia Watkins
Yuqing Du
Craig Boutilier
Pieter Abbeel
Mohammad Ghavamzadeh
S. Gu
EGVM
11
249
0
23 Feb 2023
Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
Kai Greshake
Sahar Abdelnabi
Shailesh Mishra
C. Endres
Thorsten Holz
Mario Fritz
SILM
26
430
0
23 Feb 2023
Optimising Human-Machine Collaboration for Efficient High-Precision Information Extraction from Text Documents
Bradley Butcher
Miri Zilka
Darren Cook
Jiri Hron
Adrian Weller
28
3
0
18 Feb 2023
Towards Safer Generative Language Models: A Survey on Safety Risks, Evaluations, and Improvements
Jiawen Deng
Jiale Cheng
Hao-Lun Sun
Zhexin Zhang
Minlie Huang
LM&MA
ELM
21
15
0
18 Feb 2023
Complex QA and language models hybrid architectures, Survey
Xavier Daull
P. Bellot
Emmanuel Bruno
Vincent Martin
Elisabeth Murisasco
ELM
19
15
0
17 Feb 2023
Pretraining Language Models with Human Preferences
Tomasz Korbak
Kejian Shi
Angelica Chen
Rasika Bhalerao
C. L. Buckley
Jason Phang
Sam Bowman
Ethan Perez
ALM
SyDa
25
205
0
16 Feb 2023
Auditing large language models: a three-layered approach
Jakob Mokander
Jonas Schuett
Hannah Rose Kirk
Luciano Floridi
AILaw
MLAU
24
193
0
16 Feb 2023
Aligning Language Models with Preferences through f-divergence Minimization
Dongyoung Go
Tomasz Korbak
Germán Kruszewski
Jos Rozen
Nahyeon Ryu
Marc Dymetman
19
68
0
16 Feb 2023
The Capacity for Moral Self-Correction in Large Language Models
Deep Ganguli
Amanda Askell
Nicholas Schiefer
Thomas I. Liao
Kamilė Lukošiūtė
...
Tom B. Brown
C. Olah
Jack Clark
Sam Bowman
Jared Kaplan
LRM
ReLM
26
157
0
15 Feb 2023
Towards Agile Text Classifiers for Everyone
Maximilian Mozes
Jessica Hoffmann
Katrin Tomanek
Muhamed Kouate
Nithum Thain
Ann Yuan
Tolga Bolukbasi
Lucas Dixon
24
13
0
13 Feb 2023
Transformer models: an introduction and catalog
X. Amatriain
Ananth Sankar
Jie Bing
Praveen Kumar Bodigutla
Timothy J. Hazen
Michaeel Kazi
17
50
0
12 Feb 2023
Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks
Daniel Kang
Xuechen Li
Ion Stoica
Carlos Guestrin
Matei A. Zaharia
Tatsunori Hashimoto
AAML
8
233
0
11 Feb 2023
Synthesizing Human Gaze Feedback for Improved NLP Performance
Varun Khurana
Yaman Kumar Singla
Nora Hollenstein
R. Kumar
Balaji Krishnamurthy
4
15
0
11 Feb 2023
Data Selection for Language Models via Importance Resampling
Sang Michael Xie
Shibani Santurkar
Tengyu Ma
Percy Liang
6
170
0
06 Feb 2023
Chain of Hindsight Aligns Language Models with Feedback
Hao Liu
Carmelo Sferrazza
Pieter Abbeel
ALM
18
115
0
06 Feb 2023
IC3: Image Captioning by Committee Consensus
David M. Chan
Austin Myers
Sudheendra Vijayanarasimhan
David A. Ross
John F. Canny
19
17
0
02 Feb 2023
Using In-Context Learning to Improve Dialogue Safety
Nicholas Meade
Spandana Gella
Devamanyu Hazarika
Prakhar Gupta
Di Jin
Siva Reddy
Yang Liu
Dilek Z. Hakkani-Tür
25
37
0
02 Feb 2023
Conditioning Predictive Models: Risks and Strategies
Evan Hubinger
Adam Jermyn
Johannes Treutlein
Rubi Hudson
Kate Woolverton
23
5
0
02 Feb 2023
Benchmarking Large Language Models for News Summarization
Tianyi Zhang
Faisal Ladhak
Esin Durmus
Percy Liang
Kathleen McKeown
Tatsunori B. Hashimoto
ELM
11
474
0
31 Jan 2023
The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
Shayne Longpre
Le Hou
Tu Vu
Albert Webson
Hyung Won Chung
...
Denny Zhou
Quoc V. Le
Barret Zoph
Jason W. Wei
Adam Roberts
ALM
22
621
0
31 Jan 2023
Direct Preference-based Policy Optimization without Reward Modeling
Gaon An
Junhyeok Lee
Xingdong Zuo
Norio Kosaka
KyungHyun Kim
Hyun Oh Song
OffRL
19
26
0
30 Jan 2023
Truth Machines: Synthesizing Veracity in AI Language Models
Luke Munn
Liam Magee
Vanicka Arora
SyDa
HILM
13
28
0
28 Jan 2023
Principled Reinforcement Learning with Human Feedback from Pairwise or K-wise Comparisons
Banghua Zhu
Jiantao Jiao
Michael I. Jordan
OffRL
20
174
0
26 Jan 2023
Large Language Models as Fiduciaries: A Case Study Toward Robustly Communicating With Artificial Intelligence Through Legal Standards
John J. Nay
ELM
AILaw
22
15
0
24 Jan 2023