2112.00861
A General Language Assistant as a Laboratory for Alignment
1 December 2021
Amanda Askell
Yuntao Bai
Anna Chen
Dawn Drain
Deep Ganguli
T. Henighan
Andy Jones
Nicholas Joseph
Benjamin Mann
Nova Dassarma
Nelson Elhage
Zac Hatfield-Dodds
Danny Hernandez
John Kernion
Kamal Ndousse
Catherine Olsson
Dario Amodei
Tom B. Brown
Jack Clark
Sam McCandlish
C. Olah
Jared Kaplan
ALM
Papers citing
"A General Language Assistant as a Laboratory for Alignment"
50 / 701 papers shown
The Wisdom of Hindsight Makes Language Models Better Instruction Followers
International Conference on Machine Learning (ICML), 2023
Tianjun Zhang
Fangchen Liu
Justin Wong
Pieter Abbeel
Joseph E. Gonzalez
192
58
0
10 Feb 2023
Chain of Hindsight Aligns Language Models with Feedback
International Conference on Learning Representations (ICLR), 2023
Hao Liu
Carmelo Sferrazza
Pieter Abbeel
ALM
729
146
0
06 Feb 2023
Using In-Context Learning to Improve Dialogue Safety
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Nicholas Meade
Spandana Gella
Devamanyu Hazarika
Prakhar Gupta
Di Jin
Siva Reddy
Yang Liu
Dilek Z. Hakkani-Tür
219
48
0
02 Feb 2023
Co-Writing with Opinionated Language Models Affects Users' Views
International Conference on Human Factors in Computing Systems (CHI), 2023
Maurice Jakesch
Advait Bhat
Daniel Buschek
Lior Zalmanson
Mor Naaman
ELM
293
277
0
01 Feb 2023
Truth Machines: Synthesizing Veracity in AI Language Models
AI & Society, 2023
Luke Munn
Liam Magee
Vanicka Arora
SyDa
HILM
85
50
0
28 Jan 2023
How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection
Biyang Guo
Xin Zhang
Ziyuan Wang
Minqi Jiang
Jinran Nie
Yuxuan Ding
Jianwei Yue
Yupeng Wu
DeLMO
ELM
259
759
0
18 Jan 2023
Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits
Neural Information Processing Systems (NeurIPS), 2023
Ruibo Liu
Chenyan Jia
Ge Zhang
Ziyu Zhuang
Tony X. Liu
Soroush Vosoughi
343
40
0
01 Jan 2023
Inclusive Artificial Intelligence
Dilip Arumugam
Shi Dong
Benjamin Van Roy
159
3
0
24 Dec 2022
Discovering Language Model Behaviors with Model-Written Evaluations
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Ethan Perez
Sam Ringer
Kamilė Lukošiūtė
Karina Nguyen
Edwin Chen
...
Danny Hernandez
Deep Ganguli
Evan Hubinger
Nicholas Schiefer
Jared Kaplan
ALM
351
571
0
19 Dec 2022
Constitutional AI: Harmlessness from AI Feedback
Yuntao Bai
Saurav Kadavath
Sandipan Kundu
Amanda Askell
John Kernion
...
Dario Amodei
Nicholas Joseph
Sam McCandlish
Tom B. Brown
Jared Kaplan
SyDa
MoMe
864
2,244
0
15 Dec 2022
Editing Models with Task Arithmetic
International Conference on Learning Representations (ICLR), 2022
Gabriel Ilharco
Marco Tulio Ribeiro
Mitchell Wortsman
Suchin Gururangan
Ludwig Schmidt
Hannaneh Hajishirzi
Ali Farhadi
KELM
MoMe
MU
1.2K
730
0
08 Dec 2022
Discovering Latent Knowledge in Language Models Without Supervision
International Conference on Learning Representations (ICLR), 2022
Collin Burns
Haotian Ye
Dan Klein
Jacob Steinhardt
400
520
0
07 Dec 2022
Fine-tuning language models to find agreement among humans with diverse preferences
Neural Information Processing Systems (NeurIPS), 2022
Michiel A. Bakker
Martin Chadwick
Hannah R. Sheahan
Michael Henry Tessler
Lucy Campbell-Gillingham
...
Nat McAleese
Amelia Glaese
John Aslanides
M. Botvinick
Christopher Summerfield
ALM
260
284
0
28 Nov 2022
The Expertise Problem: Learning from Specialized Feedback
Oliver Daniels-Koch
Rachel Freedman
OffRL
137
21
0
12 Nov 2022
The CRINGE Loss: Learning what language not to model
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Leonard Adolphs
Tianyu Gao
Jing Xu
Kurt Shuster
Sainbayar Sukhbaatar
Jason Weston
MU
209
40
0
10 Nov 2022
ADEPT: A DEbiasing PrompT Framework
AAAI Conference on Artificial Intelligence (AAAI), 2022
Ke Yang
Charles Yu
Yi R. Fung
Pengfei Yu
Heng Ji
340
33
0
10 Nov 2022
Measuring Progress on Scalable Oversight for Large Language Models
Sam Bowman
Jeeyoon Hyun
Ethan Perez
Edwin Chen
Craig Pettit
...
Tristan Hume
Yuntao Bai
Zac Hatfield-Dodds
Benjamin Mann
Jared Kaplan
ALM
ELM
282
168
0
04 Nov 2022
Large Language Models Are Human-Level Prompt Engineers
International Conference on Learning Representations (ICLR), 2022
Yongchao Zhou
Andrei Ioan Muresanu
Ziwen Han
Keiran Paster
Silviu Pitis
Harris Chan
Jimmy Ba
ALM
LLMAG
429
1,155
0
03 Nov 2022
Fine-Tuning Language Models via Epistemic Neural Networks
Ian Osband
S. Asghari
Benjamin Van Roy
Nat McAleese
John Aslanides
G. Irving
UQLM
274
23
0
03 Nov 2022
When Life Gives You Lemons, Make Cherryade: Converting Feedback from Bad Responses into Good Labels
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Weiyan Shi
Emily Dinan
Kurt Shuster
Jason Weston
Jing Xu
180
22
0
28 Oct 2022
Broken Neural Scaling Laws
International Conference on Learning Representations (ICLR), 2022
Ethan Caballero
Kshitij Gupta
Irina Rish
David M. Krueger
984
98
0
26 Oct 2022
Continued Pretraining for Better Zero- and Few-Shot Promptability
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Zhaofeng Wu
Robert L. Logan IV
Pete Walsh
Akshita Bhagia
Dirk Groeneveld
Sameer Singh
Iz Beltagy
VLM
210
15
0
19 Oct 2022
Mitigating Covertly Unsafe Text within Natural Language Systems
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Alex Mei
Anisha Kabir
Sharon Levy
Melanie Subbiah
Emily Allaway
J. Judge
D. Patton
Bruce Bimber
Kathleen McKeown
William Yang Wang
293
13
0
17 Oct 2022
LLMEffiChecker: Understanding and Testing Efficiency Degradation of Large Language Models
ACM Transactions on Software Engineering and Methodology (TOSEM), 2022
Simin Chen
Cong Liu
Mirazul Haque
Wei Yang
211
31
0
07 Oct 2022
When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment
Neural Information Processing Systems (NeurIPS), 2022
Zhijing Jin
Sydney Levine
Fernando Gonzalez
Ojasv Kamal
Maarten Sap
Mrinmaya Sachan
Amélie Reymond
J. Tenenbaum
Bernhard Schölkopf
ELM
LRM
361
116
0
04 Oct 2022
Learning by Distilling Context
Charles Burton Snell
Dan Klein
Ruiqi Zhong
ReLM
LRM
549
58
0
30 Sep 2022
Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese
Nat McAleese
Maja Trebacz
John Aslanides
Vlad Firoiu
...
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALM
AAML
517
628
0
28 Sep 2022
Evaluation of Question Answering Systems: Complexity of judging a natural language
ACM Computing Surveys (ACM CSUR), 2022
Amer Farea
Zhen Yang
Kien Duong
Nadeesha Perera
F. Emmert-Streib
ELM
222
10
0
10 Sep 2022
In conversation with Artificial Intelligence: aligning language models with human values
Philosophy & Technology (PT), 2022
Atoosa Kasirzadeh
Iason Gabriel
346
130
0
01 Sep 2022
Towards Boosting the Open-Domain Chatbot with Human Feedback
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Hua Lu
Siqi Bao
H. He
Fan Wang
Hua Wu
Haifeng Wang
ALM
151
20
0
30 Aug 2022
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
571
620
0
23 Aug 2022
Learning New Skills after Deployment: Improving open-domain internet-driven dialogue with human feedback
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Jing Xu
Megan Ung
M. Komeili
Kushal Arora
Y-Lan Boureau
Jason Weston
181
43
0
05 Aug 2022
A Hazard Analysis Framework for Code Synthesis Large Language Models
Heidy Khlaaf
Pamela Mishkin
Joshua Achiam
Gretchen Krueger
Miles Brundage
ELM
103
35
0
25 Jul 2022
Language Models (Mostly) Know What They Know
Saurav Kadavath
Tom Conerly
Amanda Askell
T. Henighan
Dawn Drain
...
Nicholas Joseph
Benjamin Mann
Sam McCandlish
C. Olah
Jared Kaplan
ELM
604
1,116
0
11 Jul 2022
Machine Learning Model Sizes and the Parameter Gap
Pablo Villalobos
J. Sevilla
T. Besiroglu
Lennart Heim
A. Ho
Marius Hobbhahn
ALM
ELM
AI4CE
172
76
0
05 Jul 2022
DIRECTOR: Generator-Classifiers For Supervised Language Modeling
Kushal Arora
Kurt Shuster
Sainbayar Sukhbaatar
Jason Weston
VLM
243
44
0
15 Jun 2022
Emergent Abilities of Large Language Models
Jason W. Wei
Yi Tay
Rishi Bommasani
Colin Raffel
Barret Zoph
...
Tatsunori Hashimoto
Oriol Vinyals
Abigail Z. Jacobs
J. Dean
W. Fedus
ELM
ReLM
LRM
500
3,098
0
15 Jun 2022
Researching Alignment Research: Unsupervised Analysis
Jan H. Kirchner
Logan Smith
Jacques Thibodeau
Kyle McDonell
Laria Reynolds
113
10
0
06 Jun 2022
Teaching Models to Express Their Uncertainty in Words
Stephanie C. Lin
Jacob Hilton
Owain Evans
OOD
471
537
0
28 May 2022
Non-Programmers Can Label Programs Indirectly via Active Examples: A Case Study with Text-to-SQL
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Ruiqi Zhong
Charles Burton Snell
Dan Klein
Jason Eisner
317
10
0
25 May 2022
Diversity Over Size: On the Effect of Sample and Topic Sizes for Argument Mining Datasets
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Benjamin Schiller
Johannes Daxenberger
Iryna Gurevych
232
6
0
23 May 2022
RL with KL penalties is better viewed as Bayesian inference
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Tomasz Korbak
Ethan Perez
Christopher L. Buckley
OffRL
274
99
0
23 May 2022
Scaling Laws and Interpretability of Learning from Repeated Data
Danny Hernandez
Tom B. Brown
Tom Conerly
Nova Dassarma
Dawn Drain
...
Catherine Olsson
Dario Amodei
Nicholas Joseph
Jared Kaplan
Sam McCandlish
280
144
0
21 May 2022
GPT-NeoX-20B: An Open-Source Autoregressive Language Model
Sid Black
Stella Biderman
Eric Hallahan
Quentin G. Anthony
Leo Gao
...
Shivanshu Purohit
Laria Reynolds
J. Tow
Benqi Wang
Samuel Weinbach
338
947
0
14 Apr 2022
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Yuntao Bai
Andy Jones
Kamal Ndousse
Amanda Askell
Anna Chen
...
Jack Clark
Sam McCandlish
C. Olah
Benjamin Mann
Jared Kaplan
861
3,437
0
12 Apr 2022
Teaching language models to support answers with verified quotes
Jacob Menick
Maja Trebacz
Vladimir Mikulik
John Aslanides
Francis Song
...
Mia Glaese
Susannah Young
Lucy Campbell-Gillingham
G. Irving
Nat McAleese
ELM
RALM
498
302
0
21 Mar 2022
Training language models to follow instructions with human feedback
Neural Information Processing Systems (NeurIPS), 2022
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
2.0K
17,254
0
04 Mar 2022
Red Teaming Language Models with Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Ethan Perez
Saffron Huang
Francis Song
Trevor Cai
Roman Ring
John Aslanides
Amelia Glaese
Nat McAleese
G. Irving
AAML
436
849
0
07 Feb 2022
Datasheet for the Pile
Stella Biderman
Kieran Bicheno
Leo Gao
220
40
0
13 Jan 2022
The Dangers of Underclaiming: Reasons for Caution When Reporting How NLP Systems Fail
Sam Bowman
OffRL
346
48
0
15 Oct 2021