ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2026 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2112.00861
  4. Cited By
A General Language Assistant as a Laboratory for Alignment
v1v2v3 (latest)

A General Language Assistant as a Laboratory for Alignment

1 December 2021
Amanda Askell
Yuntao Bai
Anna Chen
Dawn Drain
Deep Ganguli
T. Henighan
Andy Jones
Nicholas Joseph
Benjamin Mann
Nova Dassarma
Nelson Elhage
Zac Hatfield-Dodds
Danny Hernandez
John Kernion
Kamal Ndousse
Catherine Olsson
Dario Amodei
Tom B. Brown
Jack Clark
Sam McCandlish
C. Olah
Jared Kaplan
    ALM
ArXiv (abs)PDFHTMLHuggingFace (2 upvotes)

Papers citing "A General Language Assistant as a Laboratory for Alignment"

50 / 701 papers shown
Title
The Wisdom of Hindsight Makes Language Models Better Instruction
  Followers
The Wisdom of Hindsight Makes Language Models Better Instruction FollowersInternational Conference on Machine Learning (ICML), 2023
Tianjun Zhang
Fangchen Liu
Justin Wong
Pieter Abbeel
Joseph E. Gonzalez
192
58
0
10 Feb 2023
Chain of Hindsight Aligns Language Models with Feedback
Chain of Hindsight Aligns Language Models with FeedbackInternational Conference on Learning Representations (ICLR), 2023
Hao Liu
Carmelo Sferrazza
Pieter Abbeel
ALM
729
146
0
06 Feb 2023
Using In-Context Learning to Improve Dialogue Safety
Using In-Context Learning to Improve Dialogue SafetyConference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Nicholas Meade
Spandana Gella
Devamanyu Hazarika
Prakhar Gupta
Di Jin
Siva Reddy
Yang Liu
Dilek Z. Hakkani-Tür
219
48
0
02 Feb 2023
Co-Writing with Opinionated Language Models Affects Users' Views
Co-Writing with Opinionated Language Models Affects Users' ViewsInternational Conference on Human Factors in Computing Systems (CHI), 2023
Maurice Jakesch
Advait Bhat
Daniel Buschek
Lior Zalmanson
Mor Naaman
ELM
293
277
0
01 Feb 2023
Truth Machines: Synthesizing Veracity in AI Language Models
Truth Machines: Synthesizing Veracity in AI Language ModelsAi & Society (AI & Society), 2023
Luke Munn
Liam Magee
Vanicka Arora
SyDaHILM
85
50
0
28 Jan 2023
How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation,
  and Detection
How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection
Biyang Guo
Xin Zhang
Ziyuan Wang
Minqi Jiang
Jinran Nie
Yuxuan Ding
Jianwei Yue
Yupeng Wu
DeLMOELM
259
759
0
18 Jan 2023
Second Thoughts are Best: Learning to Re-Align With Human Values from
  Text Edits
Second Thoughts are Best: Learning to Re-Align With Human Values from Text EditsNeural Information Processing Systems (NeurIPS), 2023
Ruibo Liu
Chenyan Jia
Ge Zhang
Ziyu Zhuang
Tony X. Liu
Soroush Vosoughi
343
40
0
01 Jan 2023
Inclusive Artificial Intelligence
Inclusive Artificial Intelligence
Dilip Arumugam
Shi Dong
Benjamin Van Roy
159
3
0
24 Dec 2022
Discovering Language Model Behaviors with Model-Written Evaluations
Discovering Language Model Behaviors with Model-Written EvaluationsAnnual Meeting of the Association for Computational Linguistics (ACL), 2022
Ethan Perez
Sam Ringer
Kamilė Lukošiūtė
Karina Nguyen
Edwin Chen
...
Danny Hernandez
Deep Ganguli
Evan Hubinger
Nicholas Schiefer
Jared Kaplan
ALM
351
571
0
19 Dec 2022
Constitutional AI: Harmlessness from AI Feedback
Constitutional AI: Harmlessness from AI Feedback
Yuntao Bai
Saurav Kadavath
Sandipan Kundu
Amanda Askell
John Kernion
...
Dario Amodei
Nicholas Joseph
Sam McCandlish
Tom B. Brown
Jared Kaplan
SyDaMoMe
864
2,244
0
15 Dec 2022
Editing Models with Task Arithmetic
Editing Models with Task ArithmeticInternational Conference on Learning Representations (ICLR), 2022
Gabriel Ilharco
Marco Tulio Ribeiro
Mitchell Wortsman
Suchin Gururangan
Ludwig Schmidt
Hannaneh Hajishirzi
Ali Farhadi
KELMMoMeMU
1.2K
730
0
08 Dec 2022
Discovering Latent Knowledge in Language Models Without Supervision
Discovering Latent Knowledge in Language Models Without SupervisionInternational Conference on Learning Representations (ICLR), 2022
Collin Burns
Haotian Ye
Dan Klein
Jacob Steinhardt
400
520
0
07 Dec 2022
Fine-tuning language models to find agreement among humans with diverse
  preferences
Fine-tuning language models to find agreement among humans with diverse preferencesNeural Information Processing Systems (NeurIPS), 2022
Michiel A. Bakker
Martin Chadwick
Hannah R. Sheahan
Michael Henry Tessler
Lucy Campbell-Gillingham
...
Nat McAleese
Amelia Glaese
John Aslanides
M. Botvinick
Christopher Summerfield
ALM
260
284
0
28 Nov 2022
The Expertise Problem: Learning from Specialized Feedback
The Expertise Problem: Learning from Specialized Feedback
Oliver Daniels-Koch
Rachel Freedman
OffRL
137
21
0
12 Nov 2022
The CRINGE Loss: Learning what language not to model
The CRINGE Loss: Learning what language not to modelAnnual Meeting of the Association for Computational Linguistics (ACL), 2022
Leonard Adolphs
Tianyu Gao
Jing Xu
Kurt Shuster
Sainbayar Sukhbaatar
Jason Weston
MU
209
40
0
10 Nov 2022
ADEPT: A DEbiasing PrompT Framework
ADEPT: A DEbiasing PrompT FrameworkAAAI Conference on Artificial Intelligence (AAAI), 2022
Ke Yang
Charles Yu
Yi R. Fung
Pengfei Yu
Heng Ji
340
33
0
10 Nov 2022
Measuring Progress on Scalable Oversight for Large Language Models
Measuring Progress on Scalable Oversight for Large Language Models
Sam Bowman
Jeeyoon Hyun
Ethan Perez
Edwin Chen
Craig Pettit
...
Tristan Hume
Yuntao Bai
Zac Hatfield-Dodds
Benjamin Mann
Jared Kaplan
ALMELM
282
168
0
04 Nov 2022
Large Language Models Are Human-Level Prompt Engineers
Large Language Models Are Human-Level Prompt EngineersInternational Conference on Learning Representations (ICLR), 2022
Yongchao Zhou
Andrei Ioan Muresanu
Ziwen Han
Keiran Paster
Silviu Pitis
Harris Chan
Jimmy Ba
ALMLLMAG
429
1,155
0
03 Nov 2022
Fine-Tuning Language Models via Epistemic Neural Networks
Fine-Tuning Language Models via Epistemic Neural Networks
Ian Osband
S. Asghari
Benjamin Van Roy
Nat McAleese
John Aslanides
G. Irving
UQLM
274
23
0
03 Nov 2022
When Life Gives You Lemons, Make Cherryade: Converting Feedback from Bad
  Responses into Good Labels
When Life Gives You Lemons, Make Cherryade: Converting Feedback from Bad Responses into Good LabelsNorth American Chapter of the Association for Computational Linguistics (NAACL), 2022
Weiyan Shi
Emily Dinan
Kurt Shuster
Jason Weston
Jing Xu
180
22
0
28 Oct 2022
Broken Neural Scaling Laws
Broken Neural Scaling LawsInternational Conference on Learning Representations (ICLR), 2022
Ethan Caballero
Kshitij Gupta
Irina Rish
David M. Krueger
984
98
0
26 Oct 2022
Continued Pretraining for Better Zero- and Few-Shot Promptability
Continued Pretraining for Better Zero- and Few-Shot PromptabilityConference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Zhaofeng Wu
IV RobertL.Logan
Pete Walsh
Akshita Bhagia
Dirk Groeneveld
Sameer Singh
Iz Beltagy
VLM
210
15
0
19 Oct 2022
Mitigating Covertly Unsafe Text within Natural Language Systems
Mitigating Covertly Unsafe Text within Natural Language SystemsConference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Alex Mei
Anisha Kabir
Sharon Levy
Melanie Subbiah
Emily Allaway
J. Judge
D. Patton
Bruce Bimber
Kathleen McKeown
William Yang Wang
293
13
0
17 Oct 2022
LLMEffiChecker: Understanding and Testing Efficiency Degradation of
  Large Language Models
LLMEffiChecker: Understanding and Testing Efficiency Degradation of Large Language ModelsACM Transactions on Software Engineering and Methodology (TOSEM), 2022
Simin Chen
Cong Liu
Mirazul Haque
Wei Yang
211
31
0
07 Oct 2022
When to Make Exceptions: Exploring Language Models as Accounts of Human
  Moral Judgment
When to Make Exceptions: Exploring Language Models as Accounts of Human Moral JudgmentNeural Information Processing Systems (NeurIPS), 2022
Zhijing Jin
Sydney Levine
Fernando Gonzalez
Ojasv Kamal
Maarten Sap
Mrinmaya Sachan
Amélie Reymond
J. Tenenbaum
Bernhard Schölkopf
ELMLRM
361
116
0
04 Oct 2022
Learning by Distilling Context
Learning by Distilling Context
Charles Burton Snell
Dan Klein
Ruiqi Zhong
ReLMLRM
549
58
0
30 Sep 2022
Improving alignment of dialogue agents via targeted human judgements
Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese
Nat McAleese
Maja Trkebacz
John Aslanides
Vlad Firoiu
...
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALMAAML
517
628
0
28 Sep 2022
Evaluation of Question Answering Systems: Complexity of judging a
  natural language
Evaluation of Question Answering Systems: Complexity of judging a natural languageACM Computing Surveys (ACM CSUR), 2022
Amer Farea
Zhen Yang
Kien Duong
Nadeesha Perera
F. Emmert-Streib
ELM
222
10
0
10 Sep 2022
In conversation with Artificial Intelligence: aligning language models
  with human values
In conversation with Artificial Intelligence: aligning language models with human valuesPhilosophy & Technology (PT), 2022
Atoosa Kasirzadeh
Iason Gabriel
346
130
0
01 Sep 2022
Towards Boosting the Open-Domain Chatbot with Human Feedback
Towards Boosting the Open-Domain Chatbot with Human FeedbackAnnual Meeting of the Association for Computational Linguistics (ACL), 2022
Hua Lu
Siqi Bao
H. He
Fan Wang
Hua Wu
Haifeng Wang
ALM
151
20
0
30 Aug 2022
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors,
  and Lessons Learned
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
571
620
0
23 Aug 2022
Learning New Skills after Deployment: Improving open-domain
  internet-driven dialogue with human feedback
Learning New Skills after Deployment: Improving open-domain internet-driven dialogue with human feedbackAnnual Meeting of the Association for Computational Linguistics (ACL), 2022
Jing Xu
Megan Ung
M. Komeili
Kushal Arora
Y-Lan Boureau
Jason Weston
181
43
0
05 Aug 2022
A Hazard Analysis Framework for Code Synthesis Large Language Models
A Hazard Analysis Framework for Code Synthesis Large Language Models
Heidy Khlaaf
Pamela Mishkin
Joshua Achiam
Gretchen Krueger
Miles Brundage
ELM
103
35
0
25 Jul 2022
Language Models (Mostly) Know What They Know
Language Models (Mostly) Know What They Know
Saurav Kadavath
Tom Conerly
Amanda Askell
T. Henighan
Dawn Drain
...
Nicholas Joseph
Benjamin Mann
Sam McCandlish
C. Olah
Jared Kaplan
ELM
604
1,116
0
11 Jul 2022
Machine Learning Model Sizes and the Parameter Gap
Machine Learning Model Sizes and the Parameter Gap
Pablo Villalobos
J. Sevilla
T. Besiroglu
Lennart Heim
A. Ho
Marius Hobbhahn
ALMELMAI4CE
172
76
0
05 Jul 2022
DIRECTOR: Generator-Classifiers For Supervised Language Modeling
DIRECTOR: Generator-Classifiers For Supervised Language Modeling
Kushal Arora
Kurt Shuster
Sainbayar Sukhbaatar
Jason Weston
VLM
243
44
0
15 Jun 2022
Emergent Abilities of Large Language Models
Emergent Abilities of Large Language Models
Jason W. Wei
Yi Tay
Rishi Bommasani
Colin Raffel
Barret Zoph
...
Tatsunori Hashimoto
Oriol Vinyals
Abigail Z. Jacobs
J. Dean
W. Fedus
ELMReLMLRM
500
3,098
0
15 Jun 2022
Researching Alignment Research: Unsupervised Analysis
Researching Alignment Research: Unsupervised Analysis
Jan H. Kirchner
Logan Smith
Jacques Thibodeau
Kyle McDonell
Laria Reynolds
113
10
0
06 Jun 2022
Teaching Models to Express Their Uncertainty in Words
Teaching Models to Express Their Uncertainty in Words
Stephanie C. Lin
Jacob Hilton
Owain Evans
OOD
471
537
0
28 May 2022
Non-Programmers Can Label Programs Indirectly via Active Examples: A
  Case Study with Text-to-SQL
Non-Programmers Can Label Programs Indirectly via Active Examples: A Case Study with Text-to-SQLConference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Ruiqi Zhong
Charles Burton Snell
Dan Klein
Jason Eisner
317
10
0
25 May 2022
Diversity Over Size: On the Effect of Sample and Topic Sizes for
  Argument Mining Datasets
Diversity Over Size: On the Effect of Sample and Topic Sizes for Argument Mining DatasetsConference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Benjamin Schiller
Johannes Daxenberger
Iryna Gurevych
232
6
0
23 May 2022
RL with KL penalties is better viewed as Bayesian inference
RL with KL penalties is better viewed as Bayesian inferenceConference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Tomasz Korbak
Ethan Perez
Christopher L. Buckley
OffRL
274
99
0
23 May 2022
Scaling Laws and Interpretability of Learning from Repeated Data
Scaling Laws and Interpretability of Learning from Repeated Data
Danny Hernandez
Tom B. Brown
Tom Conerly
Nova Dassarma
Dawn Drain
...
Catherine Olsson
Dario Amodei
Nicholas Joseph
Jared Kaplan
Sam McCandlish
280
144
0
21 May 2022
GPT-NeoX-20B: An Open-Source Autoregressive Language Model
GPT-NeoX-20B: An Open-Source Autoregressive Language Model
Sid Black
Stella Biderman
Eric Hallahan
Quentin G. Anthony
Leo Gao
...
Shivanshu Purohit
Laria Reynolds
J. Tow
Benqi Wang
Samuel Weinbach
338
947
0
14 Apr 2022
Training a Helpful and Harmless Assistant with Reinforcement Learning
  from Human Feedback
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Yuntao Bai
Andy Jones
Kamal Ndousse
Amanda Askell
Anna Chen
...
Jack Clark
Sam McCandlish
C. Olah
Benjamin Mann
Jared Kaplan
861
3,437
0
12 Apr 2022
Teaching language models to support answers with verified quotes
Teaching language models to support answers with verified quotes
Jacob Menick
Maja Trebacz
Vladimir Mikulik
John Aslanides
Francis Song
...
Mia Glaese
Susannah Young
Lucy Campbell-Gillingham
G. Irving
Nat McAleese
ELMRALM
498
302
0
21 Mar 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedbackNeural Information Processing Systems (NeurIPS), 2022
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLMALM
2.0K
17,254
0
04 Mar 2022
Red Teaming Language Models with Language Models
Red Teaming Language Models with Language ModelsConference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Ethan Perez
Saffron Huang
Francis Song
Trevor Cai
Roman Ring
John Aslanides
Amelia Glaese
Nat McAleese
G. Irving
AAML
436
849
0
07 Feb 2022
Datasheet for the Pile
Datasheet for the Pile
Stella Biderman
Kieran Bicheno
Leo Gao
220
40
0
13 Jan 2022
The Dangers of Underclaiming: Reasons for Caution When Reporting How NLP
  Systems Fail
The Dangers of Underclaiming: Reasons for Caution When Reporting How NLP Systems Fail
Sam Bowman
OffRL
346
48
0
15 Oct 2021
Previous
123...131415
Next