A General Language Assistant as a Laboratory for Alignment

1 December 2021

Deep Ganguli

Papers citing "A General Language Assistant as a Laboratory for Alignment"

The Wisdom of Hindsight Makes Language Models Better Instruction Followers

International Conference on Machine Learning (ICML), 2023

Tianjun Zhang

Fangchen Liu

Justin Wong

Pieter Abbeel

Joseph E. Gonzalez

216

10 Feb 2023

Chain of Hindsight Aligns Language Models with Feedback

International Conference on Learning Representations (ICLR), 2023

Hao Liu

Carmelo Sferrazza

Pieter Abbeel

ALM

802

149

06 Feb 2023

Using In-Context Learning to Improve Dialogue Safety

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Siva Reddy

Yang Liu

Dilek Z. Hakkani-Tür

250

02 Feb 2023

Co-Writing with Opinionated Language Models Affects Users' Views

International Conference on Human Factors in Computing Systems (CHI), 2023

327

281

01 Feb 2023

Truth Machines: Synthesizing Veracity in AI Language Models

AI & Society, 2023

28 Jan 2023

How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection

294

770

18 Jan 2023

Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits

Neural Information Processing Systems (NeurIPS), 2023

Ruibo Liu

Ge Zhang

374

01 Jan 2023

Inclusive Artificial Intelligence

Dilip Arumugam

Shi Dong

Benjamin Van Roy

179

24 Dec 2022

Discovering Language Model Behaviors with Model-Written Evaluations

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

...

Deep Ganguli

359

587

19 Dec 2022

Constitutional AI: Harmlessness from AI Feedback

...

890

2,326

15 Dec 2022

Editing Models with Task Arithmetic

International Conference on Learning Representations (ICLR), 2022

1.3K

740

08 Dec 2022

Discovering Latent Knowledge in Language Models Without Supervision

International Conference on Learning Representations (ICLR), 2022

417

531

07 Dec 2022

Fine-tuning language models to find agreement among humans with diverse preferences

Neural Information Processing Systems (NeurIPS), 2022

Michiel A. Bakker

Martin Chadwick

Hannah R. Sheahan

Michael Henry Tessler

Lucy Campbell-Gillingham

...

Christopher Summerfield

ALM

289

28 Nov 2022

The Expertise Problem: Learning from Specialized Feedback

Oliver Daniels-Koch

Rachel Freedman

OffRL

151

12 Nov 2022

The CRINGE Loss: Learning what language not to model

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

Jason Weston

238

10 Nov 2022

ADEPT: A DEbiasing PrompT Framework

AAAI Conference on Artificial Intelligence (AAAI), 2022

378

10 Nov 2022

Measuring Progress on Scalable Oversight for Large Language Models

...

321

172

04 Nov 2022

Large Language Models Are Human-Level Prompt Engineers

International Conference on Learning Representations (ICLR), 2022

Silviu Pitis

Jimmy Ba

503

1,173

03 Nov 2022

Fine-Tuning Language Models via Epistemic Neural Networks

292

03 Nov 2022

When Life Gives You Lemons, Make Cherryade: Converting Feedback from Bad Responses into Good Labels

North American Chapter of the Association for Computational Linguistics (NAACL), 2022

Jason Weston

200

28 Oct 2022

Broken Neural Scaling Laws

International Conference on Learning Representations (ICLR), 2022

1.1K

26 Oct 2022

Continued Pretraining for Better Zero- and Few-Shot Promptability

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Zhaofeng Wu

Robert L. Logan IV

Pete Walsh

Akshita Bhagia

Dirk Groeneveld

Sameer Singh

Iz Beltagy

VLM

228

19 Oct 2022

Mitigating Covertly Unsafe Text within Natural Language Systems

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Kathleen McKeown

317

17 Oct 2022

LLMEffiChecker: Understanding and Testing Efficiency Degradation of Large Language Models

ACM Transactions on Software Engineering and Methodology (TOSEM), 2022

249

07 Oct 2022

When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment

Neural Information Processing Systems (NeurIPS), 2022

424

117

04 Oct 2022

Learning by Distilling Context

Charles Burton Snell

Dan Klein

Ruiqi Zhong

ReLM LRM

600

30 Sep 2022

Improving alignment of dialogue agents via targeted human judgements

...

535

631

28 Sep 2022

Evaluation of Question Answering Systems: Complexity of judging a natural language

ACM Computing Surveys (ACM CSUR), 2022

247

10 Sep 2022

In conversation with Artificial Intelligence: aligning language models with human values

Philosophy & Technology (PT), 2022

Atoosa Kasirzadeh

Iason Gabriel

348

135

01 Sep 2022

Towards Boosting the Open-Domain Chatbot with Human Feedback

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

Fan Wang

167

30 Aug 2022

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

Deep Ganguli

...

603

633

23 Aug 2022

Learning New Skills after Deployment: Improving open-domain internet-driven dialogue with human feedback

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

Jason Weston

208

05 Aug 2022

A Hazard Analysis Framework for Code Synthesis Large Language Models

112

25 Jul 2022

Language Models (Mostly) Know What They Know

...

638

1,139

11 Jul 2022

Machine Learning Model Sizes and the Parameter Gap

189

05 Jul 2022

DIRECTOR: Generator-Classifiers For Supervised Language Modeling

Jason Weston

253

15 Jun 2022

Emergent Abilities of Large Language Models

...

Tatsunori Hashimoto

532

3,141

15 Jun 2022

Researching Alignment Research: Unsupervised Analysis

121

06 Jun 2022

Teaching Models to Express Their Uncertainty in Words

509

551

28 May 2022

Non-Programmers Can Label Programs Indirectly via Active Examples: A Case Study with Text-to-SQL

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Ruiqi Zhong

Charles Burton Snell

Dan Klein

Jason Eisner

362

25 May 2022

Diversity Over Size: On the Effect of Sample and Topic Sizes for Argument Mining Datasets

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Benjamin Schiller

Johannes Daxenberger

Iryna Gurevych

268

23 May 2022

RL with KL penalties is better viewed as Bayesian inference

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Tomasz Korbak

Ethan Perez

Christopher L. Buckley

OffRL

304

101

23 May 2022

Scaling Laws and Interpretability of Learning from Repeated Data

...

289

145

21 May 2022

GPT-NeoX-20B: An Open-Source Autoregressive Language Model

...

371

952

14 Apr 2022

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

...

918

3,476

12 Apr 2022

Teaching language models to support answers with verified quotes

...

Lucy Campbell-Gillingham

G. Irving

Nat McAleese

ELM RALM

519

303

21 Mar 2022

Training language models to follow instructions with human feedback

Neural Information Processing Systems (NeurIPS), 2022

Carroll L. Wainwright

...

2.1K

17,638

04 Mar 2022

Red Teaming Language Models with Language Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Saffron Huang

448

865

07 Feb 2022

Datasheet for the Pile

Stella Biderman

Kieran Bicheno

Leo Gao

228

13 Jan 2022

The Dangers of Underclaiming: Reasons for Caution When Reporting How NLP Systems Fail

Sam Bowman

OffRL

366

15 Oct 2021