Aligning AI With Shared Human Values

5 August 2020

Papers citing "Aligning AI With Shared Human Values"

50 / 463 papers shown

Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback

Hannah Rose Kirk

Bertie Vidgen

Paul Röttger

Scott A. Hale

311

124

09 Mar 2023

Towards Safer Generative Language Models: A Survey on Safety Risks, Evaluations, and Improvements

Jiale Cheng

226

18 Feb 2023

Commonsense Reasoning for Conversational AI: A Survey of the State of the Art

Christopher Richardson

Larry Heck

LRM

207

15 Feb 2023

Benchmarks for Automated Commonsense Reasoning: A Survey

ACM Computing Surveys (ACM Comput. Surv.), 2023

E. Davis

ELM LRM

314

09 Feb 2023

Everyone's Voice Matters: Quantifying Annotation Disagreement Using Demographic Information

AAAI Conference on Artificial Intelligence (AAAI), 2023

Ruyuan Wan

Jaehyung Kim

Luan Tuyen Chau

199

12 Jan 2023

A Multi-Level Framework for the AI Alignment Problem

Betty Hou

Brian Patrick Green

10 Jan 2023

Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits

Neural Information Processing Systems (NeurIPS), 2023

Ruibo Liu

Ge Zhang

378

01 Jan 2023

Inclusive Artificial Intelligence

Dilip Arumugam

Shi Dong

Benjamin Van Roy

181

24 Dec 2022

MoralDial: A Framework to Train and Evaluate Moral Dialogue Systems via Moral Discussions

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

Qun Liu

240

21 Dec 2022

ClarifyDelphi: Reinforced Clarification Questions with Defeasibility Rewards for Social and Moral Situations

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

Yejin Choi

326

20 Dec 2022

Despite "super-human" performance, current LLMs are unsuited for decisions about ethics and safety

184

13 Dec 2022

Ensuring Visual Commonsense Morality for Text-to-Image Generation

Seong-Oak Park

Suhong Moon

Jinkyu Kim

191

07 Dec 2022

Speaking Multiple Languages Affects the Moral Bias of Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

263

14 Nov 2022

Zero-shot Visual Commonsense Immorality Prediction

British Machine Vision Conference (BMVC), 2022

10 Nov 2022

Just-DREAM-about-it: Figurative Language Understanding with DREAM-FLUTE

Yuling Gu

473

28 Oct 2022

TAPE: Assessing Few-shot Russian Language Understanding

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Alena Fenogenova

...

Valentina Kurenshchikova

Ekaterina Artemova

Vladislav Mikhailov

AAML

168

23 Oct 2022

Robots-Dont-Cry: Understanding Falsely Anthropomorphic Utterances in Dialog Systems

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

David Gros

Yu Li

Zhou Yu

167

22 Oct 2022

Aligning MAGMA by Few-Shot Learning and Finetuning

18 Oct 2022

SafeText: A Benchmark for Exploring Physical Safety in Language Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Kathleen McKeown

193

18 Oct 2022

How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios

Neural Information Processing Systems (NeurIPS), 2022

197

18 Oct 2022

Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey

Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022

Sachin Kumar

Vidhisha Balachandran

Lucille Njoo

Antonios Anastasopoulos

Yulia Tsvetkov

ELM

452

106

14 Oct 2022

Enabling Classifiers to Make Judgements Explicitly Aligned with Human Values

Andrea Madotto

178

14 Oct 2022

When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment

Neural Information Processing Systems (NeurIPS), 2022

431

117

04 Oct 2022

Improving alignment of dialogue agents via targeted human judgements

...

538

637

28 Sep 2022

Moral Mimicry: Large Language Models Produce Moral Rationalizations Tailored to Political Identity

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

Gabriel Simmons

371

24 Sep 2022

Law Informs Code: A Legal Informatics Approach to Aligning Artificial Intelligence with Humans

Social Science Research Network (SSRN), 2022

John J. Nay

ELM AILaw

959

14 Sep 2022

The Alignment Problem from a Deep Learning Perspective

International Conference on Learning Representations (ICLR), 2022

Richard Ngo

Lawrence Chan

Sören Mindermann

542

250

30 Aug 2022

Atomist or Holist? A Diagnosis and Vision for More Productive Interdisciplinary AI Ethics Dialogue

Patterns (Patterns), 2022

Travis Greene

Amit Dhurandhar

Galit Shmueli

270

19 Aug 2022

Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models

Neural Information Processing Systems (NeurIPS), 2022

...

Iason Gabriel

275

16 Jun 2022

X-Risk Analysis for AI Research

Dan Hendrycks

Mantas Mazeika

525

13 Jun 2022

Does Moral Code Have a Moral Code? Probing Delphi's Moral Philosophy

Kathleen C. Fraser

S. Kiritchenko

Esma Balkir

282

25 May 2022

ProsocialDialog: A Prosocial Backbone for Conversational Agents

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Daniel Khashabi

Yejin Choi

231

145

25 May 2022

Towards Answering Open-ended Ethical Quandary Questions

...

Andrea Madotto

220

12 May 2022

Aligning to Social Norms and Values in Interactive Narratives

North American Chapter of the Association for Computational Linguistics (NAACL), 2022

Prithviraj Ammanabrolu

Yejin Choi

263

04 May 2022

A Corpus for Understanding and Generating Moral Stories

North American Chapter of the Association for Computational Linguistics (NAACL), 2022

Jian Guan

Ziqi Liu

Shiyu Huang

204

20 Apr 2022

What Makes Instruction Learning Hard? An Investigation and a New Challenge in a Synthetic Environment

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

282

19 Apr 2022

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

...

962

3,520

12 Apr 2022

The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

Diyi Yang

270

115

06 Apr 2022

Probing Pre-Trained Language Models for Cross-Cultural Differences in Values

373

165

25 Mar 2022

Do Multilingual Language Models Capture Differing Moral Norms?

170

18 Mar 2022

Speciesist bias in AI -- How AI applications perpetuate discrimination and unfair outcomes against animals

AI and Ethics (AE), 2022

219

22 Feb 2022

Few-shot Learning with Multilingual Language Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021

...

Luke Zettlemoyer

Xian Li

362

356

20 Dec 2021

DREAM: Improving Situational QA by First Elaborating the Situation

Yuling Gu

Bhavana Dalvi

Peter Clark

266

16 Dec 2021

ValueNet: A New Dataset for Human Value Driven Dialogue System

248

12 Dec 2021

Analysis and Prediction of NLP Models Via Task Embeddings

Damien Sileo

Marie-Francine Moens

126

10 Dec 2021

A General Language Assistant as a Laboratory for Alignment

Deep Ganguli

...

482

978

01 Dec 2021

On Transferability of Prompt Tuning for Natural Language Processing

North American Chapter of the Association for Computational Linguistics (NAACL), 2021

Yusheng Su

Xiaozhi Wang

Yujia Qin

Chi-Min Chan

Yankai Lin

...

Peng Li

Juanzi Li

Lei Hou

Maosong Sun

Jie Zhou

AAML VLM

246

114

12 Nov 2021

A Word on Machine Ethics: A Response to Jiang et al. (2021)

272

07 Nov 2021

What Would Jiminy Cricket Do? Towards Agents That Behave Morally

246

25 Oct 2021

The Dangers of Underclaiming: Reasons for Caution When Reporting How NLP Systems Fail

Sam Bowman

OffRL

373

15 Oct 2021