
Aligning AI With Shared Human Values

5 August 2020

Papers citing "Aligning AI With Shared Human Values"

13 / 463 papers shown

Can Machines Learn Morality? The Delphi Experiment

...

Yejin Choi

336

153

14 Oct 2021

Unsolved Problems in ML Safety

748

345

28 Sep 2021

Towards Understanding and Mitigating Social Biases in Language Models

Paul Pu Liang

Chiyu Wu

Louis-Philippe Morency

Ruslan Salakhutdinov

247

474

24 Jun 2021

Conditional Contrastive Learning for Improving Fairness in Self-Supervised Learning

Louis-Philippe Morency

SSL

265

05 Jun 2021

The R-U-A-Robot Dataset: Helping Avoid Chatbot Deception by Detecting User Questions About Human or Non-Human Identity

Annual Meeting of the Association for Computational Linguistics (ACL), 2021

134

04 Jun 2021

Measuring Coding Challenge Competence With APPS

...

1.2K

910

20 May 2021

Adapting Language Models for Zero-shot Learning by Meta-tuning on Dataset and Prompt Collections

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021

Ruiqi Zhong

Kristy Lee

Zheng Zhang

Dan Klein

471

181

10 Apr 2021

Large Pre-trained Language Models Contain Human-like Biases of What is Right and Wrong to Do

Nature Machine Intelligence (Nat. Mach. Intell.), 2021

305

359

08 Mar 2021

Fairness for Unobserved Characteristics: Insights from Technological Impacts on Queer Communities

AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2021

204

105

03 Feb 2021

Moral Stories: Situated Reasoning about Norms, Intents, Actions, and their Consequences

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020

Yejin Choi

288

150

31 Dec 2020

Social Chemistry 101: Learning to Reason about Social and Moral Norms

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020

Yejin Choi

293

310

01 Nov 2020

Measuring Massive Multitask Language Understanding

International Conference on Learning Representations (ICLR), 2020

2.3K

6,566

07 Sep 2020

Natural Adversarial Examples

Computer Vision and Pattern Recognition (CVPR), 2019

930

1,746

16 Jul 2019