2008.02275
Aligning AI With Shared Human Values
5 August 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andrew Critch
Haibin Zhang
Basel Alomair
Jacob Steinhardt
Papers citing "Aligning AI With Shared Human Values"
13 / 463 papers shown
Can Machines Learn Morality? The Delphi Experiment
Liwei Jiang
Jena D. Hwang
Chandra Bhagavatula
Ronan Le Bras
Jenny T Liang
...
Yulia Tsvetkov
Oren Etzioni
Maarten Sap
Regina A. Rini
Yejin Choi
FaML
336
153
0
14 Oct 2021
Unsolved Problems in ML Safety
Dan Hendrycks
Nicholas Carlini
John Schulman
Jacob Steinhardt
748
345
0
28 Sep 2021
Towards Understanding and Mitigating Social Biases in Language Models
Paul Pu Liang
Chiyu Wu
Louis-Philippe Morency
Ruslan Salakhutdinov
247
474
0
24 Jun 2021
Conditional Contrastive Learning for Improving Fairness in Self-Supervised Learning
Martin Q. Ma
Yifan Hao
Paul Pu Liang
Han Zhao
Kun Zhang
Ruslan Salakhutdinov
Louis-Philippe Morency
SSL
265
19
0
05 Jun 2021
The R-U-A-Robot Dataset: Helping Avoid Chatbot Deception by Detecting User Questions About Human or Non-Human Identity
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
David Gros
Yu Li
Zhou Yu
DeLMO
134
23
0
04 Jun 2021
Measuring Coding Challenge Competence With APPS
Dan Hendrycks
Steven Basart
Saurav Kadavath
Mantas Mazeika
Akul Arora
...
Collin Burns
Samir Puranik
Horace He
Basel Alomair
Jacob Steinhardt
ELM
AIMat
ALM
1.2K
910
0
20 May 2021
Adapting Language Models for Zero-shot Learning by Meta-tuning on Dataset and Prompt Collections
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Ruiqi Zhong
Kristy Lee
Zheng Zhang
Dan Klein
471
181
0
10 Apr 2021
Large Pre-trained Language Models Contain Human-like Biases of What is Right and Wrong to Do
Nature Machine Intelligence (Nat. Mach. Intell.), 2021
P. Schramowski
Cigdem Turan
Nico Andersen
Constantin Rothkopf
Kristian Kersting
305
359
0
08 Mar 2021
Fairness for Unobserved Characteristics: Insights from Technological Impacts on Queer Communities
AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2021
Nenad Tomašev
Kevin R. McKee
Jackie Kay
Shakir Mohamed
FaML
204
105
0
03 Feb 2021
Moral Stories: Situated Reasoning about Norms, Intents, Actions, and their Consequences
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Denis Emelin
Ronan Le Bras
Jena D. Hwang
Maxwell Forbes
Yejin Choi
LRM
288
150
0
31 Dec 2020
Social Chemistry 101: Learning to Reason about Social and Moral Norms
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Maxwell Forbes
Jena D. Hwang
Vered Shwartz
Maarten Sap
Yejin Choi
293
310
0
01 Nov 2020
Measuring Massive Multitask Language Understanding
International Conference on Learning Representations (ICLR), 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM
RALM
2.3K
6,566
0
07 Sep 2020
Natural Adversarial Examples
Computer Vision and Pattern Recognition (CVPR), 2019
Dan Hendrycks
Kevin Zhao
Steven Basart
Jacob Steinhardt
Basel Alomair
OODD
930
1,746
0
16 Jul 2019