2008.02275
Aligning AI With Shared Human Values
5 August 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andrew Critch
J. Li
D. Song
Jacob Steinhardt
Papers citing "Aligning AI With Shared Human Values"
47 / 347 papers shown
Robots-Dont-Cry: Understanding Falsely Anthropomorphic Utterances in Dialog Systems
David Gros
Yu Li
Zhou Yu
41
9
0
22 Oct 2022
Aligning MAGMA by Few-Shot Learning and Finetuning
Jean-Charles Layoun
Alexis Roger
Irina Rish
VLM
14
2
0
18 Oct 2022
SafeText: A Benchmark for Exploring Physical Safety in Language Models
Sharon Levy
Emily Allaway
Melanie Subbiah
Lydia B. Chilton
D. Patton
Kathleen McKeown
William Yang Wang
46
40
0
18 Oct 2022
How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios
Mantas Mazeika
Eric Tang
Andy Zou
Steven Basart
Jun Shern Chan
Dawn Song
David A. Forsyth
Jacob Steinhardt
Dan Hendrycks
26
8
0
18 Oct 2022
Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey
Sachin Kumar
Vidhisha Balachandran
Lucille Njoo
Antonios Anastasopoulos
Yulia Tsvetkov
ELM
66
85
0
14 Oct 2022
Enabling Classifiers to Make Judgements Explicitly Aligned with Human Values
Yejin Bang
Tiezheng Yu
Andrea Madotto
Zhaojiang Lin
Mona T. Diab
Pascale Fung
19
13
0
14 Oct 2022
When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment
Zhijing Jin
Sydney Levine
Fernando Gonzalez
Ojasv Kamal
Maarten Sap
Mrinmaya Sachan
Rada Mihalcea
J. Tenenbaum
Bernhard Schölkopf
ELM
LRM
23
90
0
04 Oct 2022
Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese
Nat McAleese
Maja Trębacz
John Aslanides
Vlad Firoiu
...
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALM
AAML
225
500
0
28 Sep 2022
Moral Mimicry: Large Language Models Produce Moral Rationalizations Tailored to Political Identity
Gabriel Simmons
98
57
0
24 Sep 2022
Law Informs Code: A Legal Informatics Approach to Aligning Artificial Intelligence with Humans
John J. Nay
ELM
AILaw
84
27
0
14 Sep 2022
The Alignment Problem from a Deep Learning Perspective
Richard Ngo
Lawrence Chan
Sören Mindermann
52
181
0
30 Aug 2022
Atomist or Holist? A Diagnosis and Vision for More Productive Interdisciplinary AI Ethics Dialogue
Travis Greene
Amit Dhurandhar
Galit Shmueli
10
7
0
19 Aug 2022
Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models
Maribeth Rauh
John F. J. Mellor
J. Uesato
Po-Sen Huang
Johannes Welbl
...
Amelia Glaese
G. Irving
Iason Gabriel
William S. Isaac
Lisa Anne Hendricks
25
49
0
16 Jun 2022
X-Risk Analysis for AI Research
Dan Hendrycks
Mantas Mazeika
22
67
0
13 Jun 2022
Does Moral Code Have a Moral Code? Probing Delphi's Moral Philosophy
Kathleen C. Fraser
S. Kiritchenko
Esma Balkir
107
37
0
25 May 2022
ProsocialDialog: A Prosocial Backbone for Conversational Agents
Hyunwoo J. Kim
Youngjae Yu
Liwei Jiang
Ximing Lu
Daniel Khashabi
Gunhee Kim
Yejin Choi
Maarten Sap
10
116
0
25 May 2022
Towards Answering Open-ended Ethical Quandary Questions
Yejin Bang
Nayeon Lee
Tiezheng Yu
Leila Khalatbari
Yan Xu
...
Romain Barraud
Elham J. Barezi
Andrea Madotto
Hayden Kee
Pascale Fung
ELM
28
6
0
12 May 2022
Aligning to Social Norms and Values in Interactive Narratives
Prithviraj Ammanabrolu
Liwei Jiang
Maarten Sap
Hannaneh Hajishirzi
Yejin Choi
AI4CE
25
46
0
04 May 2022
A Corpus for Understanding and Generating Moral Stories
Jian-Yu Guan
Ziqi Liu
Minlie Huang
24
9
0
20 Apr 2022
What Makes Instruction Learning Hard? An Investigation and a New Challenge in a Synthetic Environment
Matthew Finlayson
Kyle Richardson
Ashish Sabharwal
Peter Clark
22
12
0
19 Apr 2022
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Yuntao Bai
Andy Jones
Kamal Ndousse
Amanda Askell
Anna Chen
...
Jack Clark
Sam McCandlish
C. Olah
Benjamin Mann
Jared Kaplan
52
2,299
0
12 Apr 2022
The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems
Caleb Ziems
Jane A. Yu
Yi-Chia Wang
A. Halevy
Diyi Yang
15
92
0
06 Apr 2022
Probing Pre-Trained Language Models for Cross-Cultural Differences in Values
Arnav Arora
Lucie-Aimée Kaffee
Isabelle Augenstein
VLM
23
122
0
25 Mar 2022
Do Multilingual Language Models Capture Differing Moral Norms?
Katharina Hämmerl
Bjorn Deiseroth
P. Schramowski
Jindrich Libovický
Alexander M. Fraser
Kristian Kersting
11
15
0
18 Mar 2022
Speciesist bias in AI -- How AI applications perpetuate discrimination and unfair outcomes against animals
Thilo Hagendorff
L. Bossert
Yip Fai Tse
P. Singer
FaML
15
40
0
22 Feb 2022
Few-shot Learning with Multilingual Language Models
Xi Victoria Lin
Todor Mihaylov
Mikel Artetxe
Tianlu Wang
Shuohui Chen
...
Luke Zettlemoyer
Zornitsa Kozareva
Mona T. Diab
Ves Stoyanov
Xian Li
BDL
ELM
LRM
61
284
0
20 Dec 2021
DREAM: Improving Situational QA by First Elaborating the Situation
Yuling Gu
Bhavana Dalvi
Peter Clark
19
18
0
16 Dec 2021
ValueNet: A New Dataset for Human Value Driven Dialogue System
Liang Qiu
Yizhou Zhao
Jinchao Li
Pan Lu
Baolin Peng
Jianfeng Gao
Song-Chun Zhu
22
35
0
12 Dec 2021
Analysis and Prediction of NLP Models Via Task Embeddings
Damien Sileo
Marie-Francine Moens
22
3
0
10 Dec 2021
A General Language Assistant as a Laboratory for Alignment
Amanda Askell
Yuntao Bai
Anna Chen
Dawn Drain
Deep Ganguli
...
Tom B. Brown
Jack Clark
Sam McCandlish
C. Olah
Jared Kaplan
ALM
6
714
0
01 Dec 2021
On Transferability of Prompt Tuning for Natural Language Processing
Yusheng Su
Xiaozhi Wang
Yujia Qin
Chi-Min Chan
Yankai Lin
...
Peng Li
Juanzi Li
Lei Hou
Maosong Sun
Jie Zhou
AAML
VLM
18
98
0
12 Nov 2021
A Word on Machine Ethics: A Response to Jiang et al. (2021)
Zeerak Talat
Hagen Blix
Josef Valvoda
M. I. Ganesh
Ryan Cotterell
Adina Williams
SyDa
FaML
88
39
0
07 Nov 2021
What Would Jiminy Cricket Do? Towards Agents That Behave Morally
Dan Hendrycks
Mantas Mazeika
Andy Zou
Sahil Patel
Christine Zhu
Jesus Navarro
D. Song
Bo-wen Li
Jacob Steinhardt
8
58
0
25 Oct 2021
The Dangers of Underclaiming: Reasons for Caution When Reporting How NLP Systems Fail
Sam Bowman
OffRL
20
45
0
15 Oct 2021
Can Machines Learn Morality? The Delphi Experiment
Liwei Jiang
Jena D. Hwang
Chandra Bhagavatula
Ronan Le Bras
Jenny T Liang
...
Yulia Tsvetkov
Oren Etzioni
Maarten Sap
Regina A. Rini
Yejin Choi
FaML
117
110
0
14 Oct 2021
Unsolved Problems in ML Safety
Dan Hendrycks
Nicholas Carlini
John Schulman
Jacob Steinhardt
173
273
0
28 Sep 2021
Towards Understanding and Mitigating Social Biases in Language Models
Paul Pu Liang
Chiyu Wu
Louis-Philippe Morency
Ruslan Salakhutdinov
25
378
0
24 Jun 2021
Conditional Contrastive Learning for Improving Fairness in Self-Supervised Learning
Martin Q. Ma
Yao-Hung Hubert Tsai
Paul Pu Liang
Han Zhao
Kun Zhang
Ruslan Salakhutdinov
Louis-Philippe Morency
SSL
21
15
0
05 Jun 2021
The R-U-A-Robot Dataset: Helping Avoid Chatbot Deception by Detecting User Questions About Human or Non-Human Identity
David Gros
Yu Li
Zhou Yu
DeLMO
18
18
0
04 Jun 2021
Measuring Coding Challenge Competence With APPS
Dan Hendrycks
Steven Basart
Saurav Kadavath
Mantas Mazeika
Akul Arora
...
Collin Burns
Samir Puranik
Horace He
D. Song
Jacob Steinhardt
ELM
AIMat
ALM
194
623
0
20 May 2021
Adapting Language Models for Zero-shot Learning by Meta-tuning on Dataset and Prompt Collections
Ruiqi Zhong
Kristy Lee
Zheng-Wei Zhang
Dan Klein
20
166
0
10 Apr 2021
Large Pre-trained Language Models Contain Human-like Biases of What is Right and Wrong to Do
P. Schramowski
Cigdem Turan
Nico Andersen
Constantin Rothkopf
Kristian Kersting
25
281
0
08 Mar 2021
Fairness for Unobserved Characteristics: Insights from Technological Impacts on Queer Communities
Nenad Tomašev
Kevin R. McKee
Jackie Kay
Shakir Mohamed
FaML
11
86
0
03 Feb 2021
Moral Stories: Situated Reasoning about Norms, Intents, Actions, and their Consequences
Denis Emelin
Ronan Le Bras
Jena D. Hwang
Maxwell Forbes
Yejin Choi
LRM
13
125
0
31 Dec 2020
Social Chemistry 101: Learning to Reason about Social and Moral Norms
Maxwell Forbes
Jena D. Hwang
Vered Shwartz
Maarten Sap
Yejin Choi
6
259
0
01 Nov 2020
Measuring Massive Multitask Language Understanding
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
D. Song
Jacob Steinhardt
ELM
RALM
8
3,835
0
07 Sep 2020
Natural Adversarial Examples
Dan Hendrycks
Kevin Zhao
Steven Basart
Jacob Steinhardt
D. Song
OODD
21
1,416
0
16 Jul 2019