arXiv:2008.02275
Aligning AI With Shared Human Values
5 August 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andrew Critch
Haibin Zhang
Basel Alomair
Jacob Steinhardt
Papers citing "Aligning AI With Shared Human Values"
50 / 463 papers shown
Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback
Hannah Rose Kirk
Bertie Vidgen
Paul Röttger
Scott A. Hale
311
124
0
09 Mar 2023
Towards Safer Generative Language Models: A Survey on Safety Risks, Evaluations, and Improvements
Jiawen Deng
Jiale Cheng
Hao Sun
Zhexin Zhang
Shiyu Huang
LM&MA
ELM
226
22
0
18 Feb 2023
Commonsense Reasoning for Conversational AI: A Survey of the State of the Art
Christopher Richardson
Larry Heck
LRM
207
10
0
15 Feb 2023
Benchmarks for Automated Commonsense Reasoning: A Survey
ACM Computing Surveys (ACM Comput. Surv.), 2023
E. Davis
ELM
LRM
314
80
0
09 Feb 2023
Everyone's Voice Matters: Quantifying Annotation Disagreement Using Demographic Information
AAAI Conference on Artificial Intelligence (AAAI), 2023
Ruyuan Wan
Jaehyung Kim
Luan Tuyen Chau
199
55
0
12 Jan 2023
A Multi-Level Framework for the AI Alignment Problem
Betty Hou
Brian Patrick Green
89
9
0
10 Jan 2023
Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits
Neural Information Processing Systems (NeurIPS), 2023
Ruibo Liu
Chenyan Jia
Ge Zhang
Ziyu Zhuang
Tony X. Liu
Soroush Vosoughi
378
40
0
01 Jan 2023
Inclusive Artificial Intelligence
Dilip Arumugam
Shi Dong
Benjamin Van Roy
181
3
0
24 Dec 2022
MoralDial: A Framework to Train and Evaluate Moral Dialogue Systems via Moral Discussions
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Hao Sun
Zhexin Zhang
Fei Mi
Yasheng Wang
Wen Liu
Jianwei Cui
Bin Wang
Qun Liu
Shiyu Huang
240
28
0
21 Dec 2022
ClarifyDelphi: Reinforced Clarification Questions with Defeasibility Rewards for Social and Moral Situations
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Valentina Pyatkin
Jena D. Hwang
Vivek Srikumar
Ximing Lu
Liwei Jiang
Yejin Choi
Chandra Bhagavatula
326
44
0
20 Dec 2022
Despite "super-human" performance, current LLMs are unsuited for decisions about ethics and safety
Joshua Albrecht
Ellie Kitanidis
Abraham J. Fetterman
ELM
ReLM
ALM
LRM
184
23
0
13 Dec 2022
Ensuring Visual Commonsense Morality for Text-to-Image Generation
Seong-Oak Park
Suhong Moon
Jinkyu Kim
191
2
0
07 Dec 2022
Speaking Multiple Languages Affects the Moral Bias of Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Katharina Hämmerl
Bjorn Deiseroth
P. Schramowski
Jindrich Libovický
Constantin Rothkopf
Kangyang Luo
Kristian Kersting
263
43
0
14 Nov 2022
Zero-shot Visual Commonsense Immorality Prediction
British Machine Vision Conference (BMVC), 2022
Yujin Jeong
Seongbeom Park
Suhong Moon
Jinkyu Kim
VLM
90
3
0
10 Nov 2022
Just-DREAM-about-it: Figurative Language Understanding with DREAM-FLUTE
Yuling Gu
Yao Fu
Valentina Pyatkin
Ian H. Magnusson
Bhavana Dalvi
Peter Clark
473
11
0
28 Oct 2022
TAPE: Assessing Few-shot Russian Language Understanding
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Ekaterina Taktasheva
Tatiana Shavrina
Alena Fenogenova
Denis Shevelev
Nadezhda Katricheva
...
Svetlana Iordanskaia
Alena Spiridonova
Valentina Kurenshchikova
Ekaterina Artemova
Vladislav Mikhailov
AAML
168
14
0
23 Oct 2022
Robots-Dont-Cry: Understanding Falsely Anthropomorphic Utterances in Dialog Systems
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
David Gros
Yu Li
Zhou Yu
167
13
0
22 Oct 2022
Aligning MAGMA by Few-Shot Learning and Finetuning
Jean-Charles Layoun
Alexis Roger
Irina Rish
VLM
80
2
0
18 Oct 2022
SafeText: A Benchmark for Exploring Physical Safety in Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Sharon Levy
Emily Allaway
Melanie Subbiah
Lydia B. Chilton
D. Patton
Kathleen McKeown
William Yang Wang
193
48
0
18 Oct 2022
How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios
Neural Information Processing Systems (NeurIPS), 2022
Mantas Mazeika
Eric Tang
Andy Zou
Steven Basart
Jun Shern Chan
Dawn Song
David A. Forsyth
Jacob Steinhardt
Dan Hendrycks
197
11
0
18 Oct 2022
Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022
Sachin Kumar
Vidhisha Balachandran
Lucille Njoo
Antonios Anastasopoulos
Yulia Tsvetkov
ELM
452
106
0
14 Oct 2022
Enabling Classifiers to Make Judgements Explicitly Aligned with Human Values
Yejin Bang
Tiezheng Yu
Andrea Madotto
Mohammad Kachuee
Mona T. Diab
Pascale Fung
178
14
0
14 Oct 2022
When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment
Neural Information Processing Systems (NeurIPS), 2022
Zhijing Jin
Sydney Levine
Fernando Gonzalez
Ojasv Kamal
Maarten Sap
Mrinmaya Sachan
Amélie Reymond
J. Tenenbaum
Bernhard Schölkopf
ELM
LRM
431
117
0
04 Oct 2022
Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese
Nat McAleese
Maja Trębacz
John Aslanides
Vlad Firoiu
...
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALM
AAML
538
637
0
28 Sep 2022
Moral Mimicry: Large Language Models Produce Moral Rationalizations Tailored to Political Identity
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Gabriel Simmons
371
87
0
24 Sep 2022
Law Informs Code: A Legal Informatics Approach to Aligning Artificial Intelligence with Humans
Social Science Research Network (SSRN), 2022
John J. Nay
ELM
AILaw
959
34
0
14 Sep 2022
The Alignment Problem from a Deep Learning Perspective
International Conference on Learning Representations (ICLR), 2022
Richard Ngo
Lawrence Chan
Sören Mindermann
542
250
0
30 Aug 2022
Atomist or Holist? A Diagnosis and Vision for More Productive Interdisciplinary AI Ethics Dialogue
Patterns (Patterns), 2022
Travis Greene
Amit Dhurandhar
Galit Shmueli
270
9
0
19 Aug 2022
Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models
Neural Information Processing Systems (NeurIPS), 2022
Maribeth Rauh
John F. J. Mellor
J. Uesato
Po-Sen Huang
Johannes Welbl
...
Amelia Glaese
G. Irving
Iason Gabriel
William S. Isaac
Lisa Anne Hendricks
275
61
0
16 Jun 2022
X-Risk Analysis for AI Research
Dan Hendrycks
Mantas Mazeika
525
81
0
13 Jun 2022
Does Moral Code Have a Moral Code? Probing Delphi's Moral Philosophy
Kathleen C. Fraser
S. Kiritchenko
Esma Balkir
282
41
0
25 May 2022
ProsocialDialog: A Prosocial Backbone for Conversational Agents
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Hyunwoo J. Kim
Youngjae Yu
Liwei Jiang
Ximing Lu
Daniel Khashabi
Gunhee Kim
Yejin Choi
Maarten Sap
231
145
0
25 May 2022
Towards Answering Open-ended Ethical Quandary Questions
Yejin Bang
Nayeon Lee
Tiezheng Yu
Leila Khalatbari
Yan Xu
...
Romain Barraud
Elham J. Barezi
Andrea Madotto
Hayden Kee
Pascale Fung
ELM
220
6
0
12 May 2022
Aligning to Social Norms and Values in Interactive Narratives
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Prithviraj Ammanabrolu
Liwei Jiang
Maarten Sap
Hannaneh Hajishirzi
Yejin Choi
AI4CE
263
52
0
04 May 2022
A Corpus for Understanding and Generating Moral Stories
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Jian Guan
Ziqi Liu
Shiyu Huang
204
19
0
20 Apr 2022
What Makes Instruction Learning Hard? An Investigation and a New Challenge in a Synthetic Environment
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Matthew Finlayson
Kyle Richardson
Ashish Sabharwal
Peter Clark
282
13
0
19 Apr 2022
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Yuntao Bai
Andy Jones
Kamal Ndousse
Amanda Askell
Anna Chen
...
Jack Clark
Sam McCandlish
C. Olah
Benjamin Mann
Jared Kaplan
962
3,520
0
12 Apr 2022
The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Caleb Ziems
Jane A. Yu
Yi-Chia Wang
A. Halevy
Diyi Yang
270
115
0
06 Apr 2022
Probing Pre-Trained Language Models for Cross-Cultural Differences in Values
Arnav Arora
Lucie-Aimée Kaffee
Isabelle Augenstein
VLM
373
165
0
25 Mar 2022
Do Multilingual Language Models Capture Differing Moral Norms?
Katharina Hämmerl
Bjorn Deiseroth
P. Schramowski
Jindrich Libovický
Kangyang Luo
Kristian Kersting
170
17
0
18 Mar 2022
Speciesist bias in AI -- How AI applications perpetuate discrimination and unfair outcomes against animals
AI and Ethics (AE), 2022
Thilo Hagendorff
L. Bossert
Yip Fai Tse
P. Singer
FaML
219
58
0
22 Feb 2022
Few-shot Learning with Multilingual Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Xi Lin
Todor Mihaylov
Mikel Artetxe
Tianlu Wang
Shuohui Chen
...
Luke Zettlemoyer
Zornitsa Kozareva
Mona T. Diab
Ves Stoyanov
Xian Li
BDL
ELM
LRM
362
356
0
20 Dec 2021
DREAM: Improving Situational QA by First Elaborating the Situation
Yuling Gu
Bhavana Dalvi
Peter Clark
266
19
0
16 Dec 2021
ValueNet: A New Dataset for Human Value Driven Dialogue System
Liang Qiu
Yizhou Zhao
Jinchao Li
Pan Lu
Baolin Peng
Jianfeng Gao
Song-Chun Zhu
248
44
0
12 Dec 2021
Analysis and Prediction of NLP Models Via Task Embeddings
Damien Sileo
Marie-Francine Moens
126
6
0
10 Dec 2021
A General Language Assistant as a Laboratory for Alignment
Amanda Askell
Yuntao Bai
Anna Chen
Dawn Drain
Deep Ganguli
...
Tom B. Brown
Jack Clark
Sam McCandlish
C. Olah
Jared Kaplan
ALM
482
978
0
01 Dec 2021
On Transferability of Prompt Tuning for Natural Language Processing
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Yusheng Su
Xiaozhi Wang
Yujia Qin
Chi-Min Chan
Yankai Lin
...
Peng Li
Juanzi Li
Lei Hou
Maosong Sun
Jie Zhou
AAML
VLM
246
114
0
12 Nov 2021
A Word on Machine Ethics: A Response to Jiang et al. (2021)
Zeerak Talat
Hagen Blix
Josef Valvoda
M. I. Ganesh
Robert Bamler
Adina Williams
SyDa
FaML
272
39
0
07 Nov 2021
What Would Jiminy Cricket Do? Towards Agents That Behave Morally
Dan Hendrycks
Mantas Mazeika
Andy Zou
Sahil Patel
Christine Zhu
Jesus Navarro
Basel Alomair
Yue Liu
Jacob Steinhardt
246
72
0
25 Oct 2021
The Dangers of Underclaiming: Reasons for Caution When Reporting How NLP Systems Fail
Sam Bowman
OffRL
373
48
0
15 Oct 2021
Page 9 of 10