2008.02275
Cited By
Aligning AI With Shared Human Values
5 August 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andrew Critch
J. Li
D. Song
Jacob Steinhardt
Papers citing "Aligning AI With Shared Human Values"
50 / 347 papers shown
Title
MaScQA: A Question Answering Dataset for Investigating Materials Science Knowledge of Large Language Models
Mohd Zaki
J. Jayadeva
Mausam
N. M. A. Krishnan
ELM
6
4
0
17 Aug 2023
FLIRT: Feedback Loop In-context Red Teaming
Ninareh Mehrabi
Palash Goyal
Christophe Dupuy
Qian Hu
Shalini Ghosh
R. Zemel
Kai-Wei Chang
Aram Galstyan
Rahul Gupta
DiffM
21
55
0
08 Aug 2023
ChatMOF: An Autonomous AI System for Predicting and Generating Metal-Organic Frameworks
Y. Kang
Jihan Kim
AI4CE
LLMAG
30
12
0
01 Aug 2023
On the Trustworthiness Landscape of State-of-the-art Generative Models: A Survey and Outlook
Mingyuan Fan
Chengyu Wang
Cen Chen
Yang Liu
Jun Huang
HILM
31
3
0
31 Jul 2023
Universal and Transferable Adversarial Attacks on Aligned Language Models
Andy Zou
Zifan Wang
Nicholas Carlini
Milad Nasr
J. Zico Kolter
Matt Fredrikson
69
1,239
0
27 Jul 2023
Evaluating the Moral Beliefs Encoded in LLMs
Nino Scherrer
Claudia Shi
Amir Feder
David M. Blei
25
115
0
26 Jul 2023
Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations
Yanda Chen
Ruiqi Zhong
Narutatsu Ri
Chen Zhao
He He
Jacob Steinhardt
Zhou Yu
Kathleen McKeown
LRM
24
47
0
17 Jul 2023
A Survey on Evaluation of Large Language Models
Yu-Chu Chang
Xu Wang
Jindong Wang
Yuanyi Wu
Linyi Yang
...
Yue Zhang
Yi-Ju Chang
Philip S. Yu
Qian Yang
Xingxu Xie
ELM
LM&MA
ALM
58
1,506
0
06 Jul 2023
Minimum Levels of Interpretability for Artificial Moral Agents
Avish Vijayaraghavan
C. Badea
AI4CE
25
5
0
02 Jul 2023
Towards Measuring the Representation of Subjective Global Opinions in Language Models
Esin Durmus
Karina Nguyen
Thomas I. Liao
Nicholas Schiefer
Amanda Askell
...
Alex Tamkin
Janel Thamkul
Jared Kaplan
Jack Clark
Deep Ganguli
33
205
0
28 Jun 2023
Let's Do a Thought Experiment: Using Counterfactuals to Improve Moral Reasoning
Xiao Ma
Swaroop Mishra
Ahmad Beirami
Alex Beutel
Jilin Chen
ELM
ReLM
LRM
22
12
0
25 Jun 2023
Apolitical Intelligence? Auditing Delphi's responses on controversial political issues in the US
J. H. Rystrøm
11
0
0
22 Jun 2023
Towards Theory-based Moral AI: Moral AI with Aggregating Models Based on Normative Ethical Theory
Masashi Takeshita
Rafal Rzepka
K. Araki
13
8
0
20 Jun 2023
Toward Grounded Commonsense Reasoning
Minae Kwon
Hengyuan Hu
Vivek Myers
Siddharth Karamcheti
Anca Dragan
Dorsa Sadigh
LM&Ro
ReLM
LRM
36
9
0
14 Jun 2023
The Chai Platform's AI Safety Framework
Xiaoding Lu
Aleksey Korshuk
Z. Liu
W. Beauchamp
13
2
0
05 Jun 2023
Knowledge of cultural moral norms in large language models
Aida Ramezani
Yang Xu
ELM
AILaw
24
46
0
02 Jun 2023
Do Large Language Models Pay Similar Attention Like Human Programmers When Generating Code?
Bonan Kou
Shengmai Chen
Zhijie Wang
Lei Ma
Tianyi Zhang
ALM
11
13
0
02 Jun 2023
A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets
Md Tahmid Rahman Laskar
M Saiful Bari
Mizanur Rahman
Md Amran Hossen Bhuiyan
Shafiq R. Joty
J. Huang
LM&MA
ELM
ALM
41
178
0
29 May 2023
KoSBi: A Dataset for Mitigating Social Bias Risks Towards Safer Large Language Model Application
Hwaran Lee
Seokhee Hong
Joonsuk Park
Takyoung Kim
Gunhee Kim
Jung-Woo Ha
30
28
0
28 May 2023
SQuARe: A Large-Scale Dataset of Sensitive Questions and Acceptable Responses Created Through Human-Machine Collaboration
Hwaran Lee
Seokhee Hong
Joonsuk Park
Takyoung Kim
M. Cha
...
Eun-Ju Lee
Yong Lim
Alice H. Oh
San-hee Park
Jung-Woo Ha
36
16
0
28 May 2023
What can Large Language Models do in chemistry? A comprehensive benchmark on eight tasks
Taicheng Guo
Kehan Guo
B. Nan
Zhengwen Liang
Zhichun Guo
Nitesh V. Chawla
Olaf Wiest
Xiangliang Zhang
ELM
44
126
0
27 May 2023
NormBank: A Knowledge Bank of Situational Social Norms
Caleb Ziems
Jane Dwivedi-Yu
Yi-Chia Wang
A. Halevy
Diyi Yang
18
41
0
26 May 2023
Training Socially Aligned Language Models on Simulated Social Interactions
Ruibo Liu
Ruixin Yang
Chenyan Jia
Ge Zhang
Denny Zhou
Andrew M. Dai
Diyi Yang
Soroush Vosoughi
ALM
18
45
0
26 May 2023
EXnet: Efficient In-context Learning for Data-less Text classification
Debaditya Shome
Kuldeep Yadav
12
1
0
24 May 2023
Has It All Been Solved? Open NLP Research Questions Not Solved by Large Language Models
Oana Ignat
Zhijing Jin
Artem Abzaliev
Laura Biester
Santiago Castro
...
Verónica Pérez-Rosas
Siqi Shen
Zekun Wang
Winston Wu
Rada Mihalcea
LRM
29
6
0
21 May 2023
"Oops, Did I Just Say That?" Testing and Repairing Unethical Suggestions of Large Language Models with Suggest-Critique-Reflect Process
Anna Glazkova
Zongjie Li
Michael Kadantsev
Maksim Glazkov
KELM
22
14
0
04 May 2023
Can Large Language Models Be an Alternative to Human Evaluations?
Cheng-Han Chiang
Hung-yi Lee
ALM
LM&MA
209
570
0
03 May 2023
Connecting the Dots in Trustworthy Artificial Intelligence: From AI Principles, Ethics, and Key Requirements to Responsible AI Systems and Regulation
Natalia Díaz Rodríguez
Javier Del Ser
Mark Coeckelbergh
Marcos López de Prado
E. Herrera-Viedma
Francisco Herrera
XAI
27
262
0
02 May 2023
Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation
Patrick Fernandes
Aman Madaan
Emmy Liu
António Farinhas
Pedro Henrique Martins
...
José G. C. de Souza
Shuyan Zhou
Tongshuang Wu
Graham Neubig
André F. T. Martins
ALM
113
56
0
01 May 2023
Towards ethical multimodal systems
Alexis Roger
Esma Aïmeur
Irina Rish
27
3
0
26 Apr 2023
SocialDial: A Benchmark for Socially-Aware Dialogue Systems
Haolan Zhan
Zhuang Li
Yufei Wang
Linhao Luo
Tao Feng
...
Lay-Ki Soon
Suraj Sharma
Ingrid Zukerman
Zhaleh Semnani Azad
Gholamreza Haffari
49
16
0
24 Apr 2023
Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark
Alexander Pan
Chan Jun Shern
Andy Zou
Nathaniel Li
Steven Basart
Thomas Woodside
Jonathan Ng
Hanlin Zhang
Scott Emmons
Dan Hendrycks
24
126
0
06 Apr 2023
Large AI Models in Health Informatics: Applications, Challenges, and the Future
Jianing Qiu
Lin Li
Jiankai Sun
Jiachuan Peng
Peilun Shi
...
Bo Xiao
Wu Yuan
Ningli Wang
Dong Xu
Benny P. L. Lo
AI4MH
LM&MA
40
127
0
21 Mar 2023
Towards the Scalable Evaluation of Cooperativeness in Language Models
Alan Chan
Maxime Riché
Jesse Clifton
LLMAG
12
6
0
16 Mar 2023
Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback
Hannah Rose Kirk
Bertie Vidgen
Paul Röttger
Scott A. Hale
33
99
0
09 Mar 2023
Towards Safer Generative Language Models: A Survey on Safety Risks, Evaluations, and Improvements
Jiawen Deng
Jiale Cheng
Hao-Lun Sun
Zhexin Zhang
Minlie Huang
LM&MA
ELM
26
15
0
18 Feb 2023
Commonsense Reasoning for Conversational AI: A Survey of the State of the Art
Christopher Richardson
Larry Heck
LRM
22
8
0
15 Feb 2023
Benchmarks for Automated Commonsense Reasoning: A Survey
E. Davis
ELM
LRM
19
57
0
09 Feb 2023
Everyone's Voice Matters: Quantifying Annotation Disagreement Using Demographic Information
Ruyuan Wan
Jaehyung Kim
Dongyeop Kang
9
36
0
12 Jan 2023
A Multi-Level Framework for the AI Alignment Problem
Betty Hou
Brian Patrick Green
14
6
0
10 Jan 2023
Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits
Ruibo Liu
Chenyan Jia
Ge Zhang
Ziyu Zhuang
Tony X. Liu
Soroush Vosoughi
90
34
0
01 Jan 2023
Inclusive Artificial Intelligence
Dilip Arumugam
Shi Dong
Benjamin Van Roy
33
1
0
24 Dec 2022
MoralDial: A Framework to Train and Evaluate Moral Dialogue Systems via Moral Discussions
Hao-Lun Sun
Zhexin Zhang
Fei Mi
Yasheng Wang
W. Liu
Jianwei Cui
Bin Wang
Qun Liu
Minlie Huang
29
19
0
21 Dec 2022
ClarifyDelphi: Reinforced Clarification Questions with Defeasibility Rewards for Social and Moral Situations
Valentina Pyatkin
Jena D. Hwang
Vivek Srikumar
Ximing Lu
Liwei Jiang
Yejin Choi
Chandra Bhagavatula
24
33
0
20 Dec 2022
Despite "super-human" performance, current LLMs are unsuited for decisions about ethics and safety
Joshua Albrecht
Ellie Kitanidis
Abraham J. Fetterman
ELM
ReLM
ALM
LRM
14
17
0
13 Dec 2022
Ensuring Visual Commonsense Morality for Text-to-Image Generation
Seong-Oak Park
Suhong Moon
Jinkyu Kim
6
2
0
07 Dec 2022
Speaking Multiple Languages Affects the Moral Bias of Language Models
Katharina Hämmerl
Bjorn Deiseroth
P. Schramowski
Jindrich Libovický
Constantin Rothkopf
Alexander M. Fraser
Kristian Kersting
21
31
0
14 Nov 2022
Zero-shot Visual Commonsense Immorality Prediction
Yujin Jeong
Seongbeom Park
Suhong Moon
Jinkyu Kim
VLM
11
1
0
10 Nov 2022
Just-DREAM-about-it: Figurative Language Understanding with DREAM-FLUTE
Yuling Gu
Yao Fu
Valentina Pyatkin
Ian H. Magnusson
Bhavana Dalvi
Peter Clark
70
7
0
28 Oct 2022
TAPE: Assessing Few-shot Russian Language Understanding
Ekaterina Taktasheva
Tatiana Shavrina
Alena Fenogenova
Denis Shevelev
Nadezhda Katricheva
...
Svetlana Iordanskaia
Alena Spiridonova
Valentina Kurenshchikova
Ekaterina Artemova
Vladislav Mikhailov
AAML
37
10
0
23 Oct 2022