2008.02275
Aligning AI With Shared Human Values
5 August 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andrew Critch
Haibin Zhang
Basel Alomair
Jacob Steinhardt
Papers citing "Aligning AI With Shared Human Values"
50 / 463 papers shown
On the Relationship between Skill Neurons and Robustness in Prompt Tuning
International Conference on Language Resources and Evaluation (LREC), 2023
Leon Ackermann
Xenia Ohmer
AAML
164
0
0
21 Sep 2023
An Evaluation of GPT-4 on the ETHICS Dataset
Sergey Rodionov
Z. Goertzel
Ben Goertzel
126
6
0
19 Sep 2023
EchoPrompt: Instructing the Model to Rephrase Queries for Improved In-context Learning
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Rajasekhar Reddy Mekala
Yasaman Razeghi
Sameer Singh
LRM
337
16
0
16 Sep 2023
Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics
Haoqin Tu
Bingchen Zhao
Chen Wei
Cihang Xie
MLLM
180
19
0
13 Sep 2023
SafetyBench: Evaluating the Safety of Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Zhexin Zhang
Leqi Lei
Lindong Wu
Rui Sun
Yongkang Huang
Chong Long
Xiao Liu
Xuanyu Lei
Jie Tang
Shiyu Huang
LRM
LM&MA
ELM
304
169
0
13 Sep 2023
Beyond Traditional Teaching: The Potential of Large Language Models and Chatbots in Graduate Engineering Education
M. Abedi
Ibrahem Alshybani
M. Shahadat
M. Murillo
312
21
0
09 Sep 2023
Gesture-Informed Robot Assistance via Foundation Models
Conference on Robot Learning (CoRL), 2023
Li-Heng Lin
Yuchen Cui
Yilun Hao
Fei Xia
Dorsa Sadigh
LM&Ro
SLR
154
27
0
06 Sep 2023
Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties
AAAI Conference on Artificial Intelligence (AAAI), 2023
Taylor Sorensen
Liwei Jiang
Jena D. Hwang
Sydney Levine
Valentina Pyatkin
...
Kavel Rao
Chandra Bhagavatula
Maarten Sap
J. Tasioulas
Yejin Choi
SLR
492
90
0
02 Sep 2023
Curating Naturally Adversarial Datasets for Learning-Enabled Medical Cyber-Physical Systems
Sydney Pugh
I. Ruchkin
Insup Lee
James Weimer
AAML
OOD
196
0
0
01 Sep 2023
FederatedScope-LLM: A Comprehensive Package for Fine-tuning Large Language Models in Federated Learning
Knowledge Discovery and Data Mining (KDD), 2023
Weirui Kuang
Bingchen Qian
Zitao Li
Daoyuan Chen
Dawei Gao
Xuchen Pan
Yuexiang Xie
Yaliang Li
Bolin Ding
Jingren Zhou
FedML
322
200
0
01 Sep 2023
Is the U.S. Legal System Ready for AI's Challenges to Human Values?
Inyoung Cheong
Aylin Caliskan
Tadayoshi Kohno
SILM
ELM
AILaw
275
3
0
30 Aug 2023
Rethinking Machine Ethics -- Can LLMs Perform Moral Reasoning through the Lens of Moral Theories?
Jingyan Zhou
Minda Hu
Junan Li
Xiaoying Zhang
Xixin Wu
Irwin King
Helen M. Meng
LRM
278
38
0
29 Aug 2023
AI Deception: A Survey of Examples, Risks, and Potential Solutions
Peter S. Park
Simon Goldstein
Aidan O'Gara
Michael Chen
Dan Hendrycks
309
241
0
28 Aug 2023
The Poison of Alignment
Aibek Bekbayev
Sungbae Chun
Yerzat Dulat
James Yamazaki
127
10
0
25 Aug 2023
From Instructions to Intrinsic Human Values -- A Survey of Alignment Goals for Big Models
Jing Yao
Xiaoyuan Yi
Xiting Wang
Yongfeng Zhang
Xing Xie
ALM
393
56
0
23 Aug 2023
Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment
Rishabh Bhardwaj
Soujanya Poria
ELM
386
214
0
18 Aug 2023
MaScQA: A Question Answering Dataset for Investigating Materials Science Knowledge of Large Language Models
Mohd Zaki
J. Jayadeva
Mausam
N. M. A. Krishnan
ELM
173
8
0
17 Aug 2023
FLIRT: Feedback Loop In-context Red Teaming
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Ninareh Mehrabi
Palash Goyal
Christophe Dupuy
Qian Hu
Shalini Ghosh
R. Zemel
Kai-Wei Chang
Aram Galstyan
Rahul Gupta
DiffM
250
87
0
08 Aug 2023
ChatMOF: An Autonomous AI System for Predicting and Generating Metal-Organic Frameworks
Y. Kang
Jihan Kim
AI4CE
LLMAG
209
16
0
01 Aug 2023
On the Trustworthiness Landscape of State-of-the-art Generative Models: A Survey and Outlook
International Journal of Computer Vision (IJCV), 2023
Mingyuan Fan
Chengyu Wang
Cen Chen
Yang Liu
Jun Huang
HILM
309
13
0
31 Jul 2023
Universal and Transferable Adversarial Attacks on Aligned Language Models
Andy Zou
Zifan Wang
Nicholas Carlini
Milad Nasr
J. Zico Kolter
Matt Fredrikson
623
2,304
0
27 Jul 2023
Evaluating the Moral Beliefs Encoded in LLMs
Neural Information Processing Systems (NeurIPS), 2023
Nino Scherrer
Claudia Shi
Amir Feder
David M. Blei
247
204
0
26 Jul 2023
Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations
International Conference on Machine Learning (ICML), 2023
Yanda Chen
Ruiqi Zhong
Narutatsu Ri
Chen Zhao
He He
Jacob Steinhardt
Zhou Yu
Kathleen McKeown
LRM
225
74
0
17 Jul 2023
A Survey on Evaluation of Large Language Models
ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2023
Yu-Chu Chang
Xu Wang
Yongfeng Zhang
Yuanyi Wu
Linyi Yang
...
Yue Zhang
Yi-Ju Chang
Philip S. Yu
Qian Yang
Xingxu Xie
ELM
LM&MA
ALM
700
2,769
0
06 Jul 2023
Minimum Levels of Interpretability for Artificial Moral Agents
AI and Ethics (AE), 2023
Avish Vijayaraghavan
C. Badea
AI4CE
163
6
0
02 Jul 2023
Towards Measuring the Representation of Subjective Global Opinions in Language Models
Esin Durmus
Karina Nguyen
Thomas I. Liao
Nicholas Schiefer
Amanda Askell
...
Alex Tamkin
Janel Thamkul
Jared Kaplan
Jack Clark
Deep Ganguli
360
337
0
28 Jun 2023
Let's Do a Thought Experiment: Using Counterfactuals to Improve Moral Reasoning
Xiao Ma
Swaroop Mishra
Ahmad Beirami
Alex Beutel
Jilin Chen
ELM
ReLM
LRM
169
17
0
25 Jun 2023
Apolitical Intelligence? Auditing Delphi's responses on controversial political issues in the US
J. H. Rystrøm
136
0
0
22 Jun 2023
Towards Theory-based Moral AI: Moral AI with Aggregating Models Based on Normative Ethical Theory
Masashi Takeshita
Rafal Rzepka
K. Araki
187
11
0
20 Jun 2023
Toward Grounded Commonsense Reasoning
IEEE International Conference on Robotics and Automation (ICRA), 2023
Minae Kwon
Hengyuan Hu
Vivek Myers
Siddharth Karamcheti
Anca Dragan
Dorsa Sadigh
LM&Ro
ReLM
LRM
271
15
0
14 Jun 2023
The Chai Platform's AI Safety Framework
Xiaoding Lu
Aleksey Korshuk
Z. Liu
W. Beauchamp
198
2
0
05 Jun 2023
Knowledge of cultural moral norms in large language models
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Aida Ramezani
Yang Xu
ELM
AILaw
187
65
0
02 Jun 2023
Do Large Language Models Pay Similar Attention Like Human Programmers When Generating Code?
Bonan Kou
Shengmai Chen
Zhijie Wang
Lei Ma
Tianyi Zhang
ALM
229
20
0
02 Jun 2023
A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Md Tahmid Rahman Laskar
M Saiful Bari
Mizanur Rahman
Md Amran Hossen Bhuiyan
Shafiq Joty
J. Huang
LM&MA
ELM
ALM
500
215
0
29 May 2023
KoSBi: A Dataset for Mitigating Social Bias Risks Towards Safer Large Language Model Application
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Hwaran Lee
Seokhee Hong
Joonsuk Park
Takyoung Kim
Gunhee Kim
Jung-Woo Ha
360
34
0
28 May 2023
SQuARe: A Large-Scale Dataset of Sensitive Questions and Acceptable Responses Created Through Human-Machine Collaboration
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Hwaran Lee
Seokhee Hong
Joonsuk Park
Takyoung Kim
M. Cha
...
Eun-Ju Lee
Yong Lim
Alice Oh
San-hee Park
Jung-Woo Ha
216
18
0
28 May 2023
What can Large Language Models do in chemistry? A comprehensive benchmark on eight tasks
Neural Information Processing Systems (NeurIPS), 2023
Taicheng Guo
Kehan Guo
B. Nan
Zhengwen Liang
Zhichun Guo
Nitesh Chawla
Olaf Wiest
Xiangliang Zhang
ELM
518
210
0
27 May 2023
NormBank: A Knowledge Bank of Situational Social Norms
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Caleb Ziems
Jane Dwivedi-Yu
Yi-Chia Wang
A. Halevy
Diyi Yang
337
56
0
26 May 2023
Training Socially Aligned Language Models on Simulated Social Interactions
International Conference on Learning Representations (ICLR), 2023
Ruibo Liu
Ruixin Yang
Chenyan Jia
Ge Zhang
Denny Zhou
Andrew M. Dai
Diyi Yang
Soroush Vosoughi
ALM
285
88
0
26 May 2023
EXnet: Efficient In-context Learning for Data-less Text classification
Debaditya Shome
Kuldeep Yadav
141
3
0
24 May 2023
Has It All Been Solved? Open NLP Research Questions Not Solved by Large Language Models
International Conference on Language Resources and Evaluation (LREC), 2023
Oana Ignat
Zhijing Jin
Artem Abzaliev
Laura Biester
Santiago Castro
...
Verónica Pérez-Rosas
Siqi Shen
Zekun Wang
Winston Wu
Amélie Reymond
LRM
320
8
0
21 May 2023
"Oops, Did I Just Say That?" Testing and Repairing Unethical Suggestions of Large Language Models with Suggest-Critique-Reflect Process
Anna Glazkova
Zongjie Li
Michael Kadantsev
Maksim Glazkov
KELM
198
15
0
04 May 2023
Can Large Language Models Be an Alternative to Human Evaluations?
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Cheng-Han Chiang
Hung-yi Lee
ALM
LM&MA
585
851
0
03 May 2023
Connecting the Dots in Trustworthy Artificial Intelligence: From AI Principles, Ethics, and Key Requirements to Responsible AI Systems and Regulation
Information Fusion (Inf. Fusion), 2023
Natalia Díaz Rodríguez
Javier Del Ser
Mark Coeckelbergh
Marcos López de Prado
E. Herrera-Viedma
Francisco Herrera
XAI
339
455
0
02 May 2023
Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation
Patrick Fernandes
Aman Madaan
Emmy Liu
António Farinhas
Pedro Henrique Martins
...
José G. C. de Souza
Shuyan Zhou
Tongshuang Wu
Graham Neubig
Marcely Zanon Boito
ALM
304
69
0
01 May 2023
Towards ethical multimodal systems
Alexis Roger
Esma Aïmeur
Irina Rish
222
3
0
26 Apr 2023
SocialDial: A Benchmark for Socially-Aware Dialogue Systems
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2023
Haolan Zhan
Zhuang Li
Yufei Wang
Linhao Luo
Tao Feng
...
Lay-Ki Soon
Suraj Sharma
Ingrid Zukerman
Zhaleh Semnani Azad
Gholamreza Haffari
224
23
0
24 Apr 2023
Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark
International Conference on Machine Learning (ICML), 2023
Alexander Pan
Chan Jun Shern
Andy Zou
Nathaniel Li
Steven Basart
Thomas Woodside
Jonathan Ng
Hanlin Zhang
Scott Emmons
Dan Hendrycks
533
166
0
06 Apr 2023
Large AI Models in Health Informatics: Applications, Challenges, and the Future
IEEE journal of biomedical and health informatics (IEEE JBHI), 2023
Jianing Qiu
Lin Li
Jiankai Sun
Jiachuan Peng
Peilun Shi
...
Bo Xiao
Wu Yuan
Ningli Wang
Dong Xu
Benny Lo
AI4MH
LM&MA
280
184
0
21 Mar 2023
Towards the Scalable Evaluation of Cooperativeness in Language Models
Alan Chan
Maxime Riché
Jesse Clifton
LLMAG
242
8
0
16 Mar 2023