2008.02275
Aligning AI With Shared Human Values
5 August 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andrew Critch
J. Li
D. Song
Jacob Steinhardt
Papers citing
"Aligning AI With Shared Human Values"
50 / 347 papers shown
Title
Interpretation modeling: Social grounding of sentences by reasoning over their implicit moral judgments
Liesbeth Allein
Maria Mihaela Truşcă
Marie-Francine Moens
20
1
0
27 Nov 2023
Case Repositories: Towards Case-Based Reasoning for AI Alignment
K. J. Kevin Feng
Quan Ze Chen
Inyoung Cheong
King Xia
Amy X. Zhang
25
10
0
18 Nov 2023
MOKA: Moral Knowledge Augmentation for Moral Event Extraction
Xinliang Frederick Zhang
Winston Wu
Nick Beauchamp
Lu Wang
35
7
0
16 Nov 2023
LifeTox: Unveiling Implicit Toxicity in Life Advice
Minbeom Kim
Jahyun Koo
Hwanhee Lee
Joonsuk Park
Hwaran Lee
Kyomin Jung
8
6
0
16 Nov 2023
How Trustworthy are Open-Source LLMs? An Assessment under Malicious Demonstrations Shows their Vulnerabilities
Lingbo Mo
Boshi Wang
Muhao Chen
Huan Sun
29
27
0
15 Nov 2023
When does In-context Learning Fall Short and Why? A Study on Specification-Heavy Tasks
Hao Peng
Xiaozhi Wang
Jianhui Chen
Weikai Li
Y. Qi
...
Zhili Wu
Kaisheng Zeng
Bin Xu
Lei Hou
Juanzi Li
24
27
0
15 Nov 2023
Value FULCRA: Mapping Large Language Models to the Multidimensional Spectrum of Basic Human Values
Jing Yao
Xiaoyuan Yi
Xiting Wang
Yifan Gong
Xing Xie
22
21
0
15 Nov 2023
Generalization Analogies: A Testbed for Generalizing AI Oversight to Hard-To-Measure Domains
Joshua Clymer
Garrett Baker
Rohan Subramani
Sam Wang
14
6
0
13 Nov 2023
MART: Improving LLM Safety with Multi-round Automatic Red-Teaming
Suyu Ge
Chunting Zhou
Rui Hou
Madian Khabsa
Yi-Chia Wang
Qifan Wang
Jiawei Han
Yuning Mao
AAML
LRM
19
93
0
13 Nov 2023
Online Advertisements with LLMs: Opportunities and Challenges
S. Feizi
Mohammadtaghi Hajiaghayi
Keivan Rezaei
Suho Shin
OffRL
11
10
0
11 Nov 2023
A Survey of Large Language Models in Medicine: Progress, Application, and Challenge
Hongjian Zhou
Fenglin Liu
Boyang Gu
Xinyu Zou
Jinfa Huang
...
Yefeng Zheng
Lei A. Clifton
Zheng Li
Fenglin Liu
David A. Clifton
LM&MA
31
106
0
09 Nov 2023
Mini Minds: Exploring Bebeshka and Zlata Baby Models
Irina Proskurina
Guillaume Metzler
Julien Velcin
ALM
19
1
0
06 Nov 2023
Can LLMs Follow Simple Rules?
Norman Mu
Sarah Chen
Zifan Wang
Sizhe Chen
David Karamardian
Lulwa Aljeraisy
Basel Alomair
Dan Hendrycks
David A. Wagner
ALM
18
26
0
06 Nov 2023
MoCa: Measuring Human-Language Model Alignment on Causal and Moral Judgment Tasks
Allen Nie
Yuhui Zhang
Atharva Amdekar
Chris Piech
Tatsunori Hashimoto
Tobias Gerstenberg
18
33
0
30 Oct 2023
EtiCor: Corpus for Analyzing LLMs for Etiquettes
Ashutosh Dwivedi
Pradhyumna Lavania
Ashutosh Modi
15
19
0
29 Oct 2023
MindLLM: Pre-training Lightweight Large Language Model from Scratch, Evaluations and Domain Applications
Yizhe Yang
Huashan Sun
Jiawei Li
Runheng Liu
Yinghao Li
Yuhang Liu
Heyan Huang
Yang Gao
ALM
LRM
8
8
0
24 Oct 2023
DeSIQ: Towards an Unbiased, Challenging Benchmark for Social Intelligence Understanding
Xiao-Yu Guo
Yuan-Fang Li
Gholamreza Haffari
17
1
0
24 Oct 2023
An In-Context Schema Understanding Method for Knowledge Base Question Answering
Yantao Liu
Zixuan Li
Xiaolong Jin
Yucan Guo
Long Bai
Saiping Guan
Jiafeng Guo
Xueqi Cheng
22
1
0
22 Oct 2023
Values, Ethics, Morals? On the Use of Moral Concepts in NLP Research
Karina Vida
Judith Simon
Anne Lauscher
13
17
0
21 Oct 2023
Denevil: Towards Deciphering and Navigating the Ethical Values of Large Language Models via Instruction Learning
Shitong Duan
Xiaoyuan Yi
Peng Zhang
T. Lu
Xing Xie
Ning Gu
16
9
0
17 Oct 2023
Reading Books is Great, But Not if You Are Driving! Visually Grounded Reasoning about Defeasible Commonsense Norms
Seungju Han
Junhyeok Kim
Jack Hessel
Liwei Jiang
Jiwan Chung
Yejin Son
Yejin Choi
Youngjae Yu
13
2
0
16 Oct 2023
Is Certifying $\ell_p$ Robustness Still Worthwhile?
Ravi Mangal
Klas Leino
Zifan Wang
Kai Hu
Weicheng Yu
Corina S. Pasareanu
Anupam Datta
Matt Fredrikson
AAML
OOD
25
1
0
13 Oct 2023
Impact of Guidance and Interaction Strategies for LLM Use on Learner Performance and Perception
Harsh Kumar
Ilya Musabirov
Mohi Reza
Jiakai Shi
Xinyuan Wang
Joseph Jay Williams
Anastasia Kuzminykh
Michael Liut
19
29
0
13 Oct 2023
The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values
Hannah Rose Kirk
Andrew M. Bean
Bertie Vidgen
Paul Röttger
Scott A. Hale
ALM
19
41
0
11 Oct 2023
Case Law Grounding: Aligning Judgments of Humans and AI on Socially-Constructed Concepts
Quan Ze Chen
Amy X. Zhang
ELM
56
6
0
10 Oct 2023
Aligning Language Models with Human Preferences via a Bayesian Approach
Jiashuo Wang
Haozhao Wang
Shichao Sun
Wenjie Li
ALM
27
12
0
09 Oct 2023
STREAM: Social data and knowledge collective intelligence platform for TRaining Ethical AI Models
Yuwei Wang
Enmeng Lu
Zizhe Ruan
Yao Liang
Yi Zeng
AI4TS
24
4
0
09 Oct 2023
LoFT: Local Proxy Fine-tuning For Improving Transferability Of Adversarial Attacks Against Large Language Model
Muhammad Ahmed Shah
Roshan S. Sharma
Hira Dhamyal
R. Olivier
Ankit Shah
...
Massa Baali
Soham Deshmukh
Michael Kuhlmann
Bhiksha Raj
Rita Singh
AAML
25
19
0
02 Oct 2023
EALM: Introducing Multidimensional Ethical Alignment in Conversational Information Retrieval
Yiyao Yu
Junjie Wang
Yuxiang Zhang
Lin Zhang
Yujiu Yang
Tetsuya Sakai
25
1
0
02 Oct 2023
ValueDCG: Measuring Comprehensive Human Value Understanding Ability of Language Models
Zhaowei Zhang
Fengshuo Bai
Jun Gao
Yaodong Yang
PILM
ELM
10
3
0
30 Sep 2023
Corex: Pushing the Boundaries of Complex Reasoning through Multi-Model Collaboration
Qiushi Sun
Zhangyue Yin
Xiang Li
Zhiyong Wu
Xipeng Qiu
Lingpeng Kong
LRM
LLMAG
22
44
0
30 Sep 2023
The Confidence-Competence Gap in Large Language Models: A Cognitive Study
Aniket Kumar Singh
Suman Devkota
Bishal Lamichhane
Uttam Dhakal
Chandra Dhakal
LRM
24
9
0
28 Sep 2023
Large Language Model Alignment: A Survey
Tianhao Shen
Renren Jin
Yufei Huang
Chuang Liu
Weilong Dong
Zishan Guo
Xinwei Wu
Yan Liu
Deyi Xiong
LM&MA
14
177
0
26 Sep 2023
Probing the Moral Development of Large Language Models through Defining Issues Test
Kumar Tanmay
Aditi Khandelwal
Utkarsh Agarwal
Monojit Choudhury
LRM
6
14
0
23 Sep 2023
On the Relationship between Skill Neurons and Robustness in Prompt Tuning
Leon Ackermann
Xenia Ohmer
AAML
16
0
0
21 Sep 2023
An Evaluation of GPT-4 on the ETHICS Dataset
Sergey Rodionov
Z. Goertzel
Ben Goertzel
19
4
0
19 Sep 2023
EchoPrompt: Instructing the Model to Rephrase Queries for Improved In-context Learning
Rajasekhar Reddy Mekala
Yasaman Razeghi
Sameer Singh
LRM
16
9
0
16 Sep 2023
Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics
Haoqin Tu
Bingchen Zhao
Chen Wei
Cihang Xie
MLLM
26
13
0
13 Sep 2023
SafetyBench: Evaluating the Safety of Large Language Models
Zhexin Zhang
Leqi Lei
Lindong Wu
Rui Sun
Yongkang Huang
Chong Long
Xiao Liu
Xuanyu Lei
Jie Tang
Minlie Huang
LRM
LM&MA
ELM
29
89
0
13 Sep 2023
Beyond Traditional Teaching: The Potential of Large Language Models and Chatbots in Graduate Engineering Education
M. Abedi
Ibrahem Alshybani
M. Shahadat
M. Murillo
28
13
0
09 Sep 2023
Gesture-Informed Robot Assistance via Foundation Models
Li-Heng Lin
Yuchen Cui
Yilun Hao
Fei Xia
Dorsa Sadigh
LM&Ro
SLR
13
19
0
06 Sep 2023
Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties
Taylor Sorensen
Liwei Jiang
Jena D. Hwang
Sydney Levine
Valentina Pyatkin
...
Kavel Rao
Chandra Bhagavatula
Maarten Sap
J. Tasioulas
Yejin Choi
SLR
16
50
0
02 Sep 2023
Curating Naturally Adversarial Datasets for Learning-Enabled Medical Cyber-Physical Systems
Sydney Pugh
I. Ruchkin
Insup Lee
James Weimer
AAML
OOD
11
0
0
01 Sep 2023
FederatedScope-LLM: A Comprehensive Package for Fine-tuning Large Language Models in Federated Learning
Weirui Kuang
Bingchen Qian
Zitao Li
Daoyuan Chen
Dawei Gao
Xuchen Pan
Yuexiang Xie
Yaliang Li
Bolin Ding
Jingren Zhou
FedML
8
112
0
01 Sep 2023
Is the U.S. Legal System Ready for AI's Challenges to Human Values?
Inyoung Cheong
Aylin Caliskan
Tadayoshi Kohno
SILM
ELM
AILaw
16
1
0
30 Aug 2023
Rethinking Machine Ethics -- Can LLMs Perform Moral Reasoning through the Lens of Moral Theories?
Jingyan Zhou
Minda Hu
Junan Li
Xiaoying Zhang
Xixin Wu
Irwin King
Helen M. Meng
LRM
42
24
0
29 Aug 2023
AI Deception: A Survey of Examples, Risks, and Potential Solutions
Peter S. Park
Simon Goldstein
Aidan O'Gara
Michael Chen
Dan Hendrycks
25
139
0
28 Aug 2023
The Poison of Alignment
Aibek Bekbayev
Sungbae Chun
Yerzat Dulat
James Yamazaki
20
9
0
25 Aug 2023
From Instructions to Intrinsic Human Values -- A Survey of Alignment Goals for Big Models
Jing Yao
Xiaoyuan Yi
Xiting Wang
Jindong Wang
Xing Xie
ALM
14
42
0
23 Aug 2023
Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment
Rishabh Bhardwaj
Soujanya Poria
ELM
17
127
0
18 Aug 2023