2008.02275
Aligning AI With Shared Human Values
5 August 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andrew Critch
Haibin Zhang
Basel Alomair
Jacob Steinhardt
Papers citing "Aligning AI With Shared Human Values"
50 / 463 papers shown
MERA: A Comprehensive LLM Evaluation in Russian
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Alena Fenogenova
Artem Chervyakov
Nikita Martynov
Anastasia Kozlova
Maria Tikhonova
...
Nikita Savushkin
Polina Mikhailova
Denis Dimitrov
Sergey Petrakov
Sergey Markov
ELM
274
31
0
09 Jan 2024
Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models
Yuqing Wang
Yun Zhao
VLM
ReLM
LRM
313
26
0
29 Dec 2023
Assessing the Impact of Prompting Methods on ChatGPT's Mathematical Capabilities
Yuhao Chen
Chloe Wong
Hanwen Yang
Juan Aguenza
Sai Bhujangari
...
Eric Phuong
Minghao Liu
Raja Kumar
Vanshika Vats
James Davis
341
1
0
22 Dec 2023
Learning Human-like Representations to Enable Learning Human Values
Andrea Wynn
Ilia Sucholutsky
Thomas Griffiths
271
7
0
21 Dec 2023
ALMANACS: A Simulatability Benchmark for Language Model Explainability
Edmund Mills
Shiye Su
Stuart J. Russell
Scott Emmons
505
9
0
20 Dec 2023
Catwalk: A Unified Language Model Evaluation Framework for Many Datasets
Dirk Groeneveld
Anas Awadalla
Iz Beltagy
Akshita Bhagia
Ian H. Magnusson
Hao Peng
Oyvind Tafjord
Pete Walsh
Kyle Richardson
Jesse Dodge
265
2
0
15 Dec 2023
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
International Conference on Machine Learning (ICML), 2023
Collin Burns
Pavel Izmailov
Jan Hendrik Kirchner
Bowen Baker
Leo Gao
...
Adrien Ecoffet
Manas Joglekar
Jan Leike
Ilya Sutskever
Jeff Wu
ELM
361
387
0
14 Dec 2023
CBQ: Cross-Block Quantization for Large Language Models
International Conference on Learning Representations (ICLR), 2023
Xin Ding
Xiaoyu Liu
Zhijun Tu
Yun-feng Zhang
Wei Li
...
Hanting Chen
Yehui Tang
Zhiwei Xiong
Baoqun Yin
Yunhe Wang
MQ
777
29
0
13 Dec 2023
SM70: A Large Language Model for Medical Devices
Anubhav Bhatti
Surajsinh Parmar
San Lee
LM&MA
AI4MH
68
2
0
12 Dec 2023
Cross Fertilizing Empathy from Brain to Machine as a Value Alignment Strategy
Devin Gonier
Adrian Adduci
Cassidy LoCascio
151
0
0
10 Dec 2023
MUFFIN: Curating Multi-Faceted Instructions for Improving Instruction-Following
International Conference on Learning Representations (ICLR), 2023
Renze Lou
Kai Zhang
Jian Xie
Yuxuan Sun
Janice Ahn
Hanzi Xu
Yu Su
Wenpeng Yin
265
36
0
05 Dec 2023
Tree of Attacks: Jailbreaking Black-Box LLMs Automatically
Neural Information Processing Systems (NeurIPS), 2023
Anay Mehrotra
Manolis Zampetakis
Paul Kassianik
Blaine Nelson
Hyrum Anderson
Yaron Singer
Amin Karbasi
354
449
0
04 Dec 2023
Hashmarks: Privacy-Preserving Benchmarks for High-Stakes AI Evaluation
P. Bricman
182
0
0
01 Dec 2023
Foundational Moral Values for AI Alignment
Betty Hou
Brian Patrick Green
177
1
0
28 Nov 2023
A Survey of the Evolution of Language Model-Based Dialogue Systems: Data, Task and Models
Hongru Wang
Lingzhi Wang
Yiming Du
Liang Chen
Jing Zhou
Yufei Wang
Kam-Fai Wong
LRM
456
23
0
28 Nov 2023
Interpretation modeling: Social grounding of sentences by reasoning over their implicit moral judgments
Liesbeth Allein
Maria Mihaela Trușcă
Marie-Francine Moens
211
2
0
27 Nov 2023
Case Repositories: Towards Case-Based Reasoning for AI Alignment
K. J. Kevin Feng
Quan Ze Chen
Inyoung Cheong
King Xia
Amy X. Zhang
167
13
0
18 Nov 2023
MOKA: Moral Knowledge Augmentation for Moral Event Extraction
Xinliang Frederick Zhang
Winston Wu
Nick Beauchamp
Lu Wang
250
12
0
16 Nov 2023
LifeTox: Unveiling Implicit Toxicity in Life Advice
Minbeom Kim
Jahyun Koo
Hwanhee Lee
Joonsuk Park
Hwaran Lee
Kyomin Jung
307
11
0
16 Nov 2023
How Trustworthy are Open-Source LLMs? An Assessment under Malicious Demonstrations Shows their Vulnerabilities
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Lingbo Mo
Boshi Wang
Muhao Chen
Huan Sun
267
42
0
15 Nov 2023
When does In-context Learning Fall Short and Why? A Study on Specification-Heavy Tasks
Hao Peng
Xiaozhi Wang
Jianhui Chen
Weikai Li
Yunjia Qi
...
Zhili Wu
Kaisheng Zeng
Bin Xu
Lei Hou
Juanzi Li
258
43
0
15 Nov 2023
Value FULCRA: Mapping Large Language Models to the Multidimensional Spectrum of Basic Human Values
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Jing Yao
Xiaoyuan Yi
Xiting Wang
Yifan Gong
Xing Xie
299
40
0
15 Nov 2023
Generalization Analogies: A Testbed for Generalizing AI Oversight to Hard-To-Measure Domains
Joshua Clymer
Garrett Baker
Rohan Subramani
Sam Wang
365
7
0
13 Nov 2023
MART: Improving LLM Safety with Multi-round Automatic Red-Teaming
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Suyu Ge
Chunting Zhou
Rui Hou
Madian Khabsa
Yi-Chia Wang
Qifan Wang
Jiawei Han
Yuning Mao
AAML
LRM
212
150
0
13 Nov 2023
Online Advertisements with LLMs: Opportunities and Challenges
Soheil Feizi
Mohammadtaghi Hajiaghayi
Keivan Rezaei
Suho Shin
OffRL
409
22
0
11 Nov 2023
A Survey of Large Language Models in Medicine: Progress, Application, and Challenge
Hongjian Zhou
Fenglin Liu
Boyang Gu
Xinyu Zou
Jinfa Huang
...
Yefeng Zheng
Lei A. Clifton
Zheng Li
Fenglin Liu
David Clifton
LM&MA
736
187
0
09 Nov 2023
Mini Minds: Exploring Bebeshka and Zlata Baby Models
Irina Proskurina
Guillaume Metzler
Julien Velcin
ALM
164
1
0
06 Nov 2023
Can LLMs Follow Simple Rules?
Norman Mu
Sarah Chen
Zifan Wang
Sizhe Chen
David Karamardian
Lulwa Aljeraisy
Basel Alomair
Dan Hendrycks
David Wagner
ALM
363
43
0
06 Nov 2023
MoCa: Measuring Human-Language Model Alignment on Causal and Moral Judgment Tasks
Neural Information Processing Systems (NeurIPS), 2023
Allen Nie
Yuhui Zhang
Atharva Amdekar
Chris Piech
Tatsunori Hashimoto
Tobias Gerstenberg
273
55
0
30 Oct 2023
Moral Sparks in Social Media Narratives
ACM Conference on Hypertext & Social Media (HT), 2023
Ruijie Xi
Munindar P. Singh
LRM
252
2
0
30 Oct 2023
EtiCor: Corpus for Analyzing LLMs for Etiquettes
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Ashutosh Dwivedi
Pradhyumna Lavania
Ashutosh Modi
181
35
0
29 Oct 2023
MindLLM: Pre-training Lightweight Large Language Model from Scratch, Evaluations and Domain Applications
Yizhe Yang
Huashan Sun
Jiawei Li
Runheng Liu
Yinghao Li
Yuhang Liu
Heyan Huang
Yang Gao
ALM
LRM
187
14
0
24 Oct 2023
DeSIQ: Towards an Unbiased, Challenging Benchmark for Social Intelligence Understanding
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Xiao-Yu Guo
Yuan-Fang Li
Gholamreza Haffari
236
7
0
24 Oct 2023
An In-Context Schema Understanding Method for Knowledge Base Question Answering
Knowledge Science, Engineering and Management (KSEM), 2023
Yantao Liu
Zixuan Li
Xiaolong Jin
Yucan Guo
Long Bai
Saiping Guan
Jiafeng Guo
Xueqi Cheng
199
3
0
22 Oct 2023
Values, Ethics, Morals? On the Use of Moral Concepts in NLP Research
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Karina Vida
Judith Simon
Anne Lauscher
243
21
0
21 Oct 2023
Denevil: Towards Deciphering and Navigating the Ethical Values of Large Language Models via Instruction Learning
International Conference on Learning Representations (ICLR), 2023
Shitong Duan
Xiaoyuan Yi
Peng Zhang
Tun Lu
Xing Xie
Ning Gu
237
24
0
17 Oct 2023
Reading Books is Great, But Not if You Are Driving! Visually Grounded Reasoning about Defeasible Commonsense Norms
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Seungju Han
Junhyeok Kim
Jack Hessel
Liwei Jiang
Jiwan Chung
Yejin Son
Yejin Choi
Youngjae Yu
178
5
0
16 Oct 2023
Is Certifying $\ell_p$ Robustness Still Worthwhile?
Ravi Mangal
Klas Leino
Zifan Wang
Kai Hu
Weicheng Yu
Corina S. Pasareanu
Anupam Datta
Matt Fredrikson
AAML
OOD
250
1
0
13 Oct 2023
Impact of Guidance and Interaction Strategies for LLM Use on Learner Performance and Perception
Harsh Kumar
Ilya Musabirov
Mohi Reza
Jiakai Shi
Xinyuan Wang
Joseph Jay Williams
Anastasia Kuzminykh
Michael Liut
220
43
0
13 Oct 2023
The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Hannah Rose Kirk
Andrew M. Bean
Bertie Vidgen
Paul Röttger
Scott A. Hale
ALM
361
64
0
11 Oct 2023
Case Law Grounding: Aligning Judgments of Humans and AI on Socially-Constructed Concepts
International Conference on Climate Informatics (ICCI), 2023
Quan Ze Chen
Amy X. Zhang
ELM
282
6
0
10 Oct 2023
Aligning Language Models with Human Preferences via a Bayesian Approach
Neural Information Processing Systems (NeurIPS), 2023
Jiashuo Wang
Haozhao Wang
Shichao Sun
Wenjie Li
ALM
343
34
0
09 Oct 2023
STREAM: Social data and knowledge collective intelligence platform for TRaining Ethical AI Models
AI & Society (AI & Society), 2023
Yuwei Wang
Enmeng Lu
Zizhe Ruan
Yao Liang
Yi Zeng
AI4TS
210
5
0
09 Oct 2023
LoFT: Local Proxy Fine-tuning For Improving Transferability Of Adversarial Attacks Against Large Language Model
Muhammad Ahmed Shah
Roshan S. Sharma
Hira Dhamyal
R. Olivier
Ankit Shah
...
Massa Baali
Soham Deshmukh
Michael Kuhlmann
Bhiksha Raj
Rita Singh
AAML
132
24
0
02 Oct 2023
EALM: Introducing Multidimensional Ethical Alignment in Conversational Information Retrieval
Yiyao Yu
Junjie Wang
Yuxiang Zhang
Lin Zhang
Yujiu Yang
Tetsuya Sakai
162
2
0
02 Oct 2023
ValueDCG: Measuring Comprehensive Human Value Understanding Ability of Language Models
Zhaowei Zhang
Fengshuo Bai
Jun Gao
Yaodong Yang
PILM
ELM
326
5
0
30 Sep 2023
Corex: Pushing the Boundaries of Complex Reasoning through Multi-Model Collaboration
Qiushi Sun
Zhangyue Yin
Xiang Li
Zhiyong Wu
Xipeng Qiu
Lingpeng Kong
LRM
LLMAG
395
69
0
30 Sep 2023
The Confidence-Competence Gap in Large Language Models: A Cognitive Study
Aniket Kumar Singh
Suman Devkota
Bishal Lamichhane
Uttam Dhakal
Chandra Dhakal
LRM
230
13
0
28 Sep 2023
Large Language Model Alignment: A Survey
Shangda Wu
Renren Jin
Yufei Huang
Chuang Liu
Weilong Dong
Zishan Guo
Xinwei Wu
Yan Liu
Deyi Xiong
LM&MA
359
282
0
26 Sep 2023
Probing the Moral Development of Large Language Models through Defining Issues Test
Kumar Tanmay
Aditi Khandelwal
Utkarsh Agarwal
Monojit Choudhury
LRM
246
27
0
23 Sep 2023