2008.02275
Aligning AI With Shared Human Values
5 August 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andrew Critch
Haibin Zhang
Basel Alomair
Jacob Steinhardt
Papers citing "Aligning AI With Shared Human Values"
50 / 463 papers shown
CFBench: A Comprehensive Constraints-Following Benchmark for LLMs
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Leo Micklem
Yan-Bin Shen
Wenjing Luo
Yan Zhang
Hao Liang
...
Weipeng Chen
Bin Cui
Blair Thornton
Wentao Zhang
Guosheng Dong
ELM
445
43
0
02 Aug 2024
Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
Richard Ren
Steven Basart
Adam Khoja
Alice Gatti
Long Phan
...
Alexander Pan
Gabriel Mukobi
Ryan H. Kim
Stephen Fitz
Dan Hendrycks
ELM
272
48
0
31 Jul 2024
Legal Minds, Algorithmic Decisions: How LLMs Apply Constitutional Principles in Complex Scenarios
AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2024
Camilla Bignotti
C. Camassa
AILaw
ELM
257
6
0
29 Jul 2024
Blockchain for Large Language Model Security and Safety: A Holistic Survey
Caleb Geren
Amanda Board
Gaby G. Dagher
Tim Andersen
Jun Zhuang
270
18
0
26 Jul 2024
The Dark Side of Function Calling: Pathways to Jailbreaking Large Language Models
Zihui Wu
Haichang Gao
Jianping He
Ping Wang
375
17
0
25 Jul 2024
Course-Correction: Safety Alignment Using Synthetic Preferences
Rongwu Xu
Yishuo Cai
Zhenhong Zhou
Renjie Gu
Haiqin Weng
Yan Liu
Tianwei Zhang
Wei Xu
Han Qiu
206
13
0
23 Jul 2024
Virtue Ethics For Ethically Tunable Robotic Assistants
Rajitha Ramanayake
Vivek Nallur
95
0
0
23 Jul 2024
ALLaM: Large Language Models for Arabic and English
M Saiful Bari
Yazeed Alnumay
Norah A. Alzahrani
Nouf M. Alotaibi
H. A. Alyahya
...
Jeril Kuriakose
Abdalghani Abujabal
Nora Al-Twairesh
Areeb Alowisheq
Haidar Khan
233
48
0
22 Jul 2024
Internal Consistency and Self-Feedback in Large Language Models: A Survey
Xun Liang
Chenyang Xi
Zifan Zheng
Ding Chen
Qingchen Yu
...
Rong-Hua Li
Peng Cheng
Zhonghao Wang
Feiyu Xiong
Zhiyu Li
HILM
LRM
506
46
0
19 Jul 2024
BadRobot: Jailbreaking Embodied LLMs in the Physical World
Hangtao Zhang
Chenyu Zhu
Xianlong Wang
Ziqi Zhou
Yichen Wang
...
Shengshan Hu
Leo Yu Zhang
Aishan Liu
Peijin Guo
LM&Ro
464
2
0
16 Jul 2024
CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses
Jing Yao
Xiaoyuan Yi
Xing Xie
ELM
ALM
294
23
0
15 Jul 2024
Hey, That's My Model! Introducing Chain & Hash, An LLM Fingerprinting Technique
M. Russinovich
Ahmed Salem
437
40
0
15 Jul 2024
The Sociolinguistic Foundations of Language Modeling
Jack Grieve
Sara Bartl
Matteo Fuoli
Jason Grafmiller
Weihang Huang
A. Jawerbaum
Akira Murakami
Marcus Perlman
Dana Roemling
Bodo Winter
310
27
0
12 Jul 2024
LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages
Yinquan Lu
Wenhao Zhu
Lei Li
Yu Qiao
Fei Yuan
291
56
0
08 Jul 2024
Some Issues in Predictive Ethics Modeling: An Annotated Contrast Set of "Moral Stories"
Ben Fitzgerald
168
0
0
07 Jul 2024
AI Safety in Generative AI Large Language Models: A Survey
Jaymari Chua
Yun Yvonna Li
Shiyi Yang
Chen Wang
Lina Yao
LM&MA
387
37
0
06 Jul 2024
A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations
Md Tahmid Rahman Laskar
Sawsan Alqahtani
M Saiful Bari
Mizanur Rahman
Mohammad Abdullah Matin Khan
...
Chee Wei Tan
Md. Rizwan Parvez
Enamul Hoque
Shafiq Joty
Jimmy Huang
ELM
ALM
283
92
0
04 Jul 2024
Multilingual Trolley Problems for Language Models
Zhijing Jin
Sydney Levine
Max Kleiman-Weiner
Giorgio Piatti
Jiarui Liu
...
András Strausz
Mrinmaya Sachan
Amélie Reymond
Yejin Choi
Bernhard Schölkopf
LRM
356
0
0
02 Jul 2024
Is Your Large Language Model Knowledgeable or a Choices-Only Cheater?
Nishant Balepur
Rachel Rudinger
196
11
0
02 Jul 2024
ProgressGym: Alignment with a Millennium of Moral Progress
Tianyi Qiu
Yang Zhang
Xuchuan Huang
Jasmine Xinze Li
Yalan Qin
Yaodong Yang
AI4TS
286
9
0
28 Jun 2024
ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting
Boyao Wang
Dylan Zhang
Hanning Zhang
Xingyuan Pan
Minrui Xu
Jipeng Zhang
Renjie Pi
Xiaoyu Wang
Tong Zhang
432
24
0
28 Jun 2024
Improving Weak-to-Strong Generalization with Reliability-Aware Alignment
Yue Guo
Yi Yang
226
15
0
27 Jun 2024
The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm
Aakanksha
Arash Ahmadian
Beyza Ermis
Seraphina Goldfarb-Tarrant
Julia Kreutzer
Marzieh Fadaee
Sara Hooker
372
54
0
26 Jun 2024
DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph
Zhehao Zhang
Jiaao Chen
Diyi Yang
LRM
229
24
0
25 Jun 2024
Does Cross-Cultural Alignment Change the Commonsense Morality of Language Models?
Yuu Jinnai
342
8
0
24 Jun 2024
Model Merging and Safety Alignment: One Bad Model Spoils the Bunch
Hasan Hammoud
Umberto Michieli
Fabio Pizzati
Juil Sock
Adel Bibi
Guohao Li
Mete Ozay
MoMe
280
33
0
20 Jun 2024
LiveMind: Low-latency Large Language Models with Simultaneous Inference
Chuangtao Chen
Grace Li Zhang
Xunzhao Yin
Cheng Zhuo
Ulf Schlichtmann
Bing Li
LRM
328
10
0
20 Jun 2024
Raising the Bar: Investigating the Values of Large Language Models via Generative Evolving Testing
Han Jiang
Xiaoyuan Yi
Zhihua Wei
Ziang Xiao
Shu Wang
Xing Xie
ELM
ALM
652
12
0
20 Jun 2024
Cultural Conditioning or Placebo? On the Effectiveness of Socio-Demographic Prompting
Sagnik Mukherjee
Muhammad Farid Adilazuarda
Sunayana Sitaram
Kalika Bali
Alham Fikri Aji
Monojit Choudhury
280
22
0
17 Jun 2024
The Potential and Challenges of Evaluating Attitudes, Opinions, and Values in Large Language Models
Bolei Ma
Xinpeng Wang
Tiancheng Hu
Anna Haensch
Michael A. Hedderich
Barbara Plank
Frauke Kreuter
ALM
301
19
0
16 Jun 2024
RUPBench: Benchmarking Reasoning Under Perturbations for Robustness Evaluation in Large Language Models
Yuqing Wang
Yun Zhao
LRM
AAML
ELM
258
6
0
16 Jun 2024
Toward Optimal LLM Alignments Using Two-Player Games
Rui Zheng
Hongyi Guo
Zhihan Liu
Xiaoying Zhang
Yuanshun Yao
...
Tao Gui
Qi Zhang
Xuanjing Huang
Hang Li
Yang Liu
278
12
0
16 Jun 2024
Ollabench: Evaluating LLMs' Reasoning for Human-centric Interdependent Cybersecurity
Tam N. Nguyen
ELM
209
4
0
11 Jun 2024
Language Models are Alignable Decision-Makers: Dataset and Application to the Medical Triage Domain
Brian Hu
Bill Ray
Alice Leung
Amy Summerville
David Joy
Christopher Funk
Arslan Basharat
299
14
0
10 Jun 2024
Scaling and evaluating sparse autoencoders
Leo Gao
Tom Dupré la Tour
Henk Tillman
Gabriel Goh
Rajan Troll
Alec Radford
Ilya Sutskever
Jan Leike
Jeffrey Wu
279
307
0
06 Jun 2024
MoralBench: Moral Evaluation of LLMs
Jianchao Ji
Yutong Chen
Haoyang Ling
Wujiang Xu
Qingfeng Lan
Yongfeng Zhang
ELM
351
30
0
06 Jun 2024
Exploring Human-AI Perception Alignment in Sensory Experiences: Do LLMs Understand Textile Hand?
Shu Zhong
Elia Gatti
Youngjun Cho
Marianna Obrist
188
5
0
05 Jun 2024
Are Large Language Models Chameleons?
Mingmeng Geng
Sihong He
Roberto Trotta
202
0
0
29 May 2024
FinerCut: Finer-grained Interpretable Layer Pruning for Large Language Models
Yang Zhang
Yawei Li
Xinpeng Wang
Qianli Shen
Barbara Plank
Bernd Bischl
Mina Rezaei
Kenji Kawaguchi
252
21
0
28 May 2024
BWArea Model: Learning World Model, Inverse Dynamics, and Policy for Controllable Language Generation
Chengxing Jia
Pengyuan Wang
Ziniu Li
Yi-Chen Li
Zhilong Zhang
Nan Tang
Yang Yu
OffRL
254
2
0
27 May 2024
On Bits and Bandits: Quantifying the Regret-Information Trade-off
Itai Shufaro
Nadav Merlis
Nir Weinberger
Shie Mannor
530
1
0
26 May 2024
SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models
Xudong Lu
Aojun Zhou
Yuhui Xu
Renrui Zhang
Shiyang Feng
Jiaming Song
231
13
0
25 May 2024
Instruction Tuning With Loss Over Instructions
Neural Information Processing Systems (NeurIPS), 2024
Zhengyan Shi
Adam X. Yang
Bin Wu
Laurence Aitchison
Emine Yilmaz
Aldo Lipani
ALM
288
36
0
23 May 2024
ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation
Neural Information Processing Systems (NeurIPS), 2024
Jingnan Zheng
Han Wang
An Zhang
Tai D. Nguyen
Jun Sun
Tat-Seng Chua
LLMAG
359
44
0
23 May 2024
CIVICS: Building a Dataset for Examining Culturally-Informed Values in Large Language Models
Giada Pistilli
Alina Leidinger
Yacine Jernite
Atoosa Kasirzadeh
A. Luccioni
Margaret Mitchell
316
8
0
22 May 2024
Metabook: An Automatically Generated Augmented Reality Storybook Interaction System to Improve Children's Engagement in Storytelling
Yibo Wang
Yuanyuan Mao
Shi-ting Ni
180
0
0
22 May 2024
FlashRAG: A Modular Toolkit for Efficient Retrieval-Augmented Generation Research
Jiajie Jin
Yutao Zhu
Xinyu Yang
Chenghao Zhang
Tong Zhao
Zhao Yang
Zhicheng Dou
Ji-Rong Wen
VLM
449
147
0
22 May 2024
Skin-in-the-Game: Decision Making via Multi-Stakeholder Alignment in LLMs
Bilgehan Sel
Priya Shanmugasundaram
Mohammad Kachuee
Kun Zhou
Ruoxi Jia
Ming Jin
LRM
283
11
0
21 May 2024
LLM-based Multi-Agent Reinforcement Learning: Current and Future Directions
Chuanneng Sun
Songjun Huang
D. Pompili
LLMAG
345
63
0
17 May 2024
Facilitating Opinion Diversity through Hybrid NLP Approaches
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Michiel van der Meer
318
3
0
15 May 2024