Papers
2008.02275
Cited By
v6 (latest)
Aligning AI With Shared Human Values
5 August 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andrew Critch
Haibin Zhang
Basel Alomair
Jacob Steinhardt
Papers citing
"Aligning AI With Shared Human Values"
50 / 463 papers shown
Conversations: Love Them, Hate Them, Steer Them
Niranjan Chebrolu
Gerard Christopher Yeo
Kokil Jaidka
144
0
0
23 May 2025
Unveiling the Basin-Like Loss Landscape in Large Language Models
Huanran Chen
Yinpeng Dong
Zeming Wei
Yao Huang
Yichi Zhang
Hang Su
Jun Zhu
MoMe
433
5
0
23 May 2025
The Staircase of Ethics: Probing LLM Value Priorities through Multi-Step Induction to Complex Moral Dilemmas
Ya Wu
Qiang Sheng
Danding Wang
Guang Yang
Yifan Sun
Zhengjia Wang
Yuyan Bu
Juan Cao
202
4
0
23 May 2025
MixAT: Combining Continuous and Discrete Adversarial Training for LLMs
Csaba Dékány
Stefan Balauca
Robin Staab
Dimitar I. Dimitrov
Martin Vechev
AAML
303
1
0
22 May 2025
Guiding Giants: Lightweight Controllers for Weighted Activation Steering in LLMs
Amr Hegazy
Mostafa Elhoushi
Amr Alanwar
LLMSV
301
2
0
22 May 2025
ECHO-LLaMA: Efficient Caching for High-Performance LLaMA Training
Maryam Dialameh
Rezaul Karim
Hossein Rajabzadeh
Omar Mohamed Awad
Hyock Ju Kwon
Boxing Chen
Walid Ahmed
Yang Liu
292
2
0
22 May 2025
Cost-aware LLM-based Online Dataset Annotation
Eray Can Elumar
Cem Tekin
Osman Yagan
257
1
0
21 May 2025
Kaleidoscope Gallery: Exploring Ethics and Generative AI Through Art
Creativity & Cognition (C&C), 2025
Alayt Issak
Uttkarsh Narayan
Ramya Srinivasan
Erica Kleinman
Casper Harteveld
219
0
0
20 May 2025
Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations
Li Ji-An
Hua-Dong Xiong
Robert C. Wilson
Marcelo G. Mattar
M. Benna
352
13
0
19 May 2025
Exploring Criteria of Loss Reweighting to Enhance LLM Unlearning
Puning Yang
Qizhou Wang
Zhuo Huang
Tongliang Liu
Chengqi Zhang
Bo Han
MU
392
11
0
17 May 2025
NurValues: Real-World Nursing Values Evaluation for Large Language Models in Clinical Context
Ben Yao
Qiuchi Li
Yazhou Zhang
Siyu Yang
Bohan Zhang
Prayag Tiwari
Jing Qin
349
0
0
13 May 2025
Full-Parameter Continual Pretraining of Gemma2: Insights into Fluency and Domain Knowledge
Vytenis Šliogeris
Povilas Daniušis
Arturas Nakvosas
CLL
218
0
0
09 May 2025
Advancing and Benchmarking Personalized Tool Invocation for LLMs
Xiaolin Huang
Yuefeng Huang
Wen Liu
Xingshan Zeng
Yijiao Wang
Ruiming Tang
Hong Xie
Defu Lian
232
2
0
07 May 2025
FineScope: Precision Pruning for Domain-Specialized Large Language Models Using SAE-Guided Self-Data Cultivation
Chaitali Bhattacharyya
Hyunsei Lee
Junyoung Lee
Shinhyoung Jang
Il hong Suh
Yeseong Kim
305
1
0
01 May 2025
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
Xuzhao Li
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Qi Zhang
Tat-Seng Chua
Tianwei Zhang
ALM
ELM
548
23
0
26 Apr 2025
Auditing the Ethical Logic of Generative AI Models
W. Russell Neuman
Chad Coleman
Ali Dasdan
Safinah Ali
Manan Shah
ELM
LRM
277
4
0
24 Apr 2025
The Digital Cybersecurity Expert: How Far Have We Come?
IEEE Symposium on Security and Privacy (S&P), 2025
Dawei Wang
Geng Zhou
Xianglong Li
Yu Bai
Li Chen
Ting Qin
Jian Sun
Didong Li
ELM
293
1
0
16 Apr 2025
CLASH: Evaluating Language Models on Judging High-Stakes Dilemmas from Multiple Perspectives
Ayoung Lee
Ryan Sungmo Kwon
Peter Railton
Lu Wang
ELM
497
3
0
15 Apr 2025
HELIOS: Adaptive Model And Early-Exit Selection for Efficient LLM Inference Serving
Avinash Kumar
Shashank Nag
Jason Clemons
L. John
Poulami Das
460
1
0
14 Apr 2025
Visual moral inference and communication
Warren Zhu
Aida Ramezani
Yang Xu
150
1
0
12 Apr 2025
RAISE: Reinforced Adaptive Instruction Selection For Large Language Models
Lv Qingsong
Yangning Li
Zihua Lan
Zishan Xu
Jiwei Tang
...
Wenhao Jiang
Wanshi Xu
Philip S. Yu
Hai-Tao Zheng
562
2
0
09 Apr 2025
Separator Injection Attack: Uncovering Dialogue Biases in Large Language Models Caused by Role Separators
Xitao Li
Jian Shu
Jiang Wu
Ting Liu
AAML
231
1
0
08 Apr 2025
SpecPipe: Accelerating Pipeline Parallelism-based LLM Inference with Speculative Decoding
Haofei Yin
Mengbai Xiao
Rouzhou Lu
Xiao Zhang
Dongxiao Yu
Guanghui Zhang
AI4CE
342
1
0
05 Apr 2025
Entropy-Based Block Pruning for Efficient Large Language Models
Liangwei Yang
Yuhui Xu
Juntao Tan
Doyen Sahoo
Siyang Song
Caiming Xiong
Han Wang
Shelby Heinecke
AAML
212
0
0
04 Apr 2025
Register Always Matters: Analysis of LLM Pretraining Data Through the Lens of Language Variation
A. Myntti
Erik Henriksson
Veronika Laippala
S. Pyysalo
292
1
0
02 Apr 2025
From TOWER to SPIRE: Adding the Speech Modality to a Translation-Specialist LLM
Kshitij Ambilduke
Ben Peters
Sonal Sannigrahi
Anil Keshwani
Tsz Kin Lam
Bruno Martins
Marcely Zanon Boito
423
3
0
13 Mar 2025
Backtracking for Safety
Bilgehan Sel
Dingcheng Li
Phillip Wallis
Vaishakh Keshava
Ming Jin
Siddhartha Reddy Jonnalagadda
KELM
241
2
0
11 Mar 2025
Stay Focused: Problem Drift in Multi-Agent Debate
Jonas Becker
Lars Benedikt Kaesberg
Andreas Stephan
Jan Philip Wahle
Terry Ruas
Bela Gipp
469
8
0
26 Feb 2025
Speaking the Right Language: The Impact of Expertise Alignment in User-AI Interactions
Shramay Palta
Nirupama Chandrasekaran
Rachel Rudinger
Scott Counts
252
1
0
25 Feb 2025
Are Sparse Autoencoders Useful? A Case Study in Sparse Probing
Subhash Kantamneni
Joshua Engels
Senthooran Rajamanoharan
Max Tegmark
Neel Nanda
348
44
0
23 Feb 2025
Are Rules Meant to be Broken? Understanding Multilingual Moral Reasoning as a Computational Pipeline with UniMoral
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Shivani Kumar
David Jurgens
LRM
299
5
0
21 Feb 2025
Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation
Abdelrahman Abdallah
Bhawna Piryani
Jamshid Mozafari
Mohammed Ali
Adam Jatowt
938
5
0
21 Feb 2025
Self-Taught Agentic Long Context Understanding
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yufan Zhuang
Xiaodong Yu
Jialian Wu
Xingwu Sun
Zihan Wang
Jiang Liu
Yusheng Su
Jingbo Shang
Zicheng Liu
Emad Barsoum
LRM
338
2
0
21 Feb 2025
Obliviate: Efficient Unmemorization for Protecting Intellectual Property in Large Language Models
M. Russinovich
Ahmed Salem
CLL
MU
372
4
0
20 Feb 2025
Mind the Gap! Choice Independence in Using Multilingual LLMs for Persuasive Co-Writing Tasks in Different Languages
International Conference on Human Factors in Computing Systems (CHI), 2025
Shreyan Biswas
Alexander Erlei
U. Gadiraju
402
6
0
13 Feb 2025
The Odyssey of the Fittest: Can Agents Survive and Still Be Good?
Dylan Waldner
Risto Miikkulainen
400
2
0
08 Feb 2025
Adaptive Distraction: Probing LLM Contextual Robustness with Automated Tree Search
Yue Huang
Zixiang Xu
Chujie Gao
Siyuan Wu
Jiayi Ye
Preslav Nakov
Pin-Yu Chen
Wei Wei
AAML
263
5
0
03 Feb 2025
Normative Evaluation of Large Language Models with Everyday Moral Dilemmas
Conference on Fairness, Accountability and Transparency (FAccT), 2025
Pratik S. Sachdeva
Tom van Nuenen
ELM
191
12
0
30 Jan 2025
Inferring from Logits: Exploring Best Practices for Decoding-Free Generative Candidate Selection
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Mingyu Derek Ma
Yanna Ding
Zijie Huang
Jianxi Gao
Tony Nowatzki
Wei Wang
210
1
0
28 Jan 2025
BLoB: Bayesian Low-Rank Adaptation by Backpropagation for Large Language Models
Neural Information Processing Systems (NeurIPS), 2024
Yibin Wang
Haizhou Shi
Ligong Han
Dimitris N. Metaxas
Hao Wang
BDL
UQLM
714
22
0
28 Jan 2025
The Goofus & Gallant Story Corpus for Practical Value Alignment
International Conference on Machine Learning and Applications (ICMLA), 2024
Md Sultan al Nahian
Tasmia Tasrin
Spencer Frazier
Mark O. Riedl
Brent Harrison
230
0
0
17 Jan 2025
HyGen: Efficient LLM Serving via Elastic Online-Offline Request Co-location
Ting Sun
Penghan Wang
Fan Lai
1.3K
7
0
15 Jan 2025
AIOpsLab: A Holistic Framework to Evaluate AI Agents for Enabling Autonomous Clouds
Yinfang Chen
Manish Shetty
Gagan Somashekar
Minghua Ma
Yogesh L. Simmhan
Jonathan Mace
Chetan Bansal
Rujia Wang
Saravan Rajmohan
312
18
0
12 Jan 2025
M³oralBench: A MultiModal Moral Benchmark for LVLMs
Bei Yan
Jie M. Zhang
Zhiyuan Chen
Shiguang Shan
Xilin Chen
ELM
279
6
0
31 Dec 2024
LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs
LLM-jp
Akiko Aizawa
Eiji Aramaki
Bowen Chen
Fei Cheng
...
Yuya Yamamoto
Yusuke Yamauchi
Hitomi Yanaka
Rio Yokota
Koichiro Yoshino
266
24
0
31 Dec 2024
SilVar: Speech Driven Multimodal Model for Reasoning Visual Question Answering and Object Localization
Tan-Hanh Pham
Hoang-Nam Le
Phu-Vinh Nguyen
Chris Ngo
Truong-Son Hy
AuLLM
LRM
265
1
0
21 Dec 2024
ClarityEthic: Explainable Moral Judgment Utilizing Contrastive Ethical Insights from Large Language Models
Yuxi Sun
Wei Gao
Jing Ma
Hongzhan Lin
Ziyang Luo
Wenxuan Zhang
ELM
403
0
0
17 Dec 2024
Text Is Not All You Need: Multimodal Prompting Helps LLMs Understand Humor
Ashwin Baluja
168
8
0
01 Dec 2024
TAROT: Targeted Data Selection via Optimal Transport
Lan Feng
Fan Nie
Yuejiang Liu
Alexandre Alahi
OT
554
2
0
30 Nov 2024
Towards Robust Evaluation of Unlearning in LLMs via Data Transformations
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Abhinav Joshi
Shaswati Saha
Divyaksh Shukla
Sriram Vema
Harsh Jhamtani
Manas Gaur
Ashutosh Modi
MU
266
7
0
23 Nov 2024