Aligning AI With Shared Human Values

5 August 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andrew Critch
Haibin Zhang
Basel Alomair
Jacob Steinhardt

Papers citing "Aligning AI With Shared Human Values"

50 / 463 papers shown
MALLM: Multi-Agent Large Language Models Framework
Jonas Becker
Lars Benedikt Kaesberg
Niklas Bauer
Jan Philip Wahle
Terry Ruas
Bela Gipp
LLMAG
232
2
0
15 Sep 2025
MillStone: How Open-Minded Are LLMs?
Harold Triedman
Vitaly Shmatikov
199
0
0
15 Sep 2025
MORABLES: A Benchmark for Assessing Abstract Moral Reasoning in LLMs with Fables
Matteo Marcuzzo
A. Zangari
A. Albarelli
Jose Camacho-Collados
Mohammad Taher Pilehvar
197
3
0
15 Sep 2025
CogniAlign: Survivability-Grounded Multi-Agent Moral Reasoning for Safe and Transparent AI
Hasin Jawad Ali
Ilhamul Azam
Ajwad Abrar
Md. Kamrul Hasan
H. Mahmud
84
0
0
14 Sep 2025
Murphys Laws of AI Alignment: Why the Gap Always Wins
Madhava Gaikwad
ALM
213
1
0
04 Sep 2025
SoK: Large Language Model Copyright Auditing via Fingerprinting
Shuo Shao
Yiming Li
Yexiao He
Hongwei Yao
Wenyuan Yang
D. Tao
Zhan Qin
319
4
0
27 Aug 2025
Beyond Benchmark: LLMs Evaluation with an Anthropomorphic and Value-oriented Roadmap
Jun Wang
Ninglun Gu
Kailai Zhang
Zijiao Zhang
Yelun Bao
...
Liwei Liu
Yihuan Liu
Pengyong Li
Gary G. Yen
Junchi Yan
ALM, ELM
216
0
0
26 Aug 2025
UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning
Zihao Huang
Yu Bao
Qiyang Min
S. Chen
Ran Guo
...
Defa Zhu
Yutao Zeng
Banggu Wu
Xun Zhou
Siyuan Qiao
MoE
156
3
0
26 Aug 2025
Debate or Vote: Which Yields Better Decisions in Multi-Agent Large Language Models?
Hyeong Kyu Choi
Xiaojin Zhu
Yixuan Li
LRM
332
9
0
24 Aug 2025
Decoding Alignment: A Critical Survey of LLM Development Initiatives through Value-setting and Data-centric Lens
Ilias Chalkidis
OffRL, ALM
140
1
0
23 Aug 2025
Political Ideology Shifts in Large Language Models
Pietro Bernardelle
Stefano Civelli
Leon Fröhling
Riccardo Lunardi
Kevin Roitero
Gianluca Demartini
104
1
0
22 Aug 2025
Beyond Ethical Alignment: Evaluating LLMs as Artificial Moral Assistants
Alessio Galatolo
Luca Alberto Rappuoli
Katie Winkle
Meriem Beloucif
ELM
130
1
0
18 Aug 2025
Where to Start Alignment? Diffusion Large Language Model May Demand a Distinct Position
Zhixin Xie
Xurui Song
Jun Luo
152
5
0
17 Aug 2025
The Cultural Gene of Large Language Models: A Study on the Impact of Cross-Corpus Training on Model Values and Biases
Emanuel Z. Fenech-Borg
Tilen P. Meznaric-Kos
Milica D. Lekovic-Bojovic
Arni J. Hentze-Djurhuus
232
0
0
17 Aug 2025
Every 28 Days the AI Dreams of Soft Skin and Burning Stars: Scaffolding AI Agents with Hormones and Emotions
Leigh Levinson
Christopher J. Agostino
52
0
0
15 Aug 2025
Speciesism in AI: Evaluating Discrimination Against Animals in Large Language Models
Monika Jotautaitė
Lucius Caviola
David A. Brewster
Thilo Hagendorff
148
0
0
15 Aug 2025
Pruning Long Chain-of-Thought of Large Reasoning Models via Small-Scale Preference Optimization
Bin Hong
Jiayu Liu
Zhenya Huang
Kai Zhang
Mengdi Zhang
LRM
187
0
0
13 Aug 2025
Shadow in the Cache: Unveiling and Mitigating Privacy Risks of KV-cache in LLM Inference
Zhifan Luo
Shuo Shao
Su Zhang
Lijing Zhou
Yuke Hu
Chenxu Zhao
Zhihao Liu
Zhan Qin
212
4
0
13 Aug 2025
VGGSounder: Audio-Visual Evaluations for Foundation Models
Daniil Zverev
Thaddäus Wiedemer
Christian Schroeder de Witt
Matthias Bethge
Wieland Brendel
A. Sophia Koepke
AuLLM
219
4
0
11 Aug 2025
Sotopia-RL: Reward Design for Social Intelligence
Haofei Yu
Zhengyang Qi
Yining Zhao
Kolby Nottingham
Keyang Xuan
Bodhisattwa Prasad Majumder
Hao Zhu
Paul Pu Liang
Jiaxuan You
OffRL
204
4
0
05 Aug 2025
EAC-MoE: Expert-Selection Aware Compressor for Mixture-of-Experts Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yuanteng Chen
Yuantian Shao
Peisong Wang
Jian Cheng
MoE
153
2
0
03 Aug 2025
Medical Reasoning in the Era of LLMs: A Systematic Review of Enhancement Techniques and Applications
Wenxuan Wang
Zizhan Ma
Meidan Ding
S. Zheng
Shengyuan Liu
...
Jiaming Ji
Wenting Chen
Xiang Li
LinLin Shen
Yixuan Yuan
LRM
186
4
0
01 Aug 2025
Model Misalignment and Language Change: Traces of AI-Associated Language in Unscripted Spoken English
Bryce Anderson
Riley Galpin
Tom S. Juzek
208
2
0
01 Aug 2025
Where to show Demos in Your Prompt: A Positional Bias of In-Context Learning
Kwesi Cobbina
Tianyi Zhou
115
2
0
30 Jul 2025
IQ Test for LLMs: An Evaluation Framework for Uncovering Core Skills in LLMs
Aviya Maimon
Amir D. N. Cohen
Gal Vishne
Shauli Ravfogel
Reut Tsarfaty
124
0
0
27 Jul 2025
Diversity-Enhanced Reasoning for Subjective Questions
Yumeng Wang
Zhiyuan Fan
Jiayu Liu
J. Huang
Yi R. Fung
LRM
458
5
0
27 Jul 2025
Adaptive Learning Systems: Personalized Curriculum Design Using LLM-Powered Analytics
Yongjie Li
Ruilin Nong
Jianan Liu
Lucas Evans
AI4Ed
138
2
0
25 Jul 2025
The Geometry of Harmfulness in LLMs through Subconcept Probing
McNair Shah
Saleena Angeline
Adhitya Rajendra Kumar
Naitik Chheda
Kevin Zhu
Sean O Brien
Sean O'Brien
Will Cai
LLMSV
195
3
0
23 Jul 2025
Who's the Mole? Modeling and Detecting Intention-Hiding Malicious Agents in LLM-Based Multi-Agent Systems
Yizhe Xie
Congcong Zhu
X. Zhang
Tianqing Zhu
Dayong Ye
Minghao Wang
Chi Liu
LLMAG
160
2
0
07 Jul 2025
LoSiA: Efficient High-Rank Fine-Tuning via Subnet Localization and Optimization
Xujia Wang
Yunjia Qi
Bin Xu
221
0
0
06 Jul 2025
Model Editing as a Double-Edged Sword: Steering Agent Ethical Behavior Toward Beneficence or Harm
Baixiang Huang
Zhen Tan
Haoran Wang
Zijie Liu
Dawei Li
Ali Payani
Huan Liu
Tianlong Chen
Kai Shu
KELM, LLMSV
249
0
0
25 Jun 2025
Self-Critique-Guided Curiosity Refinement: Enhancing Honesty and Helpfulness in Large Language Models via In-Context Learning
Duc Hieu Ho
Chenglin Fan
HILM, LRM
163
1
0
19 Jun 2025
Probabilistic Aggregation and Targeted Embedding Optimization for Collective Moral Reasoning in Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Chenchen Yuan
Zheyu Zhang
Shuo Yang
Bardh Prenkaj
Gjergji Kasneci
250
1
0
17 Jun 2025
MoORE: SVD-based Model MoE-ization for Conflict- and Oblivion-Resistant Multi-Task Adaptation
Shen Yuan
Yin Zheng
Taifeng Wang
Binbin Liu
Hongteng Xu
MoMe
348
1
0
17 Jun 2025
Massive Supervised Fine-tuning Experiments Reveal How Data, Layer, and Training Factors Shape LLM Alignment Quality
Yuto Harada
Yusuke Yamauchi
Yusuke Oda
Yohei Oseki
Yusuke Miyao
Yu Takagi
ALM
226
5
0
17 Jun 2025
Discerning What Matters: A Multi-Dimensional Assessment of Moral Competence in LLMs
Daniel Kilov
Caroline Hendy
Secil Yanik Guyot
Aaron J. Snoswell
Seth Lazar
ELM
279
2
0
16 Jun 2025
Multi-level Value Alignment in Agentic AI Systems: Survey and Perspectives
Wei Zeng
Hengshu Zhu
Chuan Qin
Han Wu
Yihang Cheng
...
Xiaowei Jin
Yinuo Shen
Zhenxing Wang
Feimin Zhong
Hui Xiong
AI4TS
425
0
0
11 Jun 2025
MoralCLIP: Contrastive Alignment of Vision-and-Language Representations with Moral Foundations Theory
Ana Carolina Condez
Diogo Tavares
João Magalhães
VLM
209
0
0
06 Jun 2025
SPARTA ALIGNMENT: Collectively Aligning Multiple Language Models through Combat
Yuru Jiang
Wenxuan Ding
Shangbin Feng
Greg Durrett
Yulia Tsvetkov
349
2
0
05 Jun 2025
Revisiting Test-Time Scaling: A Survey and a Diversity-Aware Method for Efficient Reasoning
Ho-Lam Chung
Teng-Yun Hsiao
Hsiao-Ying Huang
Chunerh Cho
Jian-Ren Lin
Zhang Ziwei
Yun-Nung Chen
LRM
338
4
0
05 Jun 2025
RedDebate: Safer Responses through Multi-Agent Red Teaming Debates
Ali Asad
Stephen Obadinma
Radin Shayanfar
Xiaodan Zhu
AAML, LLMAG
239
3
0
04 Jun 2025
GEM: Empowering LLM for both Embedding Generation and Language Understanding
Caojin Zhang
Qiang Zhang
Ke Li
Sai Vidyaranya Nuthalapati
Benyu Zhang
Jason Liu
Serena Li
Lizhu Zhang
Xiangjun Fan
152
2
0
04 Jun 2025
VM14K: First Vietnamese Medical Benchmark
T. Nguyen
Duc Duy Nguyen
Minh Dang
Thai Dao
L. T. Nguyen
Quan H. Nguyen
D. Q. Nguyen
Kien Tran
M. Tran
ELM
194
0
0
02 Jun 2025
Deontological Keyword Bias: The Impact of Modal Expressions on Normative Judgments of Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Bumjin Park
Jinsil Lee
Jaesik Choi
195
0
0
01 Jun 2025
Large Language Models Often Know When They Are Being Evaluated
Joe Needham
Giles Edkins
Govind Pimpale
Henning Bartsch
Marius Hobbhahn
LLMAG, ELM, ALM
332
22
0
28 May 2025
Advancing Expert Specialization for Better MoE
Hongcan Guo
Haolang Lu
Guoshun Nan
Bolun Chu
Jialin Zhuang
Yuan Yang
Wenhao Che
Sicong Leng
Qimei Cui
Xudong Jiang
MoE, MoMe
341
7
0
28 May 2025
Are Language Models Consequentialist or Deontological Moral Reasoners?
Keenan Samway
Max Kleiman-Weiner
David Guzman Piedrahita
Amélie Reymond
Bernhard Schölkopf
Zhijing Jin
ELM, LRM
180
3
0
27 May 2025
STEER-BENCH: A Benchmark for Evaluating the Steerability of Large Language Models
Kai Chen
Zihao He
Taiwei Shi
Kristina Lerman
ALM, LLMSV
299
3
0
27 May 2025
Automatic Transmission for LLM Tiers: Optimizing Cost and Accuracy in Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Injae Na
Keonwoong Noh
Woohwan Jung
241
0
0
27 May 2025
Efficient Data Selection at Scale via Influence Distillation
Mahdi Nikdan
Vincent Cohen-Addad
Dan Alistarh
Vahab Mirrokni
TDI
301
4
0
25 May 2025