arXiv: 2008.02275
Aligning AI With Shared Human Values

5 August 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andrew Critch
Haibin Zhang
Basel Alomair
Jacob Steinhardt

Papers citing "Aligning AI With Shared Human Values"

50 / 463 papers shown
Is Lying Only Sinful in Islam? Exploring Religious Bias in Multilingual Large Language Models Across Major Religions
Kazi Abrab Hossain
Jannatul Somiya Mahmud
Maria Hossain Tuli
Anik Mitra
S. M. Taiabul Haque
Farig Y. Sadeque
106
0
0
03 Dec 2025
Rethinking AI Evaluation in Education: The TEACH-AI Framework and Benchmark for Generative AI Assistants
Shi Ding
Brian Magerko
ELM
253
0
0
28 Nov 2025
Decomposed Trust: Exploring Privacy, Adversarial Robustness, Fairness, and Ethics of Low-Rank LLMs
Daniel Agyei Asante
Md Mokarram Chowdhury
Yang Li
89
0
0
27 Nov 2025
Enhancing Trustworthiness with Mixed Precision: Benchmarks, Opportunities, and Challenges
Guanxi Lu
Hao Mark Chen
Zhiqiang Que
Wayne Luk
Hongxiang Fan
MQ
127
0
0
27 Nov 2025
FanarGuard: A Culturally-Aware Moderation Filter for Arabic Language Models
M. Fatehkia
Enes Altinisik
Husrev Taha Sencar
108
0
0
24 Nov 2025
PoETa v2: Toward More Robust Evaluation of Large Language Models in Portuguese
IEEE Access, 2025
Thales Sales Almeida
Ramon Pires
Hugo Queiroz Abonizio
Rodrigo Nogueira
Hélio Pedrini
70
1
0
21 Nov 2025
Cross-cultural value alignment frameworks for responsible AI governance: Evidence from China-West comparative analysis
Haijiang Liu
Jinguang Gu
Xun Wu
Daniel Hershcovich
Qiaoling Xiao
137
0
0
21 Nov 2025
From Competition to Coordination: Market Making as a Scalable Framework for Safe and Aligned Multi-Agent LLM Systems
Brendan Gho
Suman Muppavarapu
Afnan Shaik
Tyson Tsay
James Begin
Kevin Zhu
Archana Vaidheeswaran
Vasu Sharma
LLMAG
173
0
0
18 Nov 2025
From Passive to Persuasive: Steering Emotional Nuance in Human-AI Negotiation
Niranjan Chebrolu
Gerard Christopher Yeo
Kokil Jaidka
LLMSV
210
0
0
16 Nov 2025
Moral Susceptibility and Robustness under Persona Role-Play in Large Language Models
Davi Bastos Costa
Felippe Alves
Renato Vicente
137
0
0
11 Nov 2025
Lethe: Layer- and Time-Adaptive KV Cache Pruning for Reasoning-Intensive LLM Serving
Hui Zeng
Daming Zhao
Pengfei Yang
WenXuan Hou
Tianyang Zheng
Hui Li
Weiye Ji
Jidong Zhai
221
1
0
08 Nov 2025
RLHF: A comprehensive Survey for Cultural, Multimodal and Low Latency Alignment Methods
Raghav Sharma
Manan Mehta
Sai Tiger Raina
311
0
0
06 Nov 2025
BengaliMoralBench: A Benchmark for Auditing Moral Reasoning in Large Language Models within Bengali Language and Culture
Shahriyar Zaman Ridoy
Azmine Toushik Wasi
Koushik Ahamed Tonmoy
LRM
173
0
0
05 Nov 2025
Deep Value Benchmark: Measuring Whether Models Generalize Deep Values or Shallow Preferences
Joshua Ashkinaze
Hua Shen
Sai Avula
Eric Gilbert
Ceren Budak
VLM
291
0
0
03 Nov 2025
Diverse Human Value Alignment for Large Language Models via Ethical Reasoning
Jiahao Wang
Songkai Xue
Jinghui Li
X. Wang
123
0
0
01 Nov 2025
Debiasing Reward Models by Representation Learning with Guarantees
Ignavier Ng
Patrick Blobaum
Siddharth Bhandari
Kun Zhang
Shiva Prasad Kasiviswanathan
139
1
0
27 Oct 2025
Robust Uncertainty Quantification for Self-Evolving Large Language Models via Continual Domain Pretraining
Xiaofan Zhou
Lu Cheng
CLL
384
0
0
27 Oct 2025
Risk Management for Mitigating Benchmark Failure Modes: BenchRisk
Sean McGregor
Victor Lu
Vassil Tashev
Armstrong Foundjem
Aishwarya Ramasethu
...
Chris Knotz
Kongtao Chen
Alicia Parrish
Anka Reuel
Heather Frase
149
0
0
24 Oct 2025
Counterfactual Reasoning for Steerable Pluralistic Value Alignment of Large Language Models
Hanze Guo
Jing Yao
Xiao Zhou
Xiaoyuan Yi
Xing Xie
153
0
0
21 Oct 2025
Mapping Post-Training Forgetting in Language Models at Scale
Jackson Harmon
Andreas Hochlehnert
Matthias Bethge
Ameya Prabhu
CLL, KELM
153
0
0
20 Oct 2025
MoReBench: Evaluating Procedural and Pluralistic Moral Reasoning in Language Models, More than Outcomes
Yu Ying Chiu
Michael S. Lee
Rachel Calcott
Brandon Handoko
Paul de Font-Reaulx
...
Mantas Mazeika
Bing Liu
Yejin Choi
Mitchell L. Gordon
Sydney Levine
ELM, LRM
129
0
0
18 Oct 2025
Planner and Executor: Collaboration between Discrete Diffusion And Autoregressive Models in Reasoning
Lina Berrayana
Ahmed Heakl
Muhammad Abdullah Sohail
Thomas Hofmann
Salman Khan
Wei Chen
180
1
0
17 Oct 2025
RLSR: Reinforcement Learning with Supervised Reward Outperforms SFT in Instruction Following
Zhichao Wang
Andy Wong
Ruslan Belkin
ALM, LRM
115
0
0
16 Oct 2025
Selective Adversarial Attacks on LLM Benchmarks
Ivan Dubrovsky
Anastasia Orlova
Illarion Iov
Nina Gubina
Irena Gureeva
Alexey Zaytsev
AAML
122
0
0
15 Oct 2025
Ethic-BERT: An Enhanced Deep Learning Model for Ethical and Non-Ethical Content Classification
Mahamodul Hasan Mahadi
Md. Nasif Safwan
Souhardo Rahman
Shahnaj Parvin
Aminun Nahar
Kamruddin Nur
VLM
97
0
0
14 Oct 2025
Investigating Political and Demographic Associations in Large Language Models Through Moral Foundations Theory
Nicole Smith-Vaniz
Harper Lyon
Lorraine Steigner
Ben Armstrong
Nicholas Mattei
123
0
0
14 Oct 2025
Deliberative Dynamics and Value Alignment in LLM Debates
Pratik S. Sachdeva
Tom van Nuenen
135
0
0
11 Oct 2025
VideoNorms: Benchmarking Cultural Awareness of Video Language Models
Nikhil Reddy Varimalla
Yunfei Xu
Arkadiy Saakyan
Meng Fan Wang
Smaranda Muresan
VGen, VLM
193
0
0
09 Oct 2025
Reasoning for Hierarchical Text Classification: The Case of Patents
Lekang Jiang
Wenjun Sun
Stephan Goetz
BDL
147
7
0
08 Oct 2025
ParsTranslit: Truly Versatile Tajik-Farsi Transliteration
Rayyan Merchant
Kevin Tang
90
0
0
08 Oct 2025
ARMOR: High-Performance Semi-Structured Pruning via Adaptive Matrix Factorization
Lawrence Liu
Alexander Liu
Mengdi Wang
T. Zhao
Lin F. Yang
121
0
0
07 Oct 2025
EVALUESTEER: Measuring Reward Model Steerability Towards Values and Preferences
Kshitish Ghate
Andy Liu
Devansh Jain
Taylor Sorensen
Atoosa Kasirzadeh
Aylin Caliskan
Mona Diab
Maarten Sap
LLMSV
312
0
0
07 Oct 2025
Learning Mixtures of Linear Dynamical Systems (MoLDS) via Hybrid Tensor-EM Method
Lulu Gong
Shreya Saxena
149
0
0
07 Oct 2025
MOSS-Speech: Towards True Speech-to-Speech Models Without Text Guidance
Xingjian Zhao
Zhe Xu
Qinyuan Cheng
Zhaoye Fei
Luozhijie Jin
...
Yitian Gong
Yuanfan Xu
Yaqian Zhou
Xuanjing Huang
Xipeng Qiu
AuLLM
268
2
0
01 Oct 2025
Visual Self-Refinement for Autoregressive Models
Jiamian Wang
Ziqi Zhou
Chaithanya Kumar Mummadi
S. Dianat
Majid Rabbani
Raghuveer Rao
Chen Qiu
Zhiqiang Tao
105
0
0
01 Oct 2025
Advancing Automated Ethical Profiling in SE: a Zero-Shot Evaluation of LLM Reasoning
P. Migliarini
Mashal Afzal Memon
Marco Autili
P. Inverardi
LRM
63
0
0
01 Oct 2025
ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs
Adi Simhi
Jonathan Herzig
Martin Tutek
Itay Itzhak
Idan Szpektor
Yonatan Belinkov
LLMAG
101
0
0
01 Oct 2025
RoleConflictBench: A Benchmark of Role Conflict Scenarios for Evaluating LLMs' Contextual Sensitivity
Jisu Shin
Hoyun Song
Juhyun Oh
Changgeon Ko
Eunsu Kim
Chani Jung
Alice Oh
169
0
0
30 Sep 2025
TAP: Two-Stage Adaptive Personalization of Multi-task and Multi-Modal Foundation Models in Federated Learning
Seohyun Lee
Wenzhi Fang
Dong-Jun Han
Seyyedali Hosseinalipour
Christopher G. Brinton
116
0
0
30 Sep 2025
RADAR: Reasoning-Ability and Difficulty-Aware Routing for Reasoning LLMs
Nigel Fernandez
Branislav Kveton
Ryan Rossi
Andrew Lan
Zichao Wang
LRM
216
0
0
29 Sep 2025
Generative Value Conflicts Reveal LLM Priorities
Andy Liu
Kshitish Ghate
Mona Diab
Daniel Fried
Atoosa Kasirzadeh
Max Kleiman-Weiner
148
2
0
29 Sep 2025
SemShareKV: Efficient KVCache Sharing for Semantically Similar Prompts via Token-Level LSH Matching
Xinye Zhao
Spyridon Mastorakis
138
2
0
29 Sep 2025
Dynamic Orthogonal Continual Fine-tuning for Mitigating Catastrophic Forgettings
Zhixin Zhang
Zeming Wei
Meng Sun
CLL
148
0
0
28 Sep 2025
One Model, Many Morals: Uncovering Cross-Linguistic Misalignments in Computational Moral Reasoning
Sualeha Farid
Jayden Lin
Zean Chen
Shivani Kumar
David Jurgens
LRM
140
1
0
25 Sep 2025
Painless Activation Steering: An Automated, Lightweight Approach for Post-Training Large Language Models
Sasha Cui
Zhongren Chen
LLMSV
238
1
0
25 Sep 2025
Psychometric Personality Shaping Modulates Capabilities and Safety in Language Models
Stephen Fitz
P. Romero
Steven Basart
Sipeng Chen
Jose Hernandez-Orallo
136
1
0
19 Sep 2025
Emergent Alignment via Competition
Natalie Collina
Surbhi Goel
Aaron Roth
Emily Ryu
Mirah Shi
102
2
0
18 Sep 2025
The Inadequacy of Offline LLM Evaluations: A Need to Account for Personalization in Model Behavior
Angelina Wang
Mark A. Lemley
Sanmi Koyejo
OffRL
190
2
0
18 Sep 2025
Do LLMs Align Human Values Regarding Social Biases? Judging and Explaining Social Biases with LLMs
Yang Liu
Chenhui Chu
153
0
0
17 Sep 2025
MillStone: How Open-Minded Are LLMs?
Harold Triedman
Vitaly Shmatikov
220
0
0
15 Sep 2025