arXiv: 2008.02275

Aligning AI With Shared Human Values

5 August 2020

Jacob Steinhardt

Papers citing "Aligning AI With Shared Human Values"

Showing 50 of 463 papers

Is Lying Only Sinful in Islam? Exploring Religious Bias in Multilingual Large Language Models Across Major Religions

Kazi Abrab Hossain

Jannatul Somiya Mahmud

Maria Hossain Tuli

S. M. Taiabul Haque

Farig Y. Sadeque

121

0

0

03 Dec 2025

Rethinking AI Evaluation in Education: The TEACH-AI Framework and Benchmark for Generative AI Assistants

270

0

0

28 Nov 2025

Decomposed Trust: Exploring Privacy, Adversarial Robustness, Fairness, and Ethics of Low-Rank LLMs

Daniel Agyei Asante

Md Mokarram Chowdhury

90

0

0

27 Nov 2025

Enhancing Trustworthiness with Mixed Precision: Benchmarks, Opportunities, and Challenges

148

0

0

27 Nov 2025

FanarGuard: A Culturally-Aware Moderation Filter for Arabic Language Models

Husrev Taha Sencar

114

0

0

24 Nov 2025

PoETa v2: Toward More Robust Evaluation of Large Language Models in Portuguese

IEEE Access, 2025

Thales Sales Almeida

Hugo Queiroz Abonizio

Rodrigo Nogueira

78

1

0

21 Nov 2025

Cross-cultural value alignment frameworks for responsible AI governance: Evidence from China-West comparative analysis

Daniel Hershcovich

151

0

0

21 Nov 2025

From Competition to Coordination: Market Making as a Scalable Framework for Safe and Aligned Multi-Agent LLM Systems

Suman Muppavarapu

Archana Vaidheeswaran

187

0

0

18 Nov 2025

From Passive to Persuasive: Steering Emotional Nuance in Human-AI Negotiation

Niranjan Chebrolu

Gerard Christopher Yeo

219

0

0

16 Nov 2025

Moral Susceptibility and Robustness under Persona Role-Play in Large Language Models

Davi Bastos Costa

147

0

0

11 Nov 2025

Lethe: Layer- and Time-Adaptive KV Cache Pruning for Reasoning-Intensive LLM Serving

228

1

0

08 Nov 2025

RLHF: A comprehensive Survey for Cultural, Multimodal and Low Latency Alignment Methods

Sai Tiger Raina

312

0

0

06 Nov 2025

BengaliMoralBench: A Benchmark for Auditing Moral Reasoning in Large Language Models within Bengali Language and Culture

Shahriyar Zaman Ridoy

Azmine Toushik Wasi

Koushik Ahamed Tonmoy

177

0

0

05 Nov 2025

Deep Value Benchmark: Measuring Whether Models Generalize Deep Values or Shallow Preferences

Joshua Ashkinaze

295

0

0

03 Nov 2025

Diverse Human Value Alignment for Large Language Models via Ethical Reasoning

127

0

0

01 Nov 2025

Debiasing Reward Models by Representation Learning with Guarantees

Patrick Blobaum

Siddharth Bhandari

Shiva Prasad Kasiviswanathan

140

1

0

27 Oct 2025

Robust Uncertainty Quantification for Self-Evolving Large Language Models via Continual Domain Pretraining

399

0

0

27 Oct 2025

Risk Management for Mitigating Benchmark Failure Modes: BenchRisk

Armstrong Foundjem

Aishwarya Ramasethu

...

149

0

0

24 Oct 2025

Counterfactual Reasoning for Steerable Pluralistic Value Alignment of Large Language Models

170

0

0

21 Oct 2025

Mapping Post-Training Forgetting in Language Models at Scale

Andreas Hochlehnert

Matthias Bethge

160

0

0

20 Oct 2025

MoReBench: Evaluating Procedural and Pluralistic Moral Reasoning in Language Models, More than Outcomes

Brandon Handoko

Paul de Font-Reaulx

...

Mitchell L. Gordon

141

0

0

18 Oct 2025

Planner and Executor: Collaboration between Discrete Diffusion And Autoregressive Models in Reasoning

Muhammad Abdullah Sohail

185

1

0

17 Oct 2025

RLSR: Reinforcement Learning with Supervised Reward Outperforms SFT in Instruction Following

116

0

0

16 Oct 2025

Selective Adversarial Attacks on LLM Benchmarks

Anastasia Orlova

122

0

0

15 Oct 2025

Ethic-BERT: An Enhanced Deep Learning Model for Ethical and Non-Ethical Content Classification

Mahamodul Hasan Mahadi

Md. Nasif Safwan

Souhardo Rahman

100

0

0

14 Oct 2025

Investigating Political and Demographic Associations in Large Language Models Through Moral Foundations Theory

Nicole Smith-Vaniz

Lorraine Steigner

Nicholas Mattei

124

1

0

14 Oct 2025

Deliberative Dynamics and Value Alignment in LLM Debates

Pratik S. Sachdeva

149

0

0

11 Oct 2025

VideoNorms: Benchmarking Cultural Awareness of Video Language Models

Nikhil Reddy Varimalla

Arkadiy Saakyan

Smaranda Muresan

197

0

0

09 Oct 2025

Reasoning for Hierarchical Text Classification: The Case of Patents

155

8

0

08 Oct 2025

ParsTranslit: Truly Versatile Tajik-Farsi Transliteration

Rayyan Merchant

93

0

0

08 Oct 2025

ARMOR: High-Performance Semi-Structured Pruning via Adaptive Matrix Factorization

134

0

0

07 Oct 2025

EVALUESTEER: Measuring Reward Model Steerability Towards Values and Preferences

Taylor Sorensen

Atoosa Kasirzadeh

314

0

0

07 Oct 2025

Learning Mixtures of Linear Dynamical Systems (MoLDS) via Hybrid Tensor-EM Method

149

0

0

07 Oct 2025

MOSS-Speech: Towards True Speech-to-Speech Models Without Text Guidance

...

273

2

0

01 Oct 2025

Visual Self-Refinement for Autoregressive Models

Chaithanya Kumar Mummadi

105

0

0

01 Oct 2025

Advancing Automated Ethical Profiling in SE: a Zero-Shot Evaluation of LLM Reasoning

Mashal Afzal Memon

72

0

0

01 Oct 2025

ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs

Jonathan Herzig

Yonatan Belinkov

108

0

0

01 Oct 2025

RoleConflictBench: A Benchmark of Role Conflict Scenarios for Evaluating LLMs' Contextual Sensitivity

178

0

0

30 Sep 2025

TAP: Two-Stage Adaptive Personalization of Multi-Task and Multi-Modal Foundation Models in Federated Learning

Seyyedali Hosseinalipour

Christopher G. Brinton

118

0

0

30 Sep 2025

RADAR: Reasoning-Ability and Difficulty-Aware Routing for Reasoning LLMs

Nigel Fernandez

Branislav Kveton

221

0

0

29 Sep 2025

Generative Value Conflicts Reveal LLM Priorities

Atoosa Kasirzadeh

Max Kleiman-Weiner

151

2

0

29 Sep 2025

SemShareKV: Efficient KVCache Sharing for Semantically Similar Prompts via Token-Level LSH Matching

Spyridon Mastorakis

152

3

0

29 Sep 2025

Dynamic Orthogonal Continual Fine-tuning for Mitigating Catastrophic Forgettings

152

0

0

28 Sep 2025

One Model, Many Morals: Uncovering Cross-Linguistic Misalignments in Computational Moral Reasoning

151

1

0

25 Sep 2025

Painless Activation Steering: An Automated, Lightweight Approach for Post-Training Large Language Models

248

1

0

25 Sep 2025

Psychometric Personality Shaping Modulates Capabilities and Safety in Language Models

Jose Hernandez-Orallo

138

1

0

19 Sep 2025

Emergent Alignment via Competition

Natalie Collina

109

2

0

18 Sep 2025

The Inadequacy of Offline LLM Evaluations: A Need to Account for Personalization in Model Behavior

199

5

0

18 Sep 2025

Do LLMs Align Human Values Regarding Social Biases? Judging and Explaining Social Biases with LLMs

167

0

0

17 Sep 2025

MillStone: How Open-Minded Are LLMs?

Harold Triedman

Vitaly Shmatikov

229

0

0

15 Sep 2025
