Aligning AI With Shared Human Values

5 August 2020

Jacob Steinhardt

Papers citing "Aligning AI With Shared Human Values"

50 / 463 papers shown

MALLM: Multi-Agent Large Language Models Framework

Lars Benedikt Kaesberg

Jan Philip Wahle

236

2

0

15 Sep 2025

MillStone: How Open-Minded Are LLMs?

Harold Triedman

Vitaly Shmatikov

220

0

0

15 Sep 2025

MORABLES: A Benchmark for Assessing Abstract Moral Reasoning in LLMs with Fables

Matteo Marcuzzo

Jose Camacho-Collados

Mohammad Taher Pilehvar

212

3

0

15 Sep 2025

CogniAlign: Survivability-Grounded Multi-Agent Moral Reasoning for Safe and Transparent AI

Hasin Jawad Ali

Md. Kamrul Hasan

88

0

0

14 Sep 2025

Murphys Laws of AI Alignment: Why the Gap Always Wins

Madhava Gaikwad

268

1

0

04 Sep 2025

SoK: Large Language Model Copyright Auditing via Fingerprinting

355

4

0

27 Aug 2025

Beyond Benchmark: LLMs Evaluation with an Anthropomorphic and Value-oriented Roadmap

...

222

0

0

26 Aug 2025

UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning

...

181

4

0

26 Aug 2025

Debate or Vote: Which Yields Better Decisions in Multi-Agent Large Language Models?

Hyeong Kyu Choi

345

10

0

24 Aug 2025

Decoding Alignment: A Critical Survey of LLM Development Initiatives through Value-setting and Data-centric Lens

Ilias Chalkidis

152

1

0

23 Aug 2025

Political Ideology Shifts in Large Language Models

Pietro Bernardelle

Stefano Civelli

Riccardo Lunardi

Gianluca Demartini

112

1

0

22 Aug 2025

Beyond Ethical Alignment: Evaluating LLMs as Artificial Moral Assistants

Alessio Galatolo

Luca Alberto Rappuoli

Meriem Beloucif

139

1

0

18 Aug 2025

Where to Start Alignment? Diffusion Large Language Model May Demand a Distinct Position

186

5

0

17 Aug 2025

The Cultural Gene of Large Language Models: A Study on the Impact of Cross-Corpus Training on Model Values and Biases

Emanuel Z. Fenech-Borg

Tilen P. Meznaric-Kos

Milica D. Lekovic-Bojovic

Arni J. Hentze-Djurhuus

252

0

0

17 Aug 2025

Every 28 Days the AI Dreams of Soft Skin and Burning Stars: Scaffolding AI Agents with Hormones and Emotions

Christopher J. Agostino

56

0

0

15 Aug 2025

Speciesism in AI: Evaluating Discrimination Against Animals in Large Language Models

Monika Jotautaitė

David A. Brewster

Thilo Hagendorff

148

0

0

15 Aug 2025

Pruning Long Chain-of-Thought of Large Reasoning Models via Small-Scale Preference Optimization

208

0

0

13 Aug 2025

Shadow in the Cache: Unveiling and Mitigating Privacy Risks of KV-cache in LLM Inference

226

4

0

13 Aug 2025

VGGSounder: Audio-Visual Evaluations for Foundation Models

Thaddäus Wiedemer

Christian Schroeder de Witt

Matthias Bethge

Wieland Brendel

A. Sophia Koepke

235

4

0

11 Aug 2025

Sotopia-RL: Reward Design for Social Intelligence

Kolby Nottingham

Bodhisattwa Prasad Majumder

217

5

0

05 Aug 2025

EAC-MoE: Expert-Selection Aware Compressor for Mixture-of-Experts Large Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

161

2

0

03 Aug 2025

Medical Reasoning in the Era of LLMs: A Systematic Review of Enhancement Techniques and Applications

...

195

5

0

01 Aug 2025

Model Misalignment and Language Change: Traces of AI-Associated Language in Unscripted Spoken English

232

2

0

01 Aug 2025

Where to show Demos in Your Prompt: A Positional Bias of In-Context Learning

123

2

0

30 Jul 2025

IQ Test for LLMs: An Evaluation Framework for Uncovering Core Skills in LLMs

Amir D. N. Cohen

Shauli Ravfogel

137

0

0

27 Jul 2025

Diversity-Enhanced Reasoning for Subjective Questions

487

6

0

27 Jul 2025

Adaptive Learning Systems: Personalized Curriculum Design Using LLM-Powered Analytics

149

3

0

25 Jul 2025

The Geometry of Harmfulness in LLMs through Subconcept Probing

Saleena Angeline

Adhitya Rajendra Kumar

221

3

0

23 Jul 2025

Who's the Mole? Modeling and Detecting Intention-Hiding Malicious Agents in LLM-Based Multi-Agent Systems

181

2

0

07 Jul 2025

LoSiA: Efficient High-Rank Fine-Tuning via Subnet Localization and Optimization

246

0

0

06 Jul 2025

Model Editing as a Double-Edged Sword: Steering Agent Ethical Behavior Toward Beneficence or Harm

280

0

0

25 Jun 2025

Self-Critique-Guided Curiosity Refinement: Enhancing Honesty and Helpfulness in Large Language Models via In-Context Learning

176

1

0

19 Jun 2025

Probabilistic Aggregation and Targeted Embedding Optimization for Collective Moral Reasoning in Large Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

Gjergji Kasneci

264

1

0

17 Jun 2025

MoORE: SVD-based Model MoE-ization for Conflict- and Oblivion-Resistant Multi-Task Adaptation

384

1

0

17 Jun 2025

Massive Supervised Fine-tuning Experiments Reveal How Data, Layer, and Training Factors Shape LLM Alignment Quality

Yusuke Yamauchi

258

5

0

17 Jun 2025

Discerning What Matters: A Multi-Dimensional Assessment of Moral Competence in LLMs

Secil Yanik Guyot

Aaron J. Snoswell

296

3

0

16 Jun 2025

Multi-level Value Alignment in Agentic AI Systems: Survey and Perspectives

...

434

0

0

11 Jun 2025

MoralCLIP: Contrastive Alignment of Vision-and-Language Representations with Moral Foundations Theory

Ana Carolina Condez

João Magalhães

225

0

0

06 Jun 2025

SPARTA ALIGNMENT: Collectively Aligning Multiple Language Models through Combat

369

2

0

05 Jun 2025

Revisiting Test-Time Scaling: A Survey and a Diversity-Aware Method for Efficient Reasoning

Hsiao-Ying Huang

348

4

0

05 Jun 2025

RedDebate: Safer Responses through Multi-Agent Red Teaming Debates

Stephen Obadinma

Radin Shayanfar

261

3

0

04 Jun 2025

GEM: Empowering LLM for both Embedding Generation and Language Understanding

Sai Vidyaranya Nuthalapati

171

3

0

04 Jun 2025

VM14K: First Vietnamese Medical Benchmark

211

0

0

02 Jun 2025

Deontological Keyword Bias: The Impact of Modal Expressions on Normative Judgments of Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

208

0

0

01 Jun 2025

Large Language Models Often Know When They Are Being Evaluated

Henning Bartsch

Marius Hobbhahn

351

23

0

28 May 2025

Advancing Expert Specialization for Better MoE

358

7

0

28 May 2025

Are Language Models Consequentialist or Deontological Moral Reasoners?

Max Kleiman-Weiner

David Guzman Piedrahita

Amélie Reymond

Bernhard Schölkopf

198

4

0

27 May 2025

STEER-BENCH: A Benchmark for Evaluating the Steerability of Large Language Models

Kristina Lerman

328

3

0

27 May 2025

Automatic Transmission for LLM Tiers: Optimizing Cost and Accuracy in Large Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

242

0

0

27 May 2025

Efficient Data Selection at Scale via Influence Distillation

Vincent Cohen-Addad

329

4

0

25 May 2025
