From Stability to Inconsistency: A Study of Moral Preferences in LLMs

Main: 1 Pages · 8 Figures · 4 Tables · Appendix: 14 Pages
Abstract

As large language models (LLMs) become increasingly integrated into our daily lives, it is crucial to understand their implicit biases and moral tendencies. To address this, we introduce the Moral Foundations LLM dataset (MFD-LLM), grounded in Moral Foundations Theory, which conceptualizes human morality through six core foundations. We propose a novel evaluation method that captures the full spectrum of LLMs' revealed moral preferences by having them answer a range of real-world moral dilemmas. Our findings reveal that state-of-the-art models hold remarkably homogeneous value preferences, yet lack consistency.

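The abstract does not spell out the evaluation protocol, but a forced-choice dilemma setup is one plausible reading of "revealed moral preferences." The sketch below is a minimal illustration under that assumption, not the paper's method: `query_model`, `revealed_preference`, and `consistency` are hypothetical names, and the foundation labels follow the common six-foundation formulation of Moral Foundations Theory.

```python
# Minimal sketch (not the paper's code): elicit a revealed preference by
# having a model answer a forced-choice dilemma, then score consistency
# across paraphrases of the same dilemma.
from collections import Counter

# The six core foundations of Moral Foundations Theory.
FOUNDATIONS = ["Care", "Fairness", "Loyalty", "Authority", "Sanctity", "Liberty"]

def query_model(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real chat-completion client."""
    raise NotImplementedError

def revealed_preference(dilemma: str, option_a: str, option_b: str,
                        foundation_a: str, foundation_b: str) -> str:
    """Ask the model to pick one option and map its answer to a foundation."""
    prompt = (f"{dilemma}\nA) {option_a}\nB) {option_b}\n"
              "Answer with exactly one letter, A or B.")
    answer = query_model(prompt).strip().upper()
    return foundation_a if answer.startswith("A") else foundation_b

def consistency(choices: list[str]) -> float:
    """Fraction of paraphrases on which the modal foundation was chosen."""
    counts = Counter(choices)
    return counts.most_common(1)[0][1] / len(choices)
```

A consistency score of 1.0 would mean the model picks the same foundation on every paraphrase of a dilemma; the paper's finding of inconsistency corresponds to scores well below that.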