274
v1v2 (latest)

Political Bias in LLMs: Unaligned Moral Values in Agent-centric Simulations

Journal for Language Technology and Computational Linguistics (JLCL), 2024
Main:11 Pages
Bibliography:2 Pages
2 Tables
Appendix:1 Pages
Abstract

Contemporary research in social sciences increasingly utilizes state-of-the-art generative language models to annotate or generate content. While these models achieve benchmark-leading performance on common language tasks, their application to novel out-of-domain tasks remains insufficiently explored. To address this gap, we investigate how personalized language models align with human responses on the Moral Foundation Theory Questionnaire. We adapt open-source generative language models to different political personas and repeatedly survey these models to generate synthetic data sets where model-persona combinations define our sub-populations. Our analysis reveals that models produce inconsistent results across multiple repetitions, yielding high response variance. Furthermore, the alignment between synthetic data and corresponding human data from psychological studies shows a weak correlation, with conservative persona-prompted models particularly failing to align with actual conservative populations. These results suggest that language models struggle to coherently represent ideologies through in-context prompting due to their alignment process. Thus, using language models to simulate social interactions requires measurable improvements in in-context optimization or parameter manipulation to align with psychological and sociological stereotypes properly.

View on arXiv
Comments on this paper