Exploring the Personality Traits of LLMs through Latent Features Steering

Abstract

Large language models (LLMs) have significantly advanced dialogue systems and role-playing agents through their ability to generate human-like text. While prior studies have shown that LLMs can exhibit distinct and consistent personalities, the mechanisms through which these models encode and express specific personality traits remain poorly understood. To address this, we investigate how factors encoded within LLMs, such as cultural norms and environmental stressors, shape their personality traits, guided by the theoretical framework of social determinism. Inspired by related work on LLM interpretability, we propose a training-free approach that modifies the model's behavior by extracting and steering the latent features corresponding to these factors. Furthermore, we analyze the implications of these factors for model safety through the lens of personality.

@article{yang2025_2410.10863,
  title={Exploring the Personality Traits of LLMs through Latent Features Steering},
  author={Shu Yang and Shenzhe Zhu and Liang Liu and Lijie Hu and Mengdi Li and Di Wang},
  journal={arXiv preprint arXiv:2410.10863},
  year={2025}
}