Stress-Testing Model Specs Reveals Character Differences among Language Models

Main: 15 pages, 14 figures, 3 tables; Bibliography: 3 pages; Appendix: 9 pages
Abstract
Large language models (LLMs) are increasingly trained to follow AI constitutions and model specifications that establish behavioral guidelines and ethical principles. However, these specifications face critical challenges, including internal conflicts between principles and insufficient coverage of nuanced scenarios. We present a systematic methodology for stress-testing model character specifications that automatically identifies numerous cases of principle contradictions and interpretive ambiguities in current model specs.
