ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2502.15851
31
2

Control Illusion: The Failure of Instruction Hierarchies in Large Language Models

21 February 2025
Yilin Geng
H. Li
Honglin Mu
Xudong Han
Timothy Baldwin
Omri Abend
Eduard H. Hovy
Lea Frermann
ArXivPDFHTML
Abstract

Large language models (LLMs) are increasingly deployed with hierarchical instruction schemes, where certain instructions (e.g., system-level directives) are expected to take precedence over others (e.g., user messages). Yet, we lack a systematic understanding of how effectively these hierarchical control mechanisms work. We introduce a systematic evaluation framework based on constraint prioritization to assess how well LLMs enforce instruction hierarchies. Our experiments across six state-of-the-art LLMs reveal that models struggle with consistent instruction prioritization, even for simple formatting conflicts. We find that the widely-adopted system/user prompt separation fails to establish a reliable instruction hierarchy, and models exhibit strong inherent biases toward certain constraint types regardless of their priority designation. While controlled prompt engineering and model fine-tuning show modest improvements, our results indicate that instruction hierarchy enforcement is not robustly realized, calling for deeper architectural innovations beyond surface-level modifications.

View on arXiv
@article{geng2025_2502.15851,
  title={ Control Illusion: The Failure of Instruction Hierarchies in Large Language Models },
  author={ Yilin Geng and Haonan Li and Honglin Mu and Xudong Han and Timothy Baldwin and Omri Abend and Eduard Hovy and Lea Frermann },
  journal={arXiv preprint arXiv:2502.15851},
  year={ 2025 }
}
Comments on this paper