
The Order Effect: Investigating Prompt Sensitivity to Input Order in LLMs

Abstract

As large language models (LLMs) become integral to diverse applications, ensuring their reliability under varying input conditions is crucial. One key issue affecting this reliability is order sensitivity, wherein slight variations in the input arrangement can lead to inconsistent or biased outputs. Although recent advances have reduced this sensitivity, the problem remains unresolved. This paper investigates the extent of order sensitivity in LLMs whose internal components are hidden from users (such as closed-source models or those accessed via API calls). We conduct experiments across multiple tasks, including paraphrasing, relevance judgment, and multiple-choice questions. Our results show that input order significantly affects performance across tasks, with shuffled inputs leading to measurable declines in output accuracy. Few-shot prompting demonstrates mixed effectiveness, offering partial mitigation; however, it fails to fully resolve the problem. These findings highlight persistent risks, particularly in high-stakes applications, and point to the need for more robust LLMs or improved input-handling techniques in future development.
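
To make the measurement concrete, the sketch below shuffles the answer options of a multiple-choice question and reports how often a model's answer agrees with its answer under the original ordering. This is an illustrative assumption, not the paper's actual protocol: the helpers build_mcq_prompt and order_consistency are hypothetical, and first_option_model is a deliberately order-sensitive toy stand-in for a real LLM call.

import random
import string


def build_mcq_prompt(question: str, options: list[str]) -> str:
    # Present the question followed by lettered options in the given order.
    lines = [question]
    lines += [f"{string.ascii_uppercase[i]}. {opt}" for i, opt in enumerate(options)]
    lines.append("Answer with the text of the correct option only.")
    return "\n".join(lines)


def order_consistency(query_model, question: str, options: list[str],
                      n_shuffles: int = 5, seed: int = 0) -> float:
    # Fraction of shuffled presentations whose answer matches the original ordering.
    rng = random.Random(seed)
    baseline = query_model(build_mcq_prompt(question, options)).strip()
    matches = 0
    for _ in range(n_shuffles):
        shuffled = list(options)
        rng.shuffle(shuffled)
        answer = query_model(build_mcq_prompt(question, shuffled)).strip()
        matches += int(answer == baseline)
    return matches / n_shuffles


if __name__ == "__main__":
    # Toy stand-in for an LLM that always picks the first listed option,
    # i.e. a maximally order-sensitive "model". Replace with a real API call.
    def first_option_model(prompt: str) -> str:
        return prompt.splitlines()[1].split(". ", 1)[1]

    question = "Which planet is known as the Red Planet?"
    options = ["Mars", "Venus", "Jupiter", "Saturn"]
    score = order_consistency(first_option_model, question, options)
    print(f"Consistency across shuffles: {score:.2f}")

A consistency of 1.0 would indicate order-invariant answers; the toy model above scores far lower because its choice depends entirely on option position.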

@article{guan2025_2502.04134,
  title={The Order Effect: Investigating Prompt Sensitivity to Input Order in LLMs},
  author={Bryan Guan and Tanya Roosta and Peyman Passban and Mehdi Rezagholizadeh},
  journal={arXiv preprint arXiv:2502.04134},
  year={2025}
}