Flip-Flop Consistency: Unsupervised Training for Robustness to Prompt Perturbations in LLMs

Main: 8 pages · 7 figures · 3 tables · Bibliography: 4 pages · Appendix: 2 pages
Abstract

Large Language Models (LLMs) often produce inconsistent answers when faced with different phrasings of the same prompt. In this paper, we propose Flip-Flop Consistency (F²C), an unsupervised training method that improves robustness to such perturbations. F²C is composed of two key components. The first, Consensus Cross-Entropy (CCE), uses a majority vote across prompt variations to create a hard pseudo-label. The second is a representation alignment loss that pulls lower-confidence and non-majority predictors toward the consensus established by high-confidence, majority-voting variations. We evaluate our method on 11 datasets spanning four NLP tasks, with 4-15 prompt variations per dataset. On average, F²C raises observed agreement by 11.62%, improves mean F1 by 8.94%, and reduces performance variance across formats by 3.29%. In out-of-domain evaluations, F²C generalizes effectively, increasing mean F1 and agreement while decreasing variance across most source-target pairs. Finally, when trained on only a subset of prompt perturbations and evaluated on held-out formats, F²C consistently improves both performance and agreement while reducing variance. These findings highlight F²C as an effective unsupervised method for enhancing LLM consistency, performance, and generalization under prompt perturbations. Code is available at this https URL.
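The Consensus Cross-Entropy idea from the abstract can be sketched as follows. This is a minimal illustration, not the paper's released implementation: it assumes we already have per-class probabilities from K prompt variations of the same input, and the function name is hypothetical.

```python
import numpy as np

def consensus_cross_entropy(probs):
    """Sketch of CCE: majority vote over prompt variations -> hard pseudo-label.

    probs: array of shape (K, C), predicted class probabilities for K
    prompt variations of one input over C classes.
    Returns the pseudo-label and the mean cross-entropy of every
    variation against that consensus label.
    """
    preds = probs.argmax(axis=1)                        # each variation's predicted class
    votes = np.bincount(preds, minlength=probs.shape[1])
    pseudo_label = int(votes.argmax())                  # majority-vote pseudo-label
    eps = 1e-12                                         # numerical safety for log
    cce = -np.log(probs[:, pseudo_label] + eps).mean()  # CE of all variations vs. consensus
    return pseudo_label, cce
```

In a training loop, this loss would be backpropagated through each variation's forward pass; the paper's second component, the representation alignment loss, would additionally pull the hidden representations of low-confidence and non-majority variations toward those of high-confidence majority voters.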
