Measuring and Controlling Instruction (In)Stability in Language Model Dialogs

13 February 2024

Papers citing "Measuring and Controlling Instruction (In)Stability in Language Model Dialogs"

Persistent Instability in LLM's Personality Measurements: Effects of Scale, Reasoning, and Conversation History

Tommaso Tosato

Saskia Helbling

Yorguin-Jose Mantilla-Ramos

118

24 Dec 2025

Drift No More? Context Equilibria in Multi-Turn LLM Interactions

121

09 Oct 2025

When Instructions Multiply: Measuring and Estimating LLM Capabilities of Multiple Instructions Following

Keno Harada

Yudai Yamazaki

Masachika Taniguchi

Edison Marrese-Taylor

147

25 Sep 2025

Psychometric Personality Shaping Modulates Capabilities and Safety in Language Models

Jose Hernandez-Orallo

136

19 Sep 2025

IROTE: Human-like Traits Elicitation of Large Language Model via In-Context Self-Reflective Optimization

173

12 Aug 2025

Beyond the Surface: Enhancing LLM-as-a-Judge Alignment with Human via Internal Representations

235

05 Aug 2025

Goal Alignment in LLM-Based User Simulators for Conversational AI

148

27 Jul 2025

When Harry Meets Superman: The Role of The Interlocutor in Persona-Based Dialogue Generation

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

Daniela Occhipinti

Marco Guerini

Malvina Nissim

257

30 May 2025

Position is Power: System Prompts as a Mechanism of Bias in Large Language Models (LLMs)

Conference on Fairness, Accountability and Transparency (FAccT), 2025

346

27 May 2025

Beyond Prompt Engineering: Robust Behavior Control in LLMs via Steering Target Atoms

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

445

23 May 2025

Exploiting Fine-Grained Skip Behaviors for Micro-Video Recommendation

AAAI Conference on Artificial Intelligence (AAAI), 2025

Sanghyuck Lee

Sangkeun Park

Jaesung Lee

258

04 Apr 2025

Focus Directions Make Your Language Models Pay More Attention to Relevant Contexts

326

30 Mar 2025

Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents

496

27 Feb 2025

An Auditing Test To Detect Behavioral Shift in Language Models

International Conference on Learning Representations (ICLR), 2024

443

25 Oct 2024

CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation

Tong Chen

Akari Asai

Niloofar Mireshghallah

Sewon Min

Hannaneh Hajishirzi

285

09 Jul 2024