ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

arXiv:2310.11324 · Cited By
Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting

17 October 2023
Melanie Sclar
Yejin Choi
Yulia Tsvetkov
Alane Suhr

Papers citing "Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting"

49 / 49 papers shown
ConSens: Assessing context grounding in open-book question answering
Ivan Vankov
Matyo Ivanov
Adriana Correia
Victor Botev
ELM
61
0
0
30 Apr 2025
Uncertainty Distillation: Teaching Language Models to Express Semantic Confidence
Sophia Hager
David Mueller
Kevin Duh
Nicholas Andrews
65
0
0
18 Mar 2025
Ensemble Debiasing Across Class and Sample Levels for Fairer Prompting Accuracy
Ruixi Lin
Ziqiao Wang
Yang You
FaML
76
0
0
07 Mar 2025
END: Early Noise Dropping for Efficient and Effective Context Denoising
Hongye Jin
Pei Chen
Jingfeng Yang
Z. Wang
Meng-Long Jiang
...
X. Zhang
Zheng Li
Tianyi Liu
Huasheng Li
Bing Yin
60
0
0
26 Feb 2025
SMAB: MAB based word Sensitivity Estimation Framework and its Applications in Adversarial Text Generation
Saurabh Kumar Pandey
S. Vashistha
Debrup Das
Somak Aditya
Monojit Choudhury
AAML
69
0
0
10 Feb 2025
Benchmarking Prompt Sensitivity in Large Language Models
Amirhossein Razavi
Mina Soltangheis
Negar Arabzadeh
Sara Salamat
Morteza Zihayat
Ebrahim Bagheri
59
1
0
09 Feb 2025
The Order Effect: Investigating Prompt Sensitivity to Input Order in LLMs
Bryan Guan
Tanya Roosta
Peyman Passban
Mehdi Rezagholizadeh
94
0
0
06 Feb 2025
Beyond Prompt Content: Enhancing LLM Performance via Content-Format Integrated Prompt Optimization
Yuanye Liu
Jiahang Xu
Li Lyna Zhang
Qi Chen
Xuan Feng
Yang Chen
Zhongxin Guo
Yuqing Yang
Cheng Peng
77
2
0
06 Feb 2025
Are You Doubtful? Oh, It Might Be Difficult Then! Exploring the Use of Model Uncertainty for Question Difficulty Estimation
Leonidas Zotos
H. Rijn
Malvina Nissim
65
0
0
16 Dec 2024
CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model
Dongyoung Go
Taesun Whang
Chanhee Lee
Hwayeon Kim
Sunghoon Park
Seunghwan Ji
Dongchan Kim
Young-Bum Kim
LRM
89
1
0
19 Nov 2024
Vulnerability of LLMs to Vertically Aligned Text Manipulations
Zhecheng Li
Y. Wang
Bryan Hooi
Yujun Cai
Zhen Xiong
Nanyun Peng
Kai-Wei Chang
49
1
0
26 Oct 2024
Take Caution in Using LLMs as Human Surrogates: Scylla Ex Machina
Yuan Gao
Dokyun Lee
Gordon Burtch
Sina Fazelpour
LRM
40
7
0
25 Oct 2024
Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs)
Leander Girrbach
Yiran Huang
Stephan Alaniz
Trevor Darrell
Zeynep Akata
VLM
40
2
0
25 Oct 2024
BIG5-CHAT: Shaping LLM Personalities Through Training on Human-Grounded Data
Wenkai Li
Jiarui Liu
Andy Liu
Xuhui Zhou
Mona Diab
Maarten Sap
44
5
0
21 Oct 2024
Do LLMs "know" internally when they follow instructions?
Juyeon Heo
Christina Heinze-Deml
Oussama Elachqar
Shirley Ren
Udhay Nallasamy
Andy Miller
Kwan Ho Ryan Chan
Jaya Narain
44
3
0
18 Oct 2024
Revise, Reason, and Recognize: LLM-Based Emotion Recognition via Emotion-Specific Prompts and ASR Error Correction
Yuanchao Li
Yuan Gong
Chao-Han Huck Yang
P. Bell
Catherine Lai
45
1
0
23 Sep 2024
Time Awareness in Large Language Models: Benchmarking Fact Recall Across Time
David Herel
Vojtech Bartek
Jiri Jirak
Tomáš Mikolov
42
2
0
20 Sep 2024
Bilingual Evaluation of Language Models on General Knowledge in University Entrance Exams with Minimal Contamination
Eva Sánchez Salido
Roser Morante
Julio Gonzalo
Guillermo Marco
Jorge Carrillo-de-Albornoz
...
Enrique Amigó
Andrés Fernández
Alejandro Benito-Santos
Adrián Ghajari Espinosa
Victor Fresno
ELM
39
0
0
19 Sep 2024
AI Conversational Interviewing: Transforming Surveys with LLMs as Adaptive Interviewers
Alexander Wuttke
Matthias Aßenmacher
Christopher Klamm
Max M. Lang
Quirin Würschinger
Frauke Kreuter
34
2
0
16 Sep 2024
GroUSE: A Benchmark to Evaluate Evaluators in Grounded Question Answering
Sacha Muller
António Loison
Bilel Omrani
Gautier Viaud
RALM
ELM
26
1
0
10 Sep 2024
Can Unconfident LLM Annotations Be Used for Confident Conclusions?
Kristina Gligorić
Tijana Zrnic
Cinoo Lee
Emmanuel J. Candès
Dan Jurafsky
61
4
0
27 Aug 2024
SelectLLM: Query-Aware Efficient Selection Algorithm for Large Language Models
Kaushal Kumar Maurya
KV Aditya Srivatsa
Ekaterina Kochmar
36
2
0
16 Aug 2024
Why Are My Prompts Leaked? Unraveling Prompt Extraction Threats in Customized Large Language Models
Zi Liang
Haibo Hu
Qingqing Ye
Yaxin Xiao
Haoyang Li
AAML
ELM
SILM
43
4
0
05 Aug 2024
Can Model Uncertainty Function as a Proxy for Multiple-Choice Question Item Difficulty?
Leonidas Zotos
H. Rijn
Malvina Nissim
ELM
29
2
0
07 Jul 2024
On the Transformations across Reward Model, Parameter Update, and In-Context Prompt
Deng Cai
Huayang Li
Tingchen Fu
Siheng Li
Weiwen Xu
...
Leyang Cui
Yan Wang
Lemao Liu
Taro Watanabe
Shuming Shi
KELM
26
2
0
24 Jun 2024
Large Language Models Assume People are More Rational than We Really are
Ryan Liu
Jiayi Geng
Joshua C. Peterson
Ilia Sucholutsky
Thomas L. Griffiths
60
16
0
24 Jun 2024
SEAM: A Stochastic Benchmark for Multi-Document Tasks
Gili Lior
Avi Caciularu
Arie Cattan
Shahar Levy
Ori Shapira
Gabriel Stanovsky
RALM
33
4
0
23 Jun 2024
Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics
Seungbeen Lee
Seungwon Lim
Seungju Han
Giyeong Oh
Hyungjoo Chae
...
Beong-woo Kwak
Yeonsoo Lee
Dongha Lee
Jinyoung Yeo
Youngjae Yu
31
8
0
20 Jun 2024
What Did I Do Wrong? Quantifying LLMs' Sensitivity and Consistency to Prompt Engineering
Federico Errica
G. Siracusano
D. Sanvito
Roberto Bifulco
67
19
0
18 Jun 2024
Talking Heads: Understanding Inter-layer Communication in Transformer Language Models
Jack Merullo
Carsten Eickhoff
Ellie Pavlick
49
12
0
13 Jun 2024
The Impact of Quantization on Retrieval-Augmented Generation: An Analysis of Small LLMs
Mert Yazan
Suzan Verberne
F. Situmeang
MQ
26
3
0
10 Jun 2024
HelloFresh: LLM Evaluations on Streams of Real-World Human Editorial Actions across X Community Notes and Wikipedia edits
Tim Franzmeyer
Aleksandar Shtedritski
Samuel Albanie
Philip H. S. Torr
João F. Henriques
Jakob N. Foerster
17
1
0
05 Jun 2024
Efficient multi-prompt evaluation of LLMs
Felipe Maia Polo
Ronald Xu
Lucas Weber
Mírian Silva
Onkar Bhardwaj
Leshem Choshen
Allysson Flavio Melo de Oliveira
Yuekai Sun
Mikhail Yurochkin
34
17
0
27 May 2024
Building Better AI Agents: A Provocation on the Utilisation of Persona in LLM-based Conversational Agents
Guangzhi Sun
Xiao Zhan
Jose Such
22
22
0
26 May 2024
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents
Christopher Rawles
Sarah Clinckemaillie
Yifan Chang
Jonathan Waltz
Gabrielle Lau
...
Daniel Toyama
Robert Berry
Divya Tyamagundlu
Timothy Lillicrap
Oriana Riva
LLMAG
57
44
0
23 May 2024
Natural Language Processing RELIES on Linguistics
Juri Opitz
Shira Wein
Nathan Schneider
AI4CE
42
7
0
09 May 2024
Recall Them All: Retrieval-Augmented Language Models for Long Object List Extraction from Long Documents
Sneha Singhania
Simon Razniewski
G. Weikum
RALM
34
1
0
04 May 2024
In-Context Learning with Long-Context Models: An In-Depth Exploration
Amanda Bertsch
Maor Ivgi
Uri Alon
Jonathan Berant
Matthew R. Gormley
Graham Neubig
ReLM
AIMat
81
64
0
30 Apr 2024
Talking Nonsense: Probing Large Language Models' Understanding of Adversarial Gibberish Inputs
Valeriia Cherepanova
James Zou
AAML
28
4
0
26 Apr 2024
Examining the robustness of LLM evaluation to the distributional assumptions of benchmarks
Melissa Ailem
Katerina Marazopoulou
Charlotte Siska
James Bono
51
13
0
25 Apr 2024
Designing Informative Metrics for Few-Shot Example Selection
Rishabh Adiga
Lakshminarayanan Subramanian
Varun Chandrasekaran
24
1
0
06 Mar 2024
Large Language Models Can Better Understand Knowledge Graphs Than We Thought
Xinbang Dai
Yuncheng Hua
Tongtong Wu
Yang Sheng
Qiu Ji
Guilin Qi
72
0
0
18 Feb 2024
The Butterfly Effect of Altering Prompts: How Small Changes and Jailbreaks Affect Large Language Model Performance
A. Salinas
Fred Morstatter
30
49
0
08 Jan 2024
Instruction Induction: From Few Examples to Natural Language Task Descriptions
Or Honovich
Uri Shaham
Samuel R. Bowman
Omer Levy
ELM
LRM
107
133
0
22 May 2022
Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity
Yao Lu
Max Bartolo
Alastair Moore
Sebastian Riedel
Pontus Stenetorp
AILaw
LRM
274
1,114
0
18 Apr 2021
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
278
3,784
0
18 Apr 2021
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
Timo Schick
Sahana Udupa
Hinrich Schütze
257
374
0
28 Feb 2021
Measuring and Improving Consistency in Pretrained Language Models
Yanai Elazar
Nora Kassner
Shauli Ravfogel
Abhilasha Ravichander
Eduard H. Hovy
Hinrich Schütze
Yoav Goldberg
HILM
258
343
0
01 Feb 2021
Making Pre-trained Language Models Better Few-shot Learners
Tianyu Gao
Adam Fisch
Danqi Chen
241
1,898
0
31 Dec 2020