Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2310.11324
Cited By
Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting
17 October 2023
Melanie Sclar
Yejin Choi
Yulia Tsvetkov
Alane Suhr
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting"
49 / 49 papers shown
Title
ConSens: Assessing context grounding in open-book question answering
Ivan Vankov
Matyo Ivanov
Adriana Correia
Victor Botev
ELM
61
0
0
30 Apr 2025
Uncertainty Distillation: Teaching Language Models to Express Semantic Confidence
Sophia Hager
David Mueller
Kevin Duh
Nicholas Andrews
65
0
0
18 Mar 2025
Ensemble Debiasing Across Class and Sample Levels for Fairer Prompting Accuracy
Ruixi Lin
Ziqiao Wang
Yang You
FaML
76
0
0
07 Mar 2025
END: Early Noise Dropping for Efficient and Effective Context Denoising
Hongye Jin
Pei Chen
Jingfeng Yang
Z. Wang
Meng-Long Jiang
...
X. Zhang
Zheng Li
Tianyi Liu
Huasheng Li
Bing Yin
60
0
0
26 Feb 2025
SMAB: MAB based word Sensitivity Estimation Framework and its Applications in Adversarial Text Generation
Saurabh Kumar Pandey
S. Vashistha
Debrup Das
Somak Aditya
Monojit Choudhury
AAML
69
0
0
10 Feb 2025
Benchmarking Prompt Sensitivity in Large Language Models
Amirhossein Razavi
Mina Soltangheis
Negar Arabzadeh
Sara Salamat
Morteza Zihayat
Ebrahim Bagheri
59
1
0
09 Feb 2025
The Order Effect: Investigating Prompt Sensitivity to Input Order in LLMs
Bryan Guan
Tanya Roosta
Peyman Passban
Mehdi Rezagholizadeh
94
0
0
06 Feb 2025
Beyond Prompt Content: Enhancing LLM Performance via Content-Format Integrated Prompt Optimization
Yuanye Liu
Jiahang Xu
Li Lyna Zhang
Qi Chen
Xuan Feng
Yang Chen
Zhongxin Guo
Yuqing Yang
Cheng Peng
77
2
0
06 Feb 2025
Are You Doubtful? Oh, It Might Be Difficult Then! Exploring the Use of Model Uncertainty for Question Difficulty Estimation
Leonidas Zotos
H. Rijn
Malvina Nissim
65
0
0
16 Dec 2024
CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model
Dongyoung Go
Taesun Whang
Chanhee Lee
Hwayeon Kim
Sunghoon Park
Seunghwan Ji
Dongchan Kim
Young-Bum Kim
Young-Bum Kim
LRM
89
1
0
19 Nov 2024
Vulnerability of LLMs to Vertically Aligned Text Manipulations
Zhecheng Li
Y. Wang
Bryan Hooi
Yujun Cai
Zhen Xiong
Nanyun Peng
Kai-Wei Chang
49
1
0
26 Oct 2024
Take Caution in Using LLMs as Human Surrogates: Scylla Ex Machina
Yuan Gao
Dokyun Lee
Gordon Burtch
Sina Fazelpour
LRM
40
7
0
25 Oct 2024
Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs)
Leander Girrbach
Yiran Huang
Stephan Alaniz
Trevor Darrell
Zeynep Akata
VLM
40
2
0
25 Oct 2024
BIG5-CHAT: Shaping LLM Personalities Through Training on Human-Grounded Data
Wenkai Li
Jiarui Liu
Andy Liu
Xuhui Zhou
Mona Diab
Maarten Sap
44
5
0
21 Oct 2024
Do LLMs "know" internally when they follow instructions?
Juyeon Heo
Christina Heinze-Deml
Oussama Elachqar
Shirley Ren
Udhay Nallasamy
Andy Miller
Kwan Ho Ryan Chan
Jaya Narain
44
3
0
18 Oct 2024
Revise, Reason, and Recognize: LLM-Based Emotion Recognition via Emotion-Specific Prompts and ASR Error Correction
Yuanchao Li
Yuan Gong
Chao-Han Huck Yang
P. Bell
Catherine Lai
45
1
0
23 Sep 2024
Time Awareness in Large Language Models: Benchmarking Fact Recall Across Time
David Herel
Vojtech Bartek
Jiri Jirak
Tomáš Mikolov
42
2
0
20 Sep 2024
Bilingual Evaluation of Language Models on General Knowledge in University Entrance Exams with Minimal Contamination
Eva Sánchez Salido
Roser Morante
Julio Gonzalo
Guillermo Marco
Jorge Carrillo-de-Albornoz
...
Enrique Amigó
Andrés Fernández
Alejandro Benito-Santos
Adrián Ghajari Espinosa
Victor Fresno
ELM
39
0
0
19 Sep 2024
AI Conversational Interviewing: Transforming Surveys with LLMs as Adaptive Interviewers
Alexander Wuttke
Matthias Aßenmacher
Christopher Klamm
Max M. Lang
Quirin Würschinger
Frauke Kreuter
34
2
0
16 Sep 2024
GroUSE: A Benchmark to Evaluate Evaluators in Grounded Question Answering
Sacha Muller
António Loison
Bilel Omrani
Gautier Viaud
RALM
ELM
26
1
0
10 Sep 2024
Can Unconfident LLM Annotations Be Used for Confident Conclusions?
Kristina Gligorić
Tijana Zrnic
Cinoo Lee
Emmanuel J. Candès
Dan Jurafsky
61
4
0
27 Aug 2024
SelectLLM: Query-Aware Efficient Selection Algorithm for Large Language Models
Kaushal Kumar Maurya
KV Aditya Srivatsa
Ekaterina Kochmar
36
2
0
16 Aug 2024
Why Are My Prompts Leaked? Unraveling Prompt Extraction Threats in Customized Large Language Models
Zi Liang
Haibo Hu
Qingqing Ye
Yaxin Xiao
Haoyang Li
AAML
ELM
SILM
43
4
0
05 Aug 2024
Can Model Uncertainty Function as a Proxy for Multiple-Choice Question Item Difficulty?
Leonidas Zotos
H. Rijn
Malvina Nissim
ELM
29
2
0
07 Jul 2024
On the Transformations across Reward Model, Parameter Update, and In-Context Prompt
Deng Cai
Huayang Li
Tingchen Fu
Siheng Li
Weiwen Xu
...
Leyang Cui
Yan Wang
Lemao Liu
Taro Watanabe
Shuming Shi
KELM
26
2
0
24 Jun 2024
Large Language Models Assume People are More Rational than We Really are
Ryan Liu
Jiayi Geng
Joshua C. Peterson
Ilia Sucholutsky
Thomas L. Griffiths
60
16
0
24 Jun 2024
SEAM: A Stochastic Benchmark for Multi-Document Tasks
Gili Lior
Avi Caciularu
Arie Cattan
Shahar Levy
Ori Shapira
Gabriel Stanovsky
RALM
33
4
0
23 Jun 2024
Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics
Seungbeen Lee
Seungwon Lim
Seungju Han
Giyeong Oh
Hyungjoo Chae
...
Beong-woo Kwak
Yeonsoo Lee
Dongha Lee
Jinyoung Yeo
Youngjae Yu
31
8
0
20 Jun 2024
What Did I Do Wrong? Quantifying LLMs' Sensitivity and Consistency to Prompt Engineering
Federico Errica
G. Siracusano
D. Sanvito
Roberto Bifulco
67
19
0
18 Jun 2024
Talking Heads: Understanding Inter-layer Communication in Transformer Language Models
Jack Merullo
Carsten Eickhoff
Ellie Pavlick
49
12
0
13 Jun 2024
The Impact of Quantization on Retrieval-Augmented Generation: An Analysis of Small LLMs
Mert Yazan
Suzan Verberne
F. Situmeang
MQ
26
3
0
10 Jun 2024
HelloFresh: LLM Evaluations on Streams of Real-World Human Editorial Actions across X Community Notes and Wikipedia edits
Tim Franzmeyer
Aleksandar Shtedritski
Samuel Albanie
Philip H. S. Torr
João F. Henriques
Jakob N. Foerster
17
1
0
05 Jun 2024
Efficient multi-prompt evaluation of LLMs
Felipe Maia Polo
Ronald Xu
Lucas Weber
Mírian Silva
Onkar Bhardwaj
Leshem Choshen
Allysson Flavio Melo de Oliveira
Yuekai Sun
Mikhail Yurochkin
34
17
0
27 May 2024
Building Better AI Agents: A Provocation on the Utilisation of Persona in LLM-based Conversational Agents
Guangzhi Sun
Xiao Zhan
Jose Such
22
22
0
26 May 2024
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents
Christopher Rawles
Sarah Clinckemaillie
Yifan Chang
Jonathan Waltz
Gabrielle Lau
...
Daniel Toyama
Robert Berry
Divya Tyamagundlu
Timothy Lillicrap
Oriana Riva
LLMAG
57
44
0
23 May 2024
Natural Language Processing RELIES on Linguistics
Juri Opitz
Shira Wein
Nathan Schneider
AI4CE
42
7
0
09 May 2024
Recall Them All: Retrieval-Augmented Language Models for Long Object List Extraction from Long Documents
Sneha Singhania
Simon Razniewski
G. Weikum
RALM
34
1
0
04 May 2024
In-Context Learning with Long-Context Models: An In-Depth Exploration
Amanda Bertsch
Maor Ivgi
Uri Alon
Jonathan Berant
Matthew R. Gormley
Matthew R. Gormley
Graham Neubig
ReLM
AIMat
81
64
0
30 Apr 2024
Talking Nonsense: Probing Large Language Models' Understanding of Adversarial Gibberish Inputs
Valeriia Cherepanova
James Zou
AAML
28
4
0
26 Apr 2024
Examining the robustness of LLM evaluation to the distributional assumptions of benchmarks
Melissa Ailem
Katerina Marazopoulou
Charlotte Siska
James Bono
51
13
0
25 Apr 2024
Designing Informative Metrics for Few-Shot Example Selection
Rishabh Adiga
Lakshminarayanan Subramanian
Varun Chandrasekaran
24
1
0
06 Mar 2024
Large Language Models Can Better Understand Knowledge Graphs Than We Thought
Xinbang Dai
Yuncheng Hua
Tongtong Wu
Yang Sheng
Qiu Ji
Guilin Qi
72
0
0
18 Feb 2024
The Butterfly Effect of Altering Prompts: How Small Changes and Jailbreaks Affect Large Language Model Performance
A. Salinas
Fred Morstatter
30
49
0
08 Jan 2024
Instruction Induction: From Few Examples to Natural Language Task Descriptions
Or Honovich
Uri Shaham
Samuel R. Bowman
Omer Levy
ELM
LRM
107
133
0
22 May 2022
Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity
Yao Lu
Max Bartolo
Alastair Moore
Sebastian Riedel
Pontus Stenetorp
AILaw
LRM
274
1,114
0
18 Apr 2021
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
278
3,784
0
18 Apr 2021
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
Timo Schick
Sahana Udupa
Hinrich Schütze
257
374
0
28 Feb 2021
Measuring and Improving Consistency in Pretrained Language Models
Yanai Elazar
Nora Kassner
Shauli Ravfogel
Abhilasha Ravichander
Eduard H. Hovy
Hinrich Schütze
Yoav Goldberg
HILM
258
343
0
01 Feb 2021
Making Pre-trained Language Models Better Few-shot Learners
Tianyu Gao
Adam Fisch
Danqi Chen
241
1,898
0
31 Dec 2020
1