What Did I Do Wrong? Quantifying LLMs' Sensitivity and Consistency to Prompt Engineering

arXiv:2406.12334 · 18 June 2024
Federico Errica, G. Siracusano, D. Sanvito, Roberto Bifulco

Papers citing "What Did I Do Wrong? Quantifying LLMs' Sensitivity and Consistency to Prompt Engineering"

16 papers shown:

Developing A Framework to Support Human Evaluation of Bias in Generated Free Response Text
Jennifer Healey, Laurie Byrum, Md Nadeem Akhtar, Surabhi Bhargava, Moumita Sinha
05 May 2025 · 25 / 0 / 0

Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts
Hanhua Hong, Chenghao Xiao, Yang Wang, Y. Liu, Wenge Rong, Chenghua Lin
29 Apr 2025 · 21 / 0 / 0

LLMs as Data Annotators: How Close Are We to Human Performance
Muhammad Uzair Ul Haq, Davide Rigoni, A. Sperduti
21 Apr 2025 · 17 / 0 / 0

Prompt-Reverse Inconsistency: LLM Self-Inconsistency Beyond Generative Randomness and Prompt Paraphrasing
Jihyun Janice Ahn, Wenpeng Yin
Tags: SILM, LRM
02 Apr 2025 · 53 / 1 / 0

Firm or Fickle? Evaluating Large Language Models Consistency in Sequential Interactions
Yubo Li, Yidi Miao, Xueying Ding, Ramayya Krishnan, R. Padman
28 Mar 2025 · 32 / 0 / 0

GraphEval: A Lightweight Graph-Based LLM Framework for Idea Evaluation
Tao Feng, Yihang Sun, Jiaxuan You
16 Mar 2025 · 43 / 0 / 0

Adaptive Prompting: Ad-hoc Prompt Composition for Social Bias Detection
Maximilian Spliethöver, Tim Knebler, Fabian Fumagalli, Maximilian Muschalik, Barbara Hammer, Eyke Hüllermeier, Henning Wachsmuth
10 Feb 2025 · 94 / 1 / 0

Linguistic Features Extracted by GPT-4 Improve Alzheimer's Disease Detection based on Spontaneous Speech
Jonathan Heitz, Gerold Schneider, Nicolas Langer
Tags: LM&MA
20 Dec 2024 · 76 / 0 / 0

LLMs: A Game-Changer for Software Engineers?
Md Asraful Haque
Tags: LLMAG, SyDa
01 Nov 2024 · 21 / 0 / 0

Evaluating Gender Bias of LLMs in Making Morality Judgements
Divij Bajaj, Yuanyuan Lei, Jonathan Tong, Ruihong Huang
13 Oct 2024 · 26 / 1 / 0

Estimating Contribution Quality in Online Deliberations Using a Large Language Model
Lodewijk Gelauff, Mohak Goyal, Bhargav Dindukurthi, Ashish Goel, Alice Siu
21 Aug 2024 · 27 / 0 / 0

Reference-Guided Verdict: LLMs-as-Judges in Automatic Evaluation of Free-Form Text
Sher Badshah, Hassan Sajjad
Tags: ELM
17 Aug 2024 · 26 / 8 / 0

To Believe or Not to Believe Your LLM
Yasin Abbasi-Yadkori, Ilja Kuzborskij, András György, Csaba Szepesvári
Tags: UQCV
04 Jun 2024 · 53 / 14 / 0

AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents
Luca Gioacchini, G. Siracusano, D. Sanvito, Kiril Gashteovski, David Friede, Roberto Bifulco, Carolin (Haas) Lawrence
Tags: ELM, LLMAG
09 Apr 2024 · 36 / 10 / 0

Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
Jiawei Liu, Chun Xia, Yuyao Wang, Lingming Zhang
Tags: ELM, ALM
02 May 2023 · 161 / 388 / 0

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou
Tags: LM&Ro, LRM, AI4CE, ReLM
28 Jan 2022 · 313 / 8,261 / 0