Mind the instructions: a holistic evaluation of consistency and interactions in prompt-based learning

20 October 2023
Lucas Weber, Elia Bruni, Dieuwke Hupkes
arXiv:2310.13486

Papers citing "Mind the instructions: a holistic evaluation of consistency and interactions in prompt-based learning"

25 / 25 papers shown

MultiLoKo: a multilingual local knowledge benchmark for LLMs spanning 31 languages
Dieuwke Hupkes, Nikolay Bogoychev
14 Apr 2025

On the Consistency of Multilingual Context Utilization in Retrieval-Augmented Generation
Jirui Qi, Raquel Fernández, Arianna Bisazza
RALM · 01 Apr 2025

Same Question, Different Words: A Latent Adversarial Framework for Prompt Robustness
Tingchen Fu, Fazl Barez
AAML · 03 Mar 2025

Can LLMs Help Uncover Insights about LLMs? A Large-Scale, Evolving Literature Analysis of Frontier LLMs
Jungsoo Park, Junmo Kang, Gabriel Stanovsky, Alan Ritter
26 Feb 2025

SelfPrompt: Autonomously Evaluating LLM Robustness via Domain-Constrained Knowledge Guidelines and Refined Adversarial Prompts
Aihua Pei, Zehua Yang, Shunan Zhu, Ruoxi Cheng, Ju Jia
AAML · 01 Dec 2024

KGPA: Robustness Evaluation for Large Language Models via Cross-Domain Knowledge Graphs
Aihua Pei, Zehua Yang, Shunan Zhu, Ruoxi Cheng, Ju Jia, Lina Wang
16 Jun 2024

Quantifying Variance in Evaluation Benchmarks
Lovish Madaan, Aaditya K. Singh, Rylan Schaeffer, Andrew Poulton, Sanmi Koyejo, Pontus Stenetorp, Sharan Narang, Dieuwke Hupkes
14 Jun 2024

Efficient multi-prompt evaluation of LLMs
Felipe Maia Polo, Ronald Xu, Lucas Weber, Mírian Silva, Onkar Bhardwaj, Leshem Choshen, Allysson Flavio Melo de Oliveira, Yuekai Sun, Mikhail Yurochkin
27 May 2024

Is Temperature the Creativity Parameter of Large Language Models?
Max Peeperkorn, Tom Kouwenhoven, Daniel G. Brown, Anna K. Jordanous
01 May 2024

Examining the robustness of LLM evaluation to the distributional assumptions of benchmarks
Melissa Ailem, Katerina Marazopoulou, Charlotte Siska, James Bono
25 Apr 2024

Sample Design Engineering: An Empirical Study of What Makes Good Downstream Fine-Tuning Samples for LLMs
Biyang Guo, He Wang, Wenyilin Xiao, Hong Chen, Zhuxin Lee, Songqiao Han, Hailiang Huang
19 Apr 2024

From Form(s) to Meaning: Probing the Semantic Depths of Language Models Using Multisense Consistency
Xenia Ohmer, Elia Bruni, Dieuwke Hupkes
AI4CE · 18 Apr 2024

tinyBenchmarks: evaluating LLMs with fewer examples
Felipe Maia Polo, Lucas Weber, Leshem Choshen, Yuekai Sun, Gongjun Xu, Mikhail Yurochkin
ELM · 22 Feb 2024

On Sensitivity of Learning with Limited Labelled Data to the Effects of Randomness: Impact of Interactions and Systematic Choices
Branislav Pecher, Ivan Srba, M. Bieliková
20 Feb 2024

When Benchmarks are Targets: Revealing the Sensitivity of Large Language Model Leaderboards
Norah A. Alzahrani, H. A. Alyahya, Sultan Yazeed Alnumay, Muhtasim Tahmid, Shaykhah Alsubaie, ..., Saleh Soltan, Nathan Scales, Marie-Anne Lachaux, Samuel R. Bowman, Haidar Khan
ELM · 01 Feb 2024

MERA: A Comprehensive LLM Evaluation in Russian
Alena Fenogenova, Artem Chervyakov, Nikita Martynov, Anastasia Kozlova, Maria Tikhonova, ..., Nikita Savushkin, Polina Mikhailova, Denis Dimitrov, Alexander Panchenko, Sergey Markov
ELM · 09 Jan 2024

State of What Art? A Call for Multi-Prompt LLM Evaluation
Moran Mizrahi, Guy Kaplan, Daniel Malkin, Rotem Dror, Dafna Shahaf, Gabriel Stanovsky
ELM · 31 Dec 2023

WorldSense: A Synthetic Benchmark for Grounded Reasoning in Large Language Models
Youssef Benchekroun, Megi Dervishi, Mark Ibrahim, Jean-Baptiste Gaya, Xavier Martinet, Grégoire Mialon, Thomas Scialom, Emmanuel Dupoux, Dieuwke Hupkes, Pascal Vincent
LRM · 27 Nov 2023

Measuring the Robustness of NLP Models to Domain Shifts
Nitay Calderon, Naveh Porat, Eyal Ben-David, Alexander Chapanin, Zorik Gekhman, Nadav Oved, Vitaly Shalumov, Roi Reichart
31 May 2023

State-of-the-art generalisation research in NLP: A taxonomy and review
Dieuwke Hupkes, Mario Giulianelli, Verna Dankers, Mikel Artetxe, Yanai Elazar, ..., Leila Khalatbari, Maria Ryskina, Rita Frieske, Ryan Cotterell, Zhijing Jin
06 Oct 2022

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
OSLM · ALM · 04 Mar 2022

PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts
Stephen H. Bach, Victor Sanh, Zheng-Xin Yong, Albert Webson, Colin Raffel, ..., Khalid Almubarak, Xiangru Tang, Dragomir R. Radev, Mike Tian-Jian Jiang, Alexander M. Rush
VLM · 02 Feb 2022

Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity
Yao Lu, Max Bartolo, Alastair Moore, Sebastian Riedel, Pontus Stenetorp
AILaw · LRM · 18 Apr 2021

Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets
Mor Geva, Yoav Goldberg, Jonathan Berant
21 Aug 2019

Hypothesis Only Baselines in Natural Language Inference
Adam Poliak, Jason Naradowsky, Aparajita Haldar, Rachel Rudinger, Benjamin Van Durme
02 May 2018