2405.17202
Cited By
Efficient multi-prompt evaluation of LLMs
27 May 2024
Felipe Maia Polo
Ronald Xu
Lucas Weber
Mírian Silva
Onkar Bhardwaj
Leshem Choshen
Allysson Flavio Melo de Oliveira
Yuekai Sun
Mikhail Yurochkin
Papers citing "Efficient multi-prompt evaluation of LLMs"
16 / 16 papers shown
Title
Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts
Hanhua Hong
Chenghao Xiao
Yang Wang
Y. Liu
Wenge Rong
Chenghua Lin
26
0
0
29 Apr 2025
Data Metabolism: An Efficient Data Design Schema For Vision Language Model
Jingyuan Zhang
Hongzhi Zhang
Zhou Haonan
Chenxi Sun
Xingguang Ji
Jiakang Wang
Fanheng Kong
Y. Liu
Qi Wang
Fuzheng Zhang
VLM
53
1
0
10 Apr 2025
Towards LLMs Robustness to Changes in Prompt Format Styles
Lilian Ngweta
Kiran Kate
Jason Tsay
Yara Rizk
AAML
VLM
27
0
0
09 Apr 2025
DOVE: A Large-Scale Multi-Dimensional Predictions Dataset Towards Meaningful LLM Evaluation
Eliya Habba
Ofir Arviv
Itay Itzhak
Yotam Perlitz
Elron Bandel
Leshem Choshen
Michal Shmueli-Scheuer
Gabriel Stanovsky
64
1
0
03 Mar 2025
Same Question, Different Words: A Latent Adversarial Framework for Prompt Robustness
Tingchen Fu
Fazl Barez
AAML
58
0
0
03 Mar 2025
SCORE: Systematic COnsistency and Robustness Evaluation for Large Language Models
Grigor Nalbandyan
Rima Shahbazyan
Evelina Bakhturina
ELM
33
0
0
28 Feb 2025
Accelerating Unbiased LLM Evaluation via Synthetic Feedback
Zhaoyi Zhou
Yuda Song
Andrea Zanette
ALM
63
0
0
14 Feb 2025
Recommendations Beyond Catalogs: Diffusion Models for Personalized Generation
Gabriel Patron
Zhiwei Xu
Ishan Kapnadak
Felipe Maia Polo
DiffM
38
0
0
05 Feb 2025
Evalita-LLM: Benchmarking Large Language Models on Italian
Bernardo Magnini
Roberto Zanoli
Michele Resta
Martin Cimmino
Paolo Albano
Marco Madeddu
V. Patti
53
1
0
04 Feb 2025
Sloth: scaling laws for LLM skills to predict multi-benchmark performance across families
Felipe Maia Polo
S. Kamath S
Leshem Choshen
Yuekai Sun
Mikhail Yurochkin
82
5
0
09 Dec 2024
3DArticCyclists: Generating Synthetic Articulated 8D Pose-Controllable Cyclist Data for Computer Vision Applications
Eduardo R. Corral-Soto
Yang Liu
Tongtong Cao
Y. Ren
Liu Bingbing
42
4
0
14 Oct 2024
POSIX: A Prompt Sensitivity Index For Large Language Models
Anwoy Chatterjee
H. S. V. N. S. K. Renduchintala
S. Bhatia
Tanmoy Chakraborty
AAML
13
6
0
03 Oct 2024
Bilingual Evaluation of Language Models on General Knowledge in University Entrance Exams with Minimal Contamination
Eva Sánchez Salido
Roser Morante
Julio Gonzalo
Guillermo Marco
Jorge Carrillo-de-Albornoz
...
Enrique Amigó
Andrés Fernández
Alejandro Benito-Santos
Adrián Ghajari Espinosa
Victor Fresno
ELM
39
0
0
19 Sep 2024
Gemma: Open Models Based on Gemini Research and Technology
Gemma Team
Thomas Mesnard
Cassidy Hardin
Robert Dadashi
Surya Bhupatiraju
...
Armand Joulin
Noah Fiedel
Evan Senter
Alek Andreev
Kathleen Kenealy
VLM
LLMAG
123
415
0
13 Mar 2024
Label-Efficient Model Selection for Text Generation
Shir Ashury-Tahan
Ariel Gera
Benjamin Sznajder
Leshem Choshen
L. Ein-Dor
Eyal Shnarch
23
4
0
12 Feb 2024
SPELL: Semantic Prompt Evolution based on a LLM
Yujian Betterest Li
Kai Wu
42
10
0
02 Oct 2023