2309.08638
Cited By
Anchor Points: Benchmarking Models with Much Fewer Examples
14 September 2023
Rajan Vivek
Kawin Ethayarajh
Diyi Yang
Douwe Kiela
ALM
Papers citing
"Anchor Points: Benchmarking Models with Much Fewer Examples"
25 / 25 papers shown
Title
Efficient Evaluation of Large Language Models via Collaborative Filtering
Xu-Xiang Zhong
Chao Yi
Han-Jia Ye
24
0
0
05 Apr 2025
Reliable and Efficient Amortized Model-based Evaluation
Sang T. Truong
Yuheng Tu
Percy Liang
Bo-wen Li
Sanmi Koyejo
ELM
59
1
0
17 Mar 2025
BenTo: Benchmark Task Reduction with In-Context Transferability
Hongyu Zhao
Ming Li
Lichao Sun
Tianyi Zhou
28
0
0
17 Oct 2024
Active Evaluation Acquisition for Efficient LLM Benchmarking
Yang Li
Jie Ma
Miguel Ballesteros
Yassine Benajiba
Graham Horwood
ELM
14
1
0
08 Oct 2024
Instruction Embedding: Latent Representations of Instructions Towards Task Identification
Yiwei Li
Jiayi Shi
Shaoxiong Feng
Peiwen Yuan
Xinglin Wang
Boyuan Pan
Heda Wang
Yao Hu
Kan Li
23
2
0
29 Sep 2024
100 instances is all you need: predicting the success of a new LLM on unseen data by testing on a few instances
Lorenzo Pacchiardi
Lucy G. Cheke
José Hernández Orallo
ALM
LRM
ELM
36
3
0
05 Sep 2024
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
Kaichen Zhang
Bo Li
Peiyuan Zhang
Fanyi Pu
Joshua Adrian Cahyono
...
Shuai Liu
Yuanhan Zhang
Jingkang Yang
Chunyuan Li
Ziwei Liu
85
74
0
17 Jul 2024
Data Efficient Evaluation of Large Language Models and Text-to-Image Models via Adaptive Sampling
Cong Xu
Gayathri Saranathan
Mahammad Parwez Alam
Arpit Shah
James Lim
Soon Yee Wong
Foltin Martin
Suparna Bhattacharya
VLM
35
3
0
21 Jun 2024
Quantifying Variance in Evaluation Benchmarks
Lovish Madaan
Aaditya K. Singh
Rylan Schaeffer
Andrew Poulton
Sanmi Koyejo
Pontus Stenetorp
Sharan Narang
Dieuwke Hupkes
33
9
0
14 Jun 2024
Efficient multi-prompt evaluation of LLMs
Felipe Maia Polo
Ronald Xu
Lucas Weber
Mírian Silva
Onkar Bhardwaj
Leshem Choshen
Allysson Flavio Melo de Oliveira
Yuekai Sun
Mikhail Yurochkin
37
17
0
27 May 2024
Examining the robustness of LLM evaluation to the distributional assumptions of benchmarks
Melissa Ailem
Katerina Marazopoulou
Charlotte Siska
James Bono
51
13
0
25 Apr 2024
FlashEval: Towards Fast and Accurate Evaluation of Text-to-image Diffusion Generative Models
Lin Zhao
Tianchen Zhao
Zinan Lin
Xuefei Ning
Guohao Dai
Huazhong Yang
Yu Wang
EGVM
42
7
0
25 Mar 2024
Lifelong Benchmarks: Efficient Model Evaluation in an Era of Rapid Progress
Ameya Prabhu
Vishaal Udandarao
Philip H. S. Torr
Matthias Bethge
Adel Bibi
Samuel Albanie
23
5
0
29 Feb 2024
tinyBenchmarks: evaluating LLMs with fewer examples
Felipe Maia Polo
Lucas Weber
Leshem Choshen
Yuekai Sun
Gongjun Xu
Mikhail Yurochkin
ELM
24
72
0
22 Feb 2024
Label-Efficient Model Selection for Text Generation
Shir Ashury-Tahan
Ariel Gera
Benjamin Sznajder
Leshem Choshen
L. Ein-Dor
Eyal Shnarch
31
4
0
12 Feb 2024
FinanceBench: A New Benchmark for Financial Question Answering
Pranab Islam
Anand Kannappan
Douwe Kiela
Rebecca Qian
Nino Scherrer
Bertie Vidgen
RALM
19
71
0
20 Nov 2023
Post Turing: Mapping the landscape of LLM Evaluation
Alexey Tikhonov
Ivan P. Yamshchikov
ELM
33
4
0
03 Nov 2023
How Predictable Are Large Language Model Capabilities? A Case Study on BIG-bench
Qinyuan Ye
Harvey Yiyun Fu
Xiang Ren
Robin Jia
ELM
19
21
0
24 May 2023
Simfluence: Modeling the Influence of Individual Training Examples by Simulating Training Runs
Kelvin Guu
Albert Webson
Ellie Pavlick
Lucas Dixon
Ian Tenney
Tolga Bolukbasi
TDI
66
33
0
14 Mar 2023
PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts
Stephen H. Bach
Victor Sanh
Zheng-Xin Yong
Albert Webson
Colin Raffel
...
Khalid Almubarak
Xiangru Tang
Dragomir R. Radev
Mike Tian-Jian Jiang
Alexander M. Rush
VLM
225
338
0
02 Feb 2022
Understanding Dataset Difficulty with $\mathcal{V}$-Usable Information
Kawin Ethayarajh
Yejin Choi
Swabha Swayamdipta
157
157
0
16 Oct 2021
GRAD-MATCH: Gradient Matching based Data Subset Selection for Efficient Deep Model Training
Krishnateja Killamsetty
D. Sivasubramanian
Ganesh Ramakrishnan
A. De
Rishabh K. Iyer
OOD
83
188
0
27 Feb 2021
With Little Power Comes Great Responsibility
Dallas Card
Peter Henderson
Urvashi Khandelwal
Robin Jia
Kyle Mahowald
Dan Jurafsky
225
115
0
13 Oct 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,943
0
20 Apr 2018
Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles
Balaji Lakshminarayanan
Alexander Pritzel
Charles Blundell
UQCV
BDL
268
5,652
0
05 Dec 2016