2106.06052
Cited By
Dynaboard: An Evaluation-As-A-Service Platform for Holistic Next-Generation Benchmarking
Neural Information Processing Systems (NeurIPS), 2021
21 May 2021
Zhiyi Ma
Kawin Ethayarajh
Tristan Thrush
Somya Jain
Ledell Yu Wu
Robin Jia
Christopher Potts
Adina Williams
Douwe Kiela
ELM
Papers citing
"Dynaboard: An Evaluation-As-A-Service Platform for Holistic Next-Generation Benchmarking"
41 / 41 papers shown
AcademicEval: Live Long-Context LLM Benchmark
Haozhen Zhang
Tao Feng
Pengrui Han
Jiaxuan You
100
2
0
20 Oct 2025
From Seed to Harvest: Augmenting Human Creativity with AI for Red-teaming Text-to-Image Models
Jessica Quaye
Charvi Rastogi
Alicia Parrish
Oana Inel
Minsuk Kahng
Lora Aroyo
Vijay Janapa Reddi
135
0
0
23 Jul 2025
Finance Language Model Evaluation (FLaME)
Glenn Matlin
Mika Okamoto
Huzaifa Pardawala
Yang Yang
Sudheer Chava
AIFin
LRM
170
1
0
18 Jun 2025
Never Skip a Batch: Continuous Training of Temporal GNNs via Adaptive Pseudo-Supervision
Alexander Panyshev
Dmitry Vinichenko
Oleg Travkin
Roman Alferov
Alexey Zaytsev
195
0
0
18 May 2025
CLDyB: Towards Dynamic Benchmarking for Continual Learning with Pre-trained Models
International Conference on Learning Representations (ICLR), 2025
Shengzhuang Chen
Yikai Liao
Xiaoxiao Sun
Kede Ma
Ying Wei
328
1
0
06 Mar 2025
Dynamic-KGQA: A Scalable Framework for Generating Adaptive Question Answering Datasets
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2025
Preetam Prabhu Srikar Dammu
Himanshu Naidu
Chirag Shah
439
3
0
06 Mar 2025
From Next-Token to Mathematics: The Learning Dynamics of Mathematical Reasoning in Language Models
Shubhra Mishra
Gabriel Poesia
Belinda Mo
308
3
0
01 Jul 2024
DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph
Zhehao Zhang
Jiaao Chen
Diyi Yang
LRM
204
23
0
25 Jun 2024
Raising the Bar: Investigating the Values of Large Language Models via Generative Evolving Testing
Han Jiang
Xiaoyuan Yi
Zhihua Wei
Ziang Xiao
Shu Wang
Xing Xie
ELM
ALM
442
11
0
20 Jun 2024
WebCanvas: Benchmarking Web Agents in Online Environments
Yichen Pan
Dehan Kong
Sida Zhou
Cheng Cui
Yifei Leng
...
Hangyu Liu
Yanyi Shang
Shuyan Zhou
Tongshuang Wu
Zhengyang Wu
343
70
0
18 Jun 2024
Benchmark Data Contamination of Large Language Models: A Survey
Cheng Xu
Shuhao Guan
Derek Greene
Mohand-Tahar Kechadi
ELM
ALM
239
86
0
06 Jun 2024
Dynamic Evaluation of Large Language Models by Meta Probing Agents
Lingyao Li
Yongfeng Zhang
Qinlin Zhao
Ruochen Xu
Xing Xie
309
54
0
21 Feb 2024
How the Advent of Ubiquitous Large Language Models both Stymie and Turbocharge Dynamic Adversarial Question Generation
Yoo Yeon Sung
Ishani Mondal
Jordan L. Boyd-Graber
203
1
0
20 Jan 2024
Leveraging Diffusion Perturbations for Measuring Fairness in Computer Vision
AAAI Conference on Artificial Intelligence (AAAI), 2023
Nicholas Lui
Bryan Chia
William Berrios
Candace Ross
Douwe Kiela
222
2
0
25 Nov 2023
Rethinking Benchmark and Contamination for Language Models with Rephrased Samples
Shuo Yang
Wei-Lin Chiang
Lianmin Zheng
Joseph E. Gonzalez
Ion Stoica
ALM
312
160
0
08 Nov 2023
Meta Semantic Template for Evaluation of Large Language Models
Yachuan Liu
Liang Chen
Yongfeng Zhang
Qiaozhu Mei
Xing Xie
196
1
0
01 Oct 2023
DyVal: Dynamic Evaluation of Large Language Models for Reasoning Tasks
International Conference on Learning Representations (ICLR), 2023
A. Maritan
Jiaao Chen
S. Dey
Luca Schenato
Diyi Yang
Xing Xie
ELM
LRM
359
78
0
29 Sep 2023
Efficiency Pentathlon: A Standardized Arena for Efficiency Evaluation
Hao Peng
Qingqing Cao
Jesse Dodge
Matthew E. Peters
Jared Fernandez
...
Darrell Plessas
Iz Beltagy
Evan Pete Walsh
Noah A. Smith
Hannaneh Hajishirzi
146
8
0
19 Jul 2023
A Survey on Evaluation of Large Language Models
ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2023
Yu-Chu Chang
Xu Wang
Yongfeng Zhang
Yuanyi Wu
Linyi Yang
...
Yue Zhang
Yi-Ju Chang
Philip S. Yu
Qian Yang
Xingxu Xie
ELM
LM&MA
ALM
676
2,636
0
06 Jul 2023
bgGLUE: A Bulgarian General Language Understanding Evaluation Benchmark
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Momchil Hardalov
Pepa Atanasova
Todor Mihaylov
G. Angelova
K. Simov
P. Osenova
Ves Stoyanov
Ivan Koychev
Preslav Nakov
Dragomir R. Radev
ELM
FedML
218
8
0
04 Jun 2023
On Degrees of Freedom in Defining and Testing Natural Language Understanding
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Saku Sugawara
S. Tsugita
ELM
275
2
0
24 May 2023
Cross-functional Analysis of Generalisation in Behavioural Learning
Transactions of the Association for Computational Linguistics (TACL), 2023
Pedro Henrique Luz de Araujo
Benjamin Roth
126
4
0
22 May 2023
Accounting for multiplicity in machine learning benchmark performance
Kajsa Møllersen
Einar J. Holsbø
142
3
0
10 Mar 2023
Moving Beyond Downstream Task Accuracy for Information Retrieval Benchmarking
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Keshav Santhanam
Jon Saad-Falcon
M. Franz
Omar Khattab
Avirup Sil
Radu Florian
Md Arafat Sultan
Salim Roukos
Matei A. Zaharia
Christopher Potts
OffRL
197
11
0
02 Dec 2022
Generalization Differences between End-to-End and Neuro-Symbolic Vision-Language Reasoning Systems
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Peng Guo
Jesse Thomason
Robin Jia
VLM
OOD
NAI
LRM
113
8
0
26 Oct 2022
Predicting Fine-Tuning Performance with Probing
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Zining Zhu
Soroosh Shahtalebi
Frank Rudzicz
147
12
0
13 Oct 2022
Vote'n'Rank: Revision of Benchmarking with Social Choice Theory
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022
Mark Rofin
Vladislav Mikhailov
Mikhail Florinskiy
A. Kravchenko
E. Tutubalina
Tatiana Shavrina
Daniel Karabekyan
Ekaterina Artemova
278
15
0
11 Oct 2022
Toxicity in Multilingual Machine Translation at Scale
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Marta R. Costa-jussà
Eric Michael Smith
C. Ropers
Daniel Licht
Jean Maillard
Javier Ferrando
Carlos Escolano
191
32
0
06 Oct 2022
Evaluate & Evaluation on the Hub: Better Best Practices for Data and Model Measurements
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Leandro von Werra
Lewis Tunstall
A. Thakur
A. Luccioni
Tristan Thrush
...
Julien Chaumond
Margaret Mitchell
Alexander M. Rush
Thomas Wolf
Douwe Kiela
ELM
225
34
0
30 Sep 2022
RealTime QA: What's the Answer Right Now?
Neural Information Processing Systems (NeurIPS), 2022
Jungo Kasai
Keisuke Sakaguchi
Yoichi Takahashi
Ronan Le Bras
Akari Asai
Xinyan Velocity Yu
Dragomir R. Radev
Noah A. Smith
Yejin Choi
Kentaro Inui
KELM
378
251
0
27 Jul 2022
Square One Bias in NLP: Towards a Multi-Dimensional Exploration of the Research Manifold
Findings (Findings), 2022
Sebastian Ruder
Ivan Vulić
Anders Søgaard
165
36
0
20 Jun 2022
Adversarial Text Normalization
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Joanna Bitton
Maya Pavlova
Ivan Evtimov
AAML
174
2
0
08 Jun 2022
Evaluating the Diversity, Equity and Inclusion of NLP Technology: A Case Study for Indian Languages
Findings (Findings), 2022
Simran Khanuja
Sebastian Ruder
Partha P. Talukdar
437
24
0
25 May 2022
Near-Negative Distinction: Giving a Second Life to Human Evaluation Datasets
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Philippe Laban
Chien-Sheng Wu
Wenhao Liu
Caiming Xiong
186
4
0
13 May 2022
Problems with Cosine as a Measure of Embedding Similarity for High Frequency Words
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Kaitlyn Zhou
Kawin Ethayarajh
Dallas Card
Dan Jurafsky
180
81
0
10 May 2022
Towards Climate Awareness in NLP Research
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Daniel Hershcovich
Nicolas Webersinke
Mathias Kraus
J. Bingler
Markus Leippold
335
43
0
10 May 2022
A global analysis of metrics used for measuring performance in natural language processing
Kathrin Blagec
Georg Dorffner
M. Moradi
Simon Ott
Matthias Samwald
144
38
0
25 Apr 2022
Fantastic Data and How to Query Them
T. Tran
Le-Tuan Anh
M. Duc
Jicheng Yuan
Danh Le-Phuoc
100
4
0
13 Jan 2022
Analyzing Dynamic Adversarial Training Data in the Limit
Eric Wallace
Adina Williams
Robin Jia
Douwe Kiela
476
31
0
16 Oct 2021
The Dangers of Underclaiming: Reasons for Caution When Reporting How NLP Systems Fail
Sam Bowman
OffRL
294
47
0
15 Oct 2021
ANLIzing the Adversarial Natural Language Inference Dataset
Adina Williams
Tristan Thrush
Douwe Kiela
AAML
367
48
0
24 Oct 2020