Dynaboard: An Evaluation-As-A-Service Platform for Holistic Next-Generation Benchmarking

arXiv: 2106.06052
Neural Information Processing Systems (NeurIPS), 2021
21 May 2021
Zhiyi Ma
Kawin Ethayarajh
Tristan Thrush
Somya Jain
Ledell Yu Wu
Robin Jia
Christopher Potts
Adina Williams
Douwe Kiela
    ELM

Papers citing "Dynaboard: An Evaluation-As-A-Service Platform for Holistic Next-Generation Benchmarking"

41 / 41 papers shown
AcademicEval: Live Long-Context LLM Benchmark
Haozhen Zhang
Tao Feng
Pengrui Han
Jiaxuan You
100
2
0
20 Oct 2025
From Seed to Harvest: Augmenting Human Creativity with AI for Red-teaming Text-to-Image Models
Jessica Quaye
Charvi Rastogi
Alicia Parrish
Oana Inel
Minsuk Kahng
Lora Aroyo
Vijay Janapa Reddi
135
0
0
23 Jul 2025
Finance Language Model Evaluation (FLaME)
Glenn Matlin
Mika Okamoto
Huzaifa Pardawala
Yang Yang
Sudheer Chava
AIFin, LRM
170
1
0
18 Jun 2025
Never Skip a Batch: Continuous Training of Temporal GNNs via Adaptive Pseudo-Supervision
Alexander Panyshev
Dmitry Vinichenko
Oleg Travkin
Roman Alferov
Alexey Zaytsev
195
0
0
18 May 2025
CLDyB: Towards Dynamic Benchmarking for Continual Learning with Pre-trained Models
International Conference on Learning Representations (ICLR), 2025
Shengzhuang Chen
Yikai Liao
Xiaoxiao Sun
Kede Ma
Ying Wei
328
1
0
06 Mar 2025
Dynamic-KGQA: A Scalable Framework for Generating Adaptive Question Answering Datasets
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2025
Preetam Prabhu Srikar Dammu
Himanshu Naidu
Chirag Shah
439
3
0
06 Mar 2025
From Next-Token to Mathematics: The Learning Dynamics of Mathematical Reasoning in Language Models
Shubhra Mishra
Gabriel Poesia
Belinda Mo
308
3
0
01 Jul 2024
DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph
Zhehao Zhang
Jiaao Chen
Diyi Yang
LRM
204
23
0
25 Jun 2024
Raising the Bar: Investigating the Values of Large Language Models via Generative Evolving Testing
Han Jiang
Xiaoyuan Yi
Zhihua Wei
Ziang Xiao
Shu Wang
Xing Xie
ELM, ALM
442
11
0
20 Jun 2024
WebCanvas: Benchmarking Web Agents in Online Environments
Yichen Pan
Dehan Kong
Sida Zhou
Cheng Cui
Yifei Leng
...
Hangyu Liu
Yanyi Shang
Shuyan Zhou
Tongshuang Wu
Zhengyang Wu
343
70
0
18 Jun 2024
Benchmark Data Contamination of Large Language Models: A Survey
Cheng Xu
Shuhao Guan
Derek Greene
Mohand-Tahar Kechadi
ELM, ALM
239
86
0
06 Jun 2024
Dynamic Evaluation of Large Language Models by Meta Probing Agents
Lingyao Li
Yongfeng Zhang
Qinlin Zhao
Ruochen Xu
Xing Xie
309
54
0
21 Feb 2024
How the Advent of Ubiquitous Large Language Models both Stymie and Turbocharge Dynamic Adversarial Question Generation
Yoo Yeon Sung
Ishani Mondal
Jordan L. Boyd-Graber
203
1
0
20 Jan 2024
Leveraging Diffusion Perturbations for Measuring Fairness in Computer Vision
AAAI Conference on Artificial Intelligence (AAAI), 2023
Nicholas Lui
Bryan Chia
William Berrios
Candace Ross
Douwe Kiela
222
2
0
25 Nov 2023
Rethinking Benchmark and Contamination for Language Models with Rephrased Samples
Shuo Yang
Wei-Lin Chiang
Lianmin Zheng
Joseph E. Gonzalez
Ion Stoica
ALM
312
160
0
08 Nov 2023
Meta Semantic Template for Evaluation of Large Language Models
Yachuan Liu
Liang Chen
Yongfeng Zhang
Qiaozhu Mei
Xing Xie
196
1
0
01 Oct 2023
DyVal: Dynamic Evaluation of Large Language Models for Reasoning Tasks
International Conference on Learning Representations (ICLR), 2023
A. Maritan
Jiaao Chen
S. Dey
Luca Schenato
Diyi Yang
Xing Xie
ELM, LRM
359
78
0
29 Sep 2023
Efficiency Pentathlon: A Standardized Arena for Efficiency Evaluation
Hao Peng
Qingqing Cao
Jesse Dodge
Matthew E. Peters
Jared Fernandez
...
Darrell Plessas
Iz Beltagy
Evan Pete Walsh
Noah A. Smith
Hannaneh Hajishirzi
146
8
0
19 Jul 2023
A Survey on Evaluation of Large Language Models
ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2023
Yu-Chu Chang
Xu Wang
Yongfeng Zhang
Yuanyi Wu
Linyi Yang
...
Yue Zhang
Yi-Ju Chang
Philip S. Yu
Qian Yang
Xingxu Xie
ELM, LM&MA, ALM
676
2,636
0
06 Jul 2023
bgGLUE: A Bulgarian General Language Understanding Evaluation Benchmark
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Momchil Hardalov
Pepa Atanasova
Todor Mihaylov
G. Angelova
K. Simov
P. Osenova
Ves Stoyanov
Ivan Koychev
Preslav Nakov
Dragomir R. Radev
ELM, FedML
218
8
0
04 Jun 2023
On Degrees of Freedom in Defining and Testing Natural Language Understanding
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Saku Sugawara
S. Tsugita
ELM
275
2
0
24 May 2023
Cross-functional Analysis of Generalisation in Behavioural Learning
Transactions of the Association for Computational Linguistics (TACL), 2023
Pedro Henrique Luz de Araujo
Benjamin Roth
126
4
0
22 May 2023
Accounting for multiplicity in machine learning benchmark performance
Kajsa Møllersen
Einar J. Holsbø
142
3
0
10 Mar 2023
Moving Beyond Downstream Task Accuracy for Information Retrieval Benchmarking
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Keshav Santhanam
Jon Saad-Falcon
M. Franz
Omar Khattab
Avirup Sil
Radu Florian
Md Arafat Sultan
Salim Roukos
Matei A. Zaharia
Christopher Potts
OffRL
197
11
0
02 Dec 2022
Generalization Differences between End-to-End and Neuro-Symbolic Vision-Language Reasoning Systems
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Peng Guo
Jesse Thomason
Robin Jia
VLM, OOD, NAI, LRM
113
8
0
26 Oct 2022
Predicting Fine-Tuning Performance with Probing
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Zining Zhu
Soroosh Shahtalebi
Frank Rudzicz
147
12
0
13 Oct 2022
Vote'n'Rank: Revision of Benchmarking with Social Choice Theory
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022
Mark Rofin
Vladislav Mikhailov
Mikhail Florinskiy
A. Kravchenko
E. Tutubalina
Tatiana Shavrina
Daniel Karabekyan
Ekaterina Artemova
278
15
0
11 Oct 2022
Toxicity in Multilingual Machine Translation at Scale
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Marta R. Costa-jussá
Eric Michael Smith
C. Ropers
Daniel Licht
Jean Maillard
Javier Ferrando
Carlos Escolano
191
32
0
06 Oct 2022
Evaluate & Evaluation on the Hub: Better Best Practices for Data and Model Measurements
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Leandro von Werra
Lewis Tunstall
A. Thakur
A. Luccioni
Tristan Thrush
...
Julien Chaumond
Margaret Mitchell
Alexander M. Rush
Thomas Wolf
Douwe Kiela
ELM
225
34
0
30 Sep 2022
RealTime QA: What's the Answer Right Now?
Neural Information Processing Systems (NeurIPS), 2022
Jungo Kasai
Keisuke Sakaguchi
Yoichi Takahashi
Ronan Le Bras
Akari Asai
Xinyan Velocity Yu
Dragomir R. Radev
Noah A. Smith
Yejin Choi
Kentaro Inui
KELM
378
251
0
27 Jul 2022
Square One Bias in NLP: Towards a Multi-Dimensional Exploration of the Research Manifold
Findings (Findings), 2022
Sebastian Ruder
Ivan Vulić
Anders Søgaard
165
36
0
20 Jun 2022
Adversarial Text Normalization
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Joanna Bitton
Maya Pavlova
Ivan Evtimov
AAML
174
2
0
08 Jun 2022
Evaluating the Diversity, Equity and Inclusion of NLP Technology: A Case Study for Indian Languages
Findings (Findings), 2022
Simran Khanuja
Sebastian Ruder
Partha P. Talukdar
437
24
0
25 May 2022
Near-Negative Distinction: Giving a Second Life to Human Evaluation Datasets
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Philippe Laban
Chien-Sheng Wu
Wenhao Liu
Caiming Xiong
186
4
0
13 May 2022
Problems with Cosine as a Measure of Embedding Similarity for High Frequency Words
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Kaitlyn Zhou
Kawin Ethayarajh
Dallas Card
Dan Jurafsky
180
81
0
10 May 2022
Towards Climate Awareness in NLP Research
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Daniel Hershcovich
Nicolas Webersinke
Mathias Kraus
J. Bingler
Markus Leippold
335
43
0
10 May 2022
A global analysis of metrics used for measuring performance in natural language processing
Kathrin Blagec
Georg Dorffner
M. Moradi
Simon Ott
Matthias Samwald
144
38
0
25 Apr 2022
Fantastic Data and How to Query Them
T. Tran
Le-Tuan Anh
M. Duc
Jicheng Yuan
Danh Le-Phuoc
100
4
0
13 Jan 2022
Analyzing Dynamic Adversarial Training Data in the Limit
Eric Wallace
Adina Williams
Robin Jia
Douwe Kiela
476
31
0
16 Oct 2021
The Dangers of Underclaiming: Reasons for Caution When Reporting How NLP Systems Fail
Sam Bowman
OffRL
294
47
0
15 Oct 2021
ANLIzing the Adversarial Natural Language Inference Dataset
Adina Williams
Tristan Thrush
Douwe Kiela
AAML
367
48
0
24 Oct 2020