How not to Lie with a Benchmark: Rearranging NLP Leaderboards

2 December 2021

Tatiana Shavrina

Valentin Malykh

Papers citing "How not to Lie with a Benchmark: Rearranging NLP Leaderboards"

Title
Beyond statistical significance: Quantifying uncertainty and statistical variability in multilingual and multitask NLP evaluation | Jonne Sälevä, Duygu Ataman, Constantine Lignos | 116 0 0 | 26 Sep 2025
LEXTREME: A Multi-Lingual and Multi-Task Benchmark for the Legal Domain | Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023 | Joel Niklaus, Veton Matoshi, Pooja Rani, Andrea Galassi, Matthias Sturmer, Ilias Chalkidis | ELM, AILaw | 319 72 0 | 30 Jan 2023
Processing Long Legal Documents with Pre-trained Transformers: Modding LegalBERT and Longformer | Dimitris Mamakas, Petros Tsotsi, Ion Androutsopoulos, Ilias Chalkidis | VLM, AILaw | 209 33 0 | 02 Nov 2022
Vote'n'Rank: Revision of Benchmarking with Social Choice Theory | Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022 | Mark Rofin, Vladislav Mikhailov, Mikhail Florinskiy, A. Kravchenko, E. Tutubalina, Tatiana Shavrina, Daniel Karabekyan, Ekaterina Artemova | 278 15 0 | 11 Oct 2022
Automatic Rule Induction for Interpretable Semi-Supervised Learning | Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022 | Reid Pryzant, Ziyi Yang, Yichong Xu, Chenguang Zhu, Michael Zeng | 243 10 0 | 18 May 2022
Slovene SuperGLUE Benchmark: Translation and Evaluation | International Conference on Language Resources and Evaluation (LREC), 2022 | Aleš Žagar, Marko Robnik-Šikonja | 147 12 0 | 10 Feb 2022
LexGLUE: A Benchmark Dataset for Legal Language Understanding in English | Ilias Chalkidis, Abhik Jana, D. Hartung, M. Bommarito, Ion Androutsopoulos, Daniel Martin Katz, Nikolaos Aletras | AILaw, ELM | 450 354 0 | 03 Oct 2021