Benchmark Transparency: Measuring the Impact of Data on Evaluation

Venelin Kovatchev, Matthew Lease
arXiv:2404.00748 (31 March 2024)
Papers citing "Benchmark Transparency: Measuring the Impact of Data on Evaluation" (5 of 5 shown)
• Interpreting Predictive Probabilities: Model Confidence or Human Label Variation?
  Joris Baan, Raquel Fernández, Barbara Plank, Wilker Aziz (25 Feb 2024)
• InferES: A Natural Language Inference Corpus for Spanish Featuring Negation-Based Contrastive and Adversarial Examples
  Venelin Kovatchev, Mariona Taulé (06 Oct 2022)
• Understanding Dataset Difficulty with $\mathcal{V}$-Usable Information
  Kawin Ethayarajh, Yejin Choi, Swabha Swayamdipta (16 Oct 2021)
• Hypothesis Only Baselines in Natural Language Inference
  Adam Poliak, Jason Naradowsky, Aparajita Haldar, Rachel Rudinger, Benjamin Van Durme (02 May 2018)
• GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
  Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman (20 Apr 2018)