Benchmark Transparency: Measuring the Impact of Data on Evaluation

Venelin Kovatchev, Matthew Lease
arXiv:2404.00748 (31 March 2024)
Papers citing "Benchmark Transparency: Measuring the Impact of Data on Evaluation" (5 of 5 shown)
• Interpreting Predictive Probabilities: Model Confidence or Human Label Variation?
  Joris Baan, Raquel Fernández, Barbara Plank, Wilker Aziz (25 Feb 2024)
• InferES: A Natural Language Inference Corpus for Spanish Featuring Negation-Based Contrastive and Adversarial Examples
  Venelin Kovatchev, Mariona Taulé (06 Oct 2022)
• Understanding Dataset Difficulty with $\mathcal{V}$-Usable Information
  Kawin Ethayarajh, Yejin Choi, Swabha Swayamdipta (16 Oct 2021)
• Hypothesis Only Baselines in Natural Language Inference
  Adam Poliak, Jason Naradowsky, Aparajita Haldar, Rachel Rudinger, Benjamin Van Durme (02 May 2018)
• GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
  Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman (20 Apr 2018)