ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2005.14709
  4. Cited By
Beyond Leaderboards: A survey of methods for revealing weaknesses in
  Natural Language Inference data and models

Beyond Leaderboards: A survey of methods for revealing weaknesses in Natural Language Inference data and models

29 May 2020
Viktor Schlegel
Goran Nenadic
R. Batista-Navarro
    ELM
ArXivPDFHTML

Papers citing "Beyond Leaderboards: A survey of methods for revealing weaknesses in Natural Language Inference data and models"

8 / 8 papers shown
Title
Out-of-Distribution Generalization in Text Classification: Past,
  Present, and Future
Out-of-Distribution Generalization in Text Classification: Past, Present, and Future
Linyi Yang
Y. Song
Xuan Ren
Chenyang Lyu
Yidong Wang
Lingqiao Liu
Jindong Wang
Jennifer Foster
Yue Zhang
OOD
20
2
0
23 May 2023
The Tail Wagging the Dog: Dataset Construction Biases of Social Bias
  Benchmarks
The Tail Wagging the Dog: Dataset Construction Biases of Social Bias Benchmarks
Nikil Selvam
Sunipa Dev
Daniel Khashabi
Tushar Khot
Kai-Wei Chang
ALM
11
25
0
18 Oct 2022
False perfection in machine prediction: Detecting and assessing
  circularity problems in machine learning
False perfection in machine prediction: Detecting and assessing circularity problems in machine learning
Michael Hagmann
Stefan Riezler
11
1
0
23 Jun 2021
The GEM Benchmark: Natural Language Generation, its Evaluation and
  Metrics
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
Sebastian Gehrmann
Tosin P. Adewumi
Karmanya Aggarwal
Pawan Sasanka Ammanamanchi
Aremu Anuoluwapo
...
Nishant Subramani
Wei-ping Xu
Diyi Yang
Akhila Yerukola
Jiawei Zhou
VLM
243
283
0
02 Feb 2021
PubMedQA: A Dataset for Biomedical Research Question Answering
PubMedQA: A Dataset for Biomedical Research Question Answering
Qiao Jin
Bhuwan Dhingra
Zhengping Liu
William W. Cohen
Xinghua Lu
202
806
0
13 Sep 2019
Certified Robustness to Adversarial Word Substitutions
Certified Robustness to Adversarial Word Substitutions
Robin Jia
Aditi Raghunathan
Kerem Göksel
Percy Liang
AAML
175
290
0
03 Sep 2019
A Survey on Bias and Fairness in Machine Learning
A Survey on Bias and Fairness in Machine Learning
Ninareh Mehrabi
Fred Morstatter
N. Saxena
Kristina Lerman
Aram Galstyan
SyDa
FaML
294
4,187
0
23 Aug 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language
  Understanding
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,943
0
20 Apr 2018
1