ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2004.13606
  4. Cited By
The Curse of Performance Instability in Analysis Datasets: Consequences,
  Source, and Suggestions

The Curse of Performance Instability in Analysis Datasets: Consequences, Source, and Suggestions

28 April 2020
Xiang Zhou
Yixin Nie
Hao Tan
Joey Tianyi Zhou
ArXivPDFHTML

Papers citing "The Curse of Performance Instability in Analysis Datasets: Consequences, Source, and Suggestions"

15 / 15 papers shown
Title
Distributional Scaling Laws for Emergent Capabilities
Distributional Scaling Laws for Emergent Capabilities
Rosie Zhao
Tian Qin
David Alvarez-Melis
Sham Kakade
Naomi Saphra
LRM
39
0
0
24 Feb 2025
Punctuation Restoration Improves Structure Understanding Without Supervision
Punctuation Restoration Improves Structure Understanding Without Supervision
Junghyun Min
Minho Lee
Woochul Lee
Yeonsoo Lee
59
1
0
13 Feb 2024
XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in
  Large Language Models
XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models
Paul Röttger
Hannah Rose Kirk
Bertie Vidgen
Giuseppe Attanasio
Federico Bianchi
Dirk Hovy
ALM
ELM
AILaw
25
125
0
02 Aug 2023
Linear Connectivity Reveals Generalization Strategies
Linear Connectivity Reveals Generalization Strategies
Jeevesh Juneja
Rachit Bansal
Kyunghyun Cho
João Sedoc
Naomi Saphra
242
45
0
24 May 2022
HateCheckHIn: Evaluating Hindi Hate Speech Detection Models
HateCheckHIn: Evaluating Hindi Hate Speech Detection Models
Mithun Das
Punyajoy Saha
Binny Mathew
Animesh Mukherjee
33
15
0
30 Apr 2022
Data-driven Model Generalizability in Crosslinguistic Low-resource
  Morphological Segmentation
Data-driven Model Generalizability in Crosslinguistic Low-resource Morphological Segmentation
Zoey Liu
Emily Tucker Prudhommeaux
43
4
0
05 Jan 2022
The MultiBERTs: BERT Reproductions for Robustness Analysis
The MultiBERTs: BERT Reproductions for Robustness Analysis
Thibault Sellam
Steve Yadlowsky
Jason W. Wei
Naomi Saphra
Alexander DÁmour
...
Iulia Turc
Jacob Eisenstein
Dipanjan Das
Ian Tenney
Ellie Pavlick
24
93
0
30 Jun 2021
HateCheck: Functional Tests for Hate Speech Detection Models
HateCheck: Functional Tests for Hate Speech Detection Models
Paul Röttger
B. Vidgen
Dong Nguyen
Zeerak Talat
Helen Z. Margetts
J. Pierrehumbert
31
259
0
31 Dec 2020
Underspecification Presents Challenges for Credibility in Modern Machine
  Learning
Underspecification Presents Challenges for Credibility in Modern Machine Learning
Alexander DÁmour
Katherine A. Heller
D. Moldovan
Ben Adlam
B. Alipanahi
...
Kellie Webster
Steve Yadlowsky
T. Yun
Xiaohua Zhai
D. Sculley
OffRL
77
670
0
06 Nov 2020
ANLIzing the Adversarial Natural Language Inference Dataset
ANLIzing the Adversarial Natural Language Inference Dataset
Adina Williams
Tristan Thrush
Douwe Kiela
AAML
174
46
0
24 Oct 2020
Improving Robustness by Augmenting Training Sentences with
  Predicate-Argument Structures
Improving Robustness by Augmenting Training Sentences with Predicate-Argument Structures
N. Moosavi
M. Boer
Prasetya Ajie Utama
Iryna Gurevych
19
13
0
23 Oct 2020
CAT-Gen: Improving Robustness in NLP Models via Controlled Adversarial
  Text Generation
CAT-Gen: Improving Robustness in NLP Models via Controlled Adversarial Text Generation
Tianlu Wang
Xuezhi Wang
Yao Qin
Ben Packer
Kang Li
Jilin Chen
Alex Beutel
Ed H. Chi
SILM
32
82
0
05 Oct 2020
Are We Modeling the Task or the Annotator? An Investigation of Annotator
  Bias in Natural Language Understanding Datasets
Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets
Mor Geva
Yoav Goldberg
Jonathan Berant
242
320
0
21 Aug 2019
The Fine Line between Linguistic Generalization and Failure in
  Seq2Seq-Attention Models
The Fine Line between Linguistic Generalization and Failure in Seq2Seq-Attention Models
Noah Weber
L. Shekhar
Niranjan Balasubramanian
98
30
0
03 May 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language
  Understanding
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
297
6,959
0
20 Apr 2018
1