Stress Test Evaluation for Natural Language Inference


2 June 2018
Aakanksha Naik
Abhilasha Ravichander
Norman M. Sadeh
Carolyn Rose
Graham Neubig
    ELM

Papers citing "Stress Test Evaluation for Natural Language Inference"

50 / 237 papers shown
Break, Perturb, Build: Automatic Perturbation of Reasoning Paths Through Question Decomposition
Mor Geva
Tomer Wolfson
Jonathan Berant
ReLM
LRM
20
21
0
29 Jul 2021
Stress Test Evaluation of Biomedical Word Embeddings
Vladimir Araujo
Andrés Carvallo
Carlos Aspillaga
C. Thorne
Denis Parra
9
8
0
24 Jul 2021
Tailor: Generating and Perturbing Text with Semantic Controls
Alexis Ross
Tongshuang Wu
Hao Peng
Matthew E. Peters
Matt Gardner
136
77
0
15 Jul 2021
An Investigation of the (In)effectiveness of Counterfactually Augmented Data
Nitish Joshi
He He
OODD
19
46
0
01 Jul 2021
Combining Feature and Instance Attribution to Detect Artifacts
Pouya Pezeshkpour
Sarthak Jain
Sameer Singh
Byron C. Wallace
TDI
18
43
0
01 Jul 2021
The MultiBERTs: BERT Reproductions for Robustness Analysis
Thibault Sellam
Steve Yadlowsky
Jason W. Wei
Naomi Saphra
Alexander D'Amour
...
Iulia Turc
Jacob Eisenstein
Dipanjan Das
Ian Tenney
Ellie Pavlick
24
93
0
30 Jun 2021
Probing Pre-Trained Language Models for Disease Knowledge
Israa Alghanmi
Luis Espinosa-Anke
Steven Schockaert
LM&MA
ELM
18
13
0
14 Jun 2021
Evaluating Entity Disambiguation and the Role of Popularity in Retrieval-Based NLP
Anthony Chen
Pallavi Gudipati
Shayne Longpre
Xiao Ling
Sameer Singh
17
38
0
12 Jun 2021
Investigating Transfer Learning in Multilingual Pre-trained Language Models through Chinese Natural Language Inference
Hai Hu
He Zhou
Zuoyu Tian
Yiwen Zhang
Yina Ma
Yanting Li
Yixin Nie
Kyle Richardson
19
11
0
07 Jun 2021
Figurative Language in Recognizing Textual Entailment
Tuhin Chakrabarty
Debanjan Ghosh
Adam Poliak
Smaranda Muresan
14
37
0
02 Jun 2021
SyGNS: A Systematic Generalization Testbed Based on Natural Language Semantics
Hitomi Yanaka
K. Mineshima
Kentaro Inui
NAI
AI4CE
38
11
0
02 Jun 2021
Counterfactual Invariance to Spurious Correlations: Why and How to Pass Stress Tests
Victor Veitch
Alexander D'Amour
Steve Yadlowsky
Jacob Eisenstein
OOD
21
91
0
31 May 2021
Dynaboard: An Evaluation-As-A-Service Platform for Holistic Next-Generation Benchmarking
Zhiyi Ma
Kawin Ethayarajh
Tristan Thrush
Somya Jain
Ledell Yu Wu
Robin Jia
Christopher Potts
Adina Williams
Douwe Kiela
ELM
33
56
0
21 May 2021
Are Larger Pretrained Language Models Uniformly Better? Comparing Performance at the Instance Level
Ruiqi Zhong
Dhruba Ghosh
Dan Klein
Jacob Steinhardt
28
35
0
13 May 2021
Understanding by Understanding Not: Modeling Negation in Language Models
Arian Hosseini
Siva Reddy
Dzmitry Bahdanau
R. Devon Hjelm
Alessandro Sordoni
Aaron C. Courville
11
87
0
07 May 2021
Flexible Generation of Natural Language Deductions
Kaj Bostrom
Xinyu Zhao
Swarat Chaudhuri
Greg Durrett
ReLM
LRM
265
33
0
18 Apr 2021
Can NLI Models Verify QA Systems' Predictions?
Jifan Chen
Eunsol Choi
Greg Durrett
23
54
0
18 Apr 2021
Does Putting a Linguist in the Loop Improve NLU Data Collection?
Alicia Parrish
William Huang
Omar Agha
Soo-hwan Lee
Nikita Nangia
Alex Warstadt
Karmanya Aggarwal
Emily Allaway
Tal Linzen
Samuel R. Bowman
25
40
0
15 Apr 2021
Dynabench: Rethinking Benchmarking in NLP
Douwe Kiela
Max Bartolo
Yixin Nie
Divyansh Kaushik
Atticus Geiger
...
Pontus Stenetorp
Robin Jia
Joey Tianyi Zhou
Christopher Potts
Adina Williams
24
387
0
07 Apr 2021
What Will it Take to Fix Benchmarking in Natural Language Understanding?
Samuel R. Bowman
George E. Dahl
ELM
ALM
30
156
0
05 Apr 2021
Contrastive Explanations for Model Interpretability
Alon Jacovi
Swabha Swayamdipta
Shauli Ravfogel
Yanai Elazar
Yejin Choi
Yoav Goldberg
44
95
0
02 Mar 2021
NoiseQA: Challenge Set Evaluation for User-Centric Question Answering
Abhilasha Ravichander
Siddharth Dalmia
Maria Ryskina
Florian Metze
Eduard H. Hovy
A. Black
ELM
23
32
0
16 Feb 2021
Statistically Profiling Biases in Natural Language Reasoning Datasets and Models
Shanshan Huang
Kenny Q. Zhu
16
1
0
09 Feb 2021
SICKNL: A Dataset for Dutch Natural Language Inference
G. Wijnholds
M. Moortgat
6
25
0
14 Jan 2021
Robustness Gym: Unifying the NLP Evaluation Landscape
Karan Goel
Nazneen Rajani
Jesse Vig
Samson Tan
Jason M. Wu
Stephan Zheng
Caiming Xiong
Joey Tianyi Zhou
Christopher Ré
AAML
OffRL
OOD
154
136
0
13 Jan 2021
Polyjuice: Generating Counterfactuals for Explaining, Evaluating, and Improving Models
Tongshuang Wu
Marco Tulio Ribeiro
Jeffrey Heer
Daniel S. Weld
41
240
0
01 Jan 2021
Using Natural Language Relations between Answer Choices for Machine Comprehension
Rajkumar Pujari
Dan Goldwasser
11
5
0
31 Dec 2020
HateCheck: Functional Tests for Hate Speech Detection Models
Paul Röttger
B. Vidgen
Dong Nguyen
Zeerak Talat
Helen Z. Margetts
J. Pierrehumbert
31
259
0
31 Dec 2020
DynaSent: A Dynamic Benchmark for Sentiment Analysis
Christopher Potts
Zhengxuan Wu
Atticus Geiger
Douwe Kiela
230
77
0
30 Dec 2020
Underspecification Presents Challenges for Credibility in Modern Machine Learning
Alexander D'Amour
Katherine A. Heller
D. Moldovan
Ben Adlam
B. Alipanahi
...
Kellie Webster
Steve Yadlowsky
T. Yun
Xiaohua Zhai
D. Sculley
OffRL
53
669
0
06 Nov 2020
ANLIzing the Adversarial Natural Language Inference Dataset
Adina Williams
Tristan Thrush
Douwe Kiela
AAML
174
46
0
24 Oct 2020
Improving Robustness by Augmenting Training Sentences with Predicate-Argument Structures
N. Moosavi
M. Boer
Prasetya Ajie Utama
Iryna Gurevych
19
13
0
23 Oct 2020
ConjNLI: Natural Language Inference Over Conjunctive Sentences
Swarnadeep Saha
Yixin Nie
Joey Tianyi Zhou
4
35
0
20 Oct 2020
The Extraordinary Failure of Complement Coercion Crowdsourcing
Yanai Elazar
Victoria Basmov
Shauli Ravfogel
Yoav Goldberg
Reut Tsarfaty
14
6
0
12 Oct 2020
OCNLI: Original Chinese Natural Language Inference
Hai Hu
Kyle Richardson
Liang Xu
Lu Li
Sandra Kübler
L. Moss
31
118
0
12 Oct 2020
On the Importance of Adaptive Data Collection for Extremely Imbalanced Pairwise Tasks
Stephen Mussmann
Robin Jia
Percy Liang
8
15
0
10 Oct 2020
Counterfactually-Augmented SNLI Training Data Does Not Yield Better Generalization Than Unaugmented Data
William Huang
Haokun Liu
Samuel R. Bowman
16
37
0
09 Oct 2020
An Empirical Study on Model-agnostic Debiasing Strategies for Robust Natural Language Inference
Tianyu Liu
Xin Zheng
Xiaoan Ding
Baobao Chang
Zhifang Sui
29
23
0
08 Oct 2020
CAT-Gen: Improving Robustness in NLP Models via Controlled Adversarial Text Generation
Tianlu Wang
Xuezhi Wang
Yao Qin
Ben Packer
Kang Li
Jilin Chen
Alex Beutel
Ed H. Chi
SILM
32
82
0
05 Oct 2020
TaxiNLI: Taking a Ride up the NLU Hill
Pratik M. Joshi
Somak Aditya
Aalok Sathe
Monojit Choudhury
20
36
0
30 Sep 2020
Towards Debiasing NLU Models from Unknown Biases
Prasetya Ajie Utama
N. Moosavi
Iryna Gurevych
19
154
0
25 Sep 2020
Towards Improving Selective Prediction Ability of NLP Systems
Neeraj Varshney
Swaroop Mishra
Chitta Baral
8
23
0
21 Aug 2020
Selective Question Answering under Domain Shift
Amita Kamath
Robin Jia
Percy Liang
OOD
13
206
0
16 Jun 2020
Beyond Leaderboards: A survey of methods for revealing weaknesses in Natural Language Inference data and models
Viktor Schlegel
Goran Nenadic
R. Batista-Navarro
ELM
33
18
0
29 May 2020
(Re)construing Meaning in NLP
Sean Trott
Tiago Timponi Torrent
Nancy Chang
Nathan Schneider
AI4CE
13
30
0
18 May 2020
Logical Inferences with Comparatives and Generalized Quantifiers
Izumi Haruta
K. Mineshima
D. Bekki
ELM
19
11
0
16 May 2020
INFOTABS: Inference on Tables as Semi-structured Data
Vivek Gupta
Maitrey Mehta
Pegah Nokhiz
Vivek Srikumar
LMTD
6
100
0
13 May 2020
Towards Robustifying NLI Models Against Lexical Dataset Biases
Xiang Zhou
Joey Tianyi Zhou
23
57
0
10 May 2020
Beyond Accuracy: Behavioral Testing of NLP models with CheckList
Marco Tulio Ribeiro
Tongshuang Wu
Carlos Guestrin
Sameer Singh
ELM
8
1,080
0
08 May 2020
DQI: Measuring Data Quality in NLP
Swaroop Mishra
Anjana Arunkumar
Bhavdeep Singh Sachdeva
Chris Bryan
Chitta Baral
36
30
0
02 May 2020