Stress Test Evaluation for Natural Language Inference

2 June 2018
Aakanksha Naik
Abhilasha Ravichander
Norman M. Sadeh
Carolyn Rose
Graham Neubig
    ELM

Papers citing "Stress Test Evaluation for Natural Language Inference"

50 / 237 papers shown
CORE: A Retrieve-then-Edit Framework for Counterfactual Data Generation
Tanay Dixit
Bhargavi Paranjape
Hannaneh Hajishirzi
Luke Zettlemoyer
SyDa
143
23
0
10 Oct 2022
InferES : A Natural Language Inference Corpus for Spanish Featuring Negation-Based Contrastive and Adversarial Examples
Venelin Kovatchev
Mariona Taulé
30
4
0
06 Oct 2022
Compositional Evaluation on Japanese Textual Entailment and Similarity
Hitomi Yanaka
K. Mineshima
14
24
0
09 Aug 2022
Measuring Causal Effects of Data Statistics on Language Model's 'Factual' Predictions
Yanai Elazar
Nora Kassner
Shauli Ravfogel
Amir Feder
Abhilasha Ravichander
Marius Mosbach
Yonatan Belinkov
Hinrich Schütze
Yoav Goldberg
CML
SyDa
MILM
28
52
0
28 Jul 2022
Probing via Prompting
Jiaoda Li
Ryan Cotterell
Mrinmaya Sachan
34
13
0
04 Jul 2022
longhorns at DADC 2022: How many linguists does it take to fool a Question Answering model? A systematic approach to adversarial attacks
Venelin Kovatchev
Trina Chatterjee
Venkata S Govindarajan
Jifan Chen
Eunsol Choi
...
K. Erk
Matthew Lease
Junyi Jessy Li
Yating Wu
Kyle Mahowald
AAML
ELM
11
10
0
29 Jun 2022
Template-based Approach to Zero-shot Intent Recognition
Dmitry Lamanov
Pavel Burnyshev
Ekaterina Artemova
Valentin Malykh
A. Bout
Irina Piontkovskaya
4
9
0
22 Jun 2022
LegoNN: Building Modular Encoder-Decoder Models
Siddharth Dalmia
Dmytro Okhonko
M. Lewis
Sergey Edunov
Shinji Watanabe
Florian Metze
Luke Zettlemoyer
Abdel-rahman Mohamed
AuLLM
MoE
26
14
0
07 Jun 2022
Linear Connectivity Reveals Generalization Strategies
Jeevesh Juneja
Rachit Bansal
Kyunghyun Cho
João Sedoc
Naomi Saphra
242
45
0
24 May 2022
Logical Reasoning with Span-Level Predictions for Interpretable and Robust NLI Models
Joe Stacey
Pasquale Minervini
Haim Dubossarsky
Marek Rei
ReLM
LRM
19
14
0
23 May 2022
Let the Model Decide its Curriculum for Multitask Learning
Neeraj Varshney
Swaroop Mishra
Chitta Baral
17
8
0
19 May 2022
Near-Negative Distinction: Giving a Second Life to Human Evaluation Datasets
Philippe Laban
Chien-Sheng Wu
Wenhao Liu
Caiming Xiong
38
5
0
13 May 2022
White-box Testing of NLP models with Mask Neuron Coverage
Arshdeep Sekhon
Yangfeng Ji
Matthew B. Dwyer
Yanjun Qi
AAML
17
3
0
10 May 2022
Beyond Distributional Hypothesis: Let Language Models Learn Meaning-Text Correspondence
Myeongjun Jang
Frank Mtumbuka
Thomas Lukasiewicz
28
9
0
08 May 2022
Generalized Quantifiers as a Source of Error in Multilingual NLU Benchmarks
Ruixiang Cui
Daniel Hershcovich
Anders Søgaard
25
13
0
22 Apr 2022
When Does Syntax Mediate Neural Language Model Performance? Evidence from Dropout Probes
Mycal Tucker
Tiwalayo Eisape
Peng Qian
R. Levy
J. Shah
MILM
12
12
0
20 Apr 2022
Logical Inference for Counting on Semi-structured Tables
Tomoya Kurosawa
Hitomi Yanaka
LMTD
24
2
0
16 Apr 2022
Evaluation Benchmarks for Spanish Sentence Representations
Vladimir Araujo
Andrés Carvallo
Souvik Kundu
J. Canete
Marcelo Mendoza
Robert E. Mercer
Felipe Bravo-Marquez
Marie-Francine Moens
Alvaro Soto
ELM
27
9
0
15 Apr 2022
mGPT: Few-Shot Learners Go Multilingual
Oleh Shliazhko
Alena Fenogenova
Maria Tikhonova
Vladislav Mikhailov
Anastasia Kozlova
Tatiana Shavrina
43
149
0
15 Apr 2022
Generating Data to Mitigate Spurious Correlations in Natural Language Inference Datasets
Yuxiang Wu
Matt Gardner
Pontus Stenetorp
Pradeep Dasigi
28
67
0
24 Mar 2022
An Analysis of Negation in Natural Language Understanding Corpora
Md Mosharaf Hossain
Dhivya Chinnappa
Eduardo Blanco
10
42
0
16 Mar 2022
Generalized but not Robust? Comparing the Effects of Data Modification Methods on Out-of-Domain Generalization and Adversarial Robustness
Tejas Gokhale
Swaroop Mishra
Man Luo
Bhavdeep Singh Sachdeva
Chitta Baral
49
29
0
15 Mar 2022
A Proposal to Study "Is High Quality Data All We Need?"
Swaroop Mishra
Anjana Arunkumar
20
2
0
12 Mar 2022
Investigating Selective Prediction Approaches Across Several Tasks in IID, OOD, and Adversarial Settings
Neeraj Varshney
Swaroop Mishra
Chitta Baral
8
55
0
01 Mar 2022
Predicting Out-of-Distribution Error with the Projection Norm
Yaodong Yu
Zitong Yang
Alexander Wei
Yi Ma
Jacob Steinhardt
OODD
12
43
0
11 Feb 2022
Describing Differences between Text Distributions with Natural Language
Ruiqi Zhong
Charles Burton Snell
Dan Klein
Jacob Steinhardt
VLM
132
42
0
28 Jan 2022
Robust Natural Language Processing: Recent Advances, Challenges, and Future Directions
Marwan Omar
Soohyeon Choi
Daehun Nyang
David A. Mohaisen
26
57
0
03 Jan 2022
Measure and Improve Robustness in NLP Models: A Survey
Xuezhi Wang
Haohan Wang
Diyi Yang
139
130
0
15 Dec 2021
Quantifying Adaptability in Pre-trained Language Models with 500 Tasks
Belinda Z. Li
Jane A. Yu
Madian Khabsa
Luke Zettlemoyer
A. Halevy
Jacob Andreas
ELM
22
16
0
06 Dec 2021
LoNLI: An Extensible Framework for Testing Diverse Logical Reasoning Capabilities for NLI
Ishan Tarunesh
Somak Aditya
Monojit Choudhury
ELM
LRM
31
4
0
04 Dec 2021
Understanding Out-of-distribution: A Perspective of Data Dynamics
Dyah Adila
Dongyeop Kang
38
12
0
29 Nov 2021
NATURE: Natural Auxiliary Text Utterances for Realistic Spoken Language Evaluation
David Alfonso-Hermelo
Ahmad Rashid
Abbas Ghaddar
Huawei Noah’s
Mehdi Rezagholizadeh
29
2
0
09 Nov 2021
Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models
Wei Ping
Chejian Xu
Shuohang Wang
Zhe Gan
Yu Cheng
Jianfeng Gao
Ahmed Hassan Awadallah
Yangqiu Song
VLM
ELM
AAML
22
214
0
04 Nov 2021
IndoNLI: A Natural Language Inference Dataset for Indonesian
Rahmad Mahendra
Alham Fikri Aji
Samuel Louvan
Fahrurrozi Rahman
Clara Vania
26
29
0
27 Oct 2021
Identifying and Benchmarking Natural Out-of-Context Prediction Problems
David Madras
D. Psaltis
CML
OOD
24
4
0
25 Oct 2021
Behavioral Experiments for Understanding Catastrophic Forgetting
Samuel J. Bell
Neil D. Lawrence
27
4
0
20 Oct 2021
Analyzing Dynamic Adversarial Training Data in the Limit
Eric Wallace
Adina Williams
Robin Jia
Douwe Kiela
198
30
0
16 Oct 2021
Retrieval-guided Counterfactual Generation for QA
Bhargavi Paranjape
Matthew Lamm
Ian Tenney
25
31
0
14 Oct 2021
Semantically Distributed Robust Optimization for Vision-and-Language Inference
Tejas Gokhale
A. Chaudhary
Pratyay Banerjee
Chitta Baral
Yezhou Yang
46
17
0
14 Oct 2021
ReaSCAN: Compositional Reasoning in Language Grounding
Zhengxuan Wu
Elisa Kreiss
Desmond C. Ong
Christopher Potts
CoGe
LRM
29
22
0
18 Sep 2021
Does External Knowledge Help Explainable Natural Language Inference? Automatic Evaluation vs. Human Ratings
Hendrik Schuff
Hsiu-yu Yang
Heike Adel
Ngoc Thang Vu
ELM
ReLM
LRM
24
13
0
16 Sep 2021
Types of Out-of-Distribution Texts and How to Detect Them
Udit Arora
William Huang
He He
OODD
225
97
0
14 Sep 2021
An Evaluation Dataset and Strategy for Building Robust Multi-turn Response Selection Model
Kijong Han
Seojin Lee
Wooin Lee
Joosung Lee
Donghun Lee
AAML
25
5
0
10 Sep 2021
Avoiding Inference Heuristics in Few-shot Prompt-based Finetuning
Prasetya Ajie Utama
N. Moosavi
Victor Sanh
Iryna Gurevych
AAML
61
35
0
09 Sep 2021
Unsupervised Pre-training with Structured Knowledge for Improving Natural Language Inference
Xiaoyu Yang
Xiao-Dan Zhu
Zhan Shi
Tianda Li
SSL
11
1
0
08 Sep 2021
On Length Divergence Bias in Textual Matching Models
Lan Jiang
Tianshu Lyu
Yankai Lin
Chong Meng
Xiaoyong Lyu
Dawei Yin
14
4
0
06 Sep 2021
Causal Inference in Natural Language Processing: Estimation, Prediction, Interpretation and Beyond
Amir Feder
Katherine A. Keith
Emaad A. Manzoor
Reid Pryzant
Dhanya Sridhar
...
Roi Reichart
Margaret E. Roberts
Brandon M Stewart
Victor Veitch
Diyi Yang
CML
41
234
0
02 Sep 2021
Accurate, yet inconsistent? Consistency Analysis on Language Understanding Models
Myeongjun Jang
D. Kwon
Thomas Lukasiewicz
22
13
0
15 Aug 2021
Grounding Representation Similarity with Statistical Testing
Frances Ding
Jean-Stanislas Denain
Jacob Steinhardt
22
30
0
03 Aug 2021
Is My Model Using The Right Evidence? Systematic Probes for Examining Evidence-Based Tabular Reasoning
Vivek Gupta
Riyaz Ahmad Bhat
Atreya Ghosal
Manisha Srivastava
M. Singh
Vivek Srikumar
LMTD
15
18
0
02 Aug 2021