Stress Test Evaluation for Natural Language Inference

2 June 2018
Aakanksha Naik
Abhilasha Ravichander
Norman M. Sadeh
Carolyn Rose
Graham Neubig
    ELM

Papers citing "Stress Test Evaluation for Natural Language Inference"

50 / 237 papers shown
Evaluating Paraphrastic Robustness in Textual Entailment Models
Dhruv Verma
Yash Kumar Lal
Shreyashee Sinha
Benjamin Van Durme
Adam Poliak
28
5
0
29 Jun 2023
A Survey on Out-of-Distribution Evaluation of Neural NLP Models
Xinzhe Li
Ming Liu
Shang Gao
Wray L. Buntine
14
20
0
27 Jun 2023
HonestBait: Forward References for Attractive but Faithful Headline Generation
Chih-Yao Chen
Dennis Wu
Lun-Wei Ku
14
2
0
26 Jun 2023
Limits for Learning with Language Models
Nicholas M. Asher
Swarnadeep Bhar
Akshay Chaturvedi
Julie Hunter
Soumya Paul
19
22
0
21 Jun 2023
Cross-Modal Attribute Insertions for Assessing the Robustness of Vision-and-Language Learning
Shivaen Ramshetty
Gaurav Verma
Srijan Kumar
33
2
0
19 Jun 2023
PromptRobust: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts
Kaijie Zhu
Jindong Wang
Jiaheng Zhou
Zichen Wang
Hao Chen
...
Linyi Yang
Weirong Ye
Yue Zhang
Neil Zhenqiang Gong
Xingxu Xie
SILM
36
144
0
07 Jun 2023
Can current NLI systems handle German word order? Investigating language model performance on a new German challenge set of minimal pairs
Ines Reinig
K. Markert
16
0
0
07 Jun 2023
Beam Tree Recursive Cells
Jishnu Ray Chowdhury
Cornelia Caragea
31
6
0
31 May 2023
Fighting Bias with Bias: Promoting Model Robustness by Amplifying Dataset Biases
Yuval Reif
Roy Schwartz
28
7
0
30 May 2023
From Adversarial Arms Race to Model-centric Evaluation: Motivating a Unified Automatic Robustness Evaluation Framework
Yangyi Chen
Hongcheng Gao
Ganqu Cui
Lifan Yuan
Dehan Kong
...
Longtao Huang
H. Xue
Zhiyuan Liu
Maosong Sun
Heng Ji
AAML
ELM
27
6
0
29 May 2023
On Degrees of Freedom in Defining and Testing Natural Language Understanding
Saku Sugawara
S. Tsugita
ELM
28
1
0
24 May 2023
Adversarial Demonstration Attacks on Large Language Models
Jiong Wang
Zi-yang Liu
Keun Hee Park
Zhuojun Jiang
Zhaoheng Zheng
Zhuofeng Wu
Muhao Chen
Chaowei Xiao
SILM
22
52
0
24 May 2023
Out-of-Distribution Generalization in Text Classification: Past, Present, and Future
Linyi Yang
Yangqiu Song
Xuan Ren
Chenyang Lyu
Yidong Wang
Lingqiao Liu
Jindong Wang
Jennifer Foster
Yue Zhang
OOD
37
2
0
23 May 2023
Can ChatGPT Defend its Belief in Truth? Evaluating LLM Reasoning via Debate
Boshi Wang
Xiang Yue
Huan Sun
ELM
LRM
46
60
0
22 May 2023
Should We Attend More or Less? Modulating Attention for Fairness
A. Zayed
Gonçalo Mordido
Samira Shabanian
Sarath Chandar
37
10
0
22 May 2023
Distilling Robustness into Natural Language Inference Models with Domain-Targeted Augmentation
Joe Stacey
Marek Rei
22
2
0
22 May 2023
Modeling the Q-Diversity in a Min-max Play Game for Robust Optimization
Ting Wu
Rui Zheng
Tao Gui
Qi Zhang
Xuanjing Huang
41
2
0
20 May 2023
Can NLP Models Correctly Reason Over Contexts that Break the Common Assumptions?
Neeraj Varshney
Mihir Parmar
Nisarg Patel
Divij Handa
Sayantan Sarkar
Man Luo
Chitta Baral
LRM
34
4
0
20 May 2023
Measuring Consistency in Text-based Financial Forecasting Models
Linyi Yang
Yingpeng Ma
Yue Zhang
28
4
0
15 May 2023
Learning Non-linguistic Skills without Sacrificing Linguistic Proficiency
Mandar Sharma
Nikhil Muralidhar
Naren Ramakrishnan
CLL
35
4
0
14 May 2023
CausalAPM: Generalizable Literal Disentanglement for NLU Debiasing
Songyang Gao
Shihan Dou
Junjie Shan
Qi Zhang
Xuanjing Huang
CML
13
0
0
04 May 2023
Angler: Helping Machine Translation Practitioners Prioritize Model Improvements
Samantha Robertson
Zijie J. Wang
Dominik Moritz
Mary Beth Kery
Fred Hohman
35
15
0
12 Apr 2023
Large Language Model Instruction Following: A Survey of Progresses and Challenges
Renze Lou
Kai Zhang
Wenpeng Yin
ALM
LRM
29
20
0
18 Mar 2023
A Mixed-Methods Approach to Understanding User Trust after Voice Assistant Failures
Amanda Baughan
Allison Mercurio
Ariel Liu
Xuezhi Wang
Jilin Chen
Xiao Ma
22
15
0
01 Mar 2023
SMoA: Sparse Mixture of Adapters to Mitigate Multiple Dataset Biases
Yanchen Liu
Jing Yang
Yan Chen
Jing Liu
Huaqin Wu
MoE
47
2
0
28 Feb 2023
On Robustness of Prompt-based Semantic Parsing with Large Pre-trained Language Model: An Empirical Study on Codex
Terry Yue Zhuo
Zhuang Li
Yujin Huang
Fatemeh Shiri
Weiqing Wang
Gholamreza Haffari
Yuan-Fang Li
AAML
26
53
0
30 Jan 2023
Analyzing Semantic Faithfulness of Language Models via Input Intervention on Question Answering
Akshay Chaturvedi
Swarnadeep Bhar
Soumadeep Saha
Utpal Garain
Nicholas Asher
33
4
0
21 Dec 2022
DISCO: Distilling Counterfactuals with Large Language Models
Zeming Chen
Qiyue Gao
Antoine Bosselut
Ashish Sabharwal
Kyle Richardson
29
25
0
20 Dec 2022
On the Blind Spots of Model-Based Evaluation Metrics for Text Generation
Tianxing He
Jingyu Zhang
Tianle Wang
Sachin Kumar
Kyunghyun Cho
James R. Glass
Yulia Tsvetkov
40
44
0
20 Dec 2022
Feature-Level Debiased Natural Language Understanding
Yougang Lyu
Piji Li
Yechang Yang
Maarten de Rijke
Pengjie Ren
Yukun Zhao
Dawei Yin
Z. Ren
32
10
0
11 Dec 2022
AGRO: Adversarial Discovery of Error-prone groups for Robust Optimization
Bhargavi Paranjape
Pradeep Dasigi
Vivek Srikumar
Luke Zettlemoyer
Hannaneh Hajishirzi
36
7
0
02 Dec 2022
AutoCAD: Automatically Generating Counterfactuals for Mitigating Shortcut Learning
Jiaxin Wen
Yeshuang Zhu
Jinchao Zhang
Jie Zhou
Minlie Huang
CML
AAML
22
8
0
29 Nov 2022
Using Focal Loss to Fight Shallow Heuristics: An Empirical Analysis of Modulated Cross-Entropy in Natural Language Inference
Frano Rajic
Ivan Stresec
Axel Marmet
Tim Postuvan
24
3
0
23 Nov 2022
Deep Learning on a Healthy Data Diet: Finding Important Examples for Fairness
A. Zayed
Prasanna Parthasarathi
Gonçalo Mordido
Hamid Palangi
Samira Shabanian
Sarath Chandar
26
21
0
20 Nov 2022
Capabilities for Better ML Engineering
Chenyang Yang
Rachel A. Brower-Sinning
Grace A. Lewis
Christian Kastner
Tongshuang Wu
24
3
0
11 Nov 2022
Looking at the Overlooked: An Analysis on the Word-Overlap Bias in Natural Language Inference
S. Rajaee
Yadollah Yaghoobzadeh
Mohammad Taher Pilehvar
36
5
0
07 Nov 2022
Probing neural language models for understanding of words of estimative probability
Damien Sileo
Marie-Francine Moens
19
10
0
07 Nov 2022
Learning to Infer from Unlabeled Data: A Semi-supervised Learning Approach for Robust Natural Language Inference
Mobashir Sadat
Cornelia Caragea
18
2
0
05 Nov 2022
Robustness of Fusion-based Multimodal Classifiers to Cross-Modal Content Dilutions
Gaurav Verma
Vishwa Vinay
Ryan A. Rossi
Srijan Kumar
VLM
AAML
11
8
0
04 Nov 2022
Overcoming Barriers to Skill Injection in Language Modeling: Case Study in Arithmetic
Mandar Sharma
Nikhil Muralidhar
Naren Ramakrishnan
21
6
0
03 Nov 2022
CONDAQA: A Contrastive Reading Comprehension Dataset for Reasoning about Negation
Abhilasha Ravichander
Matt Gardner
Ana Marasović
33
34
0
01 Nov 2022
Leveraging Affirmative Interpretations from Negation Improves Natural Language Understanding
Md Mosharaf Hossain
Eduardo Blanco
27
4
0
26 Oct 2022
Realistic Data Augmentation Framework for Enhancing Tabular Reasoning
D. K. Santhosh Kumar
Vivek Gupta
Soumya Sharma
Shuo Zhang
LMTD
21
3
0
23 Oct 2022
Lexical Generalization Improves with Larger Models and Longer Training
Elron Bandel
Yoav Goldberg
Yanai Elazar
52
6
0
23 Oct 2022
Enhancing Tabular Reasoning with Pattern Exploiting Training
Abhilash Shankarampeta
Vivek Gupta
Shuo Zhang
LMTD
RALM
ReLM
62
6
0
21 Oct 2022
Measures of Information Reflect Memorization Patterns
Rachit Bansal
Danish Pruthi
Yonatan Belinkov
30
8
0
17 Oct 2022
A Survey of Parameters Associated with the Quality of Benchmarks in NLP
Swaroop Mishra
Anjana Arunkumar
Chris Bryan
Chitta Baral
31
1
0
14 Oct 2022
Kernel-Whitening: Overcome Dataset Bias with Isotropic Sentence Embedding
Songyang Gao
Shihan Dou
Qi Zhang
Xuanjing Huang
17
8
0
14 Oct 2022
Benchmarking Long-tail Generalization with Likelihood Splits
Ameya Godbole
Robin Jia
ALM
24
8
0
13 Oct 2022
GULP: a prediction-based metric between representations
Enric Boix Adserà
Hannah Lawrence
George Stepaniants
Philippe Rigollet
46
11
0
12 Oct 2022