Stress Test Evaluation for Natural Language Inference

2 June 2018

Aakanksha Naik

Abhilasha Ravichander

Graham Neubig

Papers citing "Stress Test Evaluation for Natural Language Inference"


Evaluating Paraphrastic Robustness in Textual Entailment Models Dhruv Verma Yash Kumar Lal Shreyashee Sinha Benjamin Van Durme Adam Poliak 28 5 0 29 Jun 2023
A Survey on Out-of-Distribution Evaluation of Neural NLP Models Xinzhe Li Ming Liu Shang Gao Wray L. Buntine 14 20 0 27 Jun 2023
HonestBait: Forward References for Attractive but Faithful Headline Generation Chih-Yao Chen Dennis Wu Lun-Wei Ku 14 2 0 26 Jun 2023
Limits for Learning with Language Models Nicholas M. Asher Swarnadeep Bhar Akshay Chaturvedi Julie Hunter Soumya Paul 19 22 0 21 Jun 2023
Cross-Modal Attribute Insertions for Assessing the Robustness of Vision-and-Language Learning Shivaen Ramshetty Gaurav Verma Srijan Kumar 33 2 0 19 Jun 2023
PromptRobust: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts Kaijie Zhu Jindong Wang Jiaheng Zhou Zichen Wang Hao Chen ... Linyi Yang Weirong Ye Yue Zhang Neil Zhenqiang Gong Xingxu Xie SILM 36 144 0 07 Jun 2023
Can current NLI systems handle German word order? Investigating language model performance on a new German challenge set of minimal pairs Ines Reinig K. Markert 16 0 0 07 Jun 2023
Beam Tree Recursive Cells Jishnu Ray Chowdhury Cornelia Caragea 31 6 0 31 May 2023
Fighting Bias with Bias: Promoting Model Robustness by Amplifying Dataset Biases Yuval Reif Roy Schwartz 28 7 0 30 May 2023
From Adversarial Arms Race to Model-centric Evaluation: Motivating a Unified Automatic Robustness Evaluation Framework Yangyi Chen Hongcheng Gao Ganqu Cui Lifan Yuan Dehan Kong ... Longtao Huang H. Xue Zhiyuan Liu Maosong Sun Heng Ji AAML ELM 27 6 0 29 May 2023
On Degrees of Freedom in Defining and Testing Natural Language Understanding Saku Sugawara S. Tsugita ELM 28 1 0 24 May 2023
Adversarial Demonstration Attacks on Large Language Models Jiong Wang Zi-yang Liu Keun Hee Park Zhuojun Jiang Zhaoheng Zheng Zhuofeng Wu Muhao Chen Chaowei Xiao SILM 22 52 0 24 May 2023
Out-of-Distribution Generalization in Text Classification: Past, Present, and Future Linyi Yang Yangqiu Song Xuan Ren Chenyang Lyu Yidong Wang Lingqiao Liu Jindong Wang Jennifer Foster Yue Zhang OOD 37 2 0 23 May 2023
Can ChatGPT Defend its Belief in Truth? Evaluating LLM Reasoning via Debate Boshi Wang Xiang Yue Huan Sun ELM LRM 46 60 0 22 May 2023
Should We Attend More or Less? Modulating Attention for Fairness A. Zayed Gonçalo Mordido Samira Shabanian Sarath Chandar 37 10 0 22 May 2023
Distilling Robustness into Natural Language Inference Models with Domain-Targeted Augmentation Joe Stacey Marek Rei 22 2 0 22 May 2023
Modeling the Q-Diversity in a Min-max Play Game for Robust Optimization Ting Wu Rui Zheng Tao Gui Qi Zhang Xuanjing Huang 41 2 0 20 May 2023
Can NLP Models Correctly Reason Over Contexts that Break the Common Assumptions? Neeraj Varshney Mihir Parmar Nisarg Patel Divij Handa Sayantan Sarkar Man Luo Chitta Baral LRM 34 4 0 20 May 2023
Measuring Consistency in Text-based Financial Forecasting Models Linyi Yang Yingpeng Ma Yue Zhang 28 4 0 15 May 2023
Learning Non-linguistic Skills without Sacrificing Linguistic Proficiency Mandar Sharma Nikhil Muralidhar Naren Ramakrishnan CLL 35 4 0 14 May 2023
CausalAPM: Generalizable Literal Disentanglement for NLU Debiasing Songyang Gao Shihan Dou Junjie Shan Qi Zhang Xuanjing Huang CML 13 0 0 04 May 2023
Angler: Helping Machine Translation Practitioners Prioritize Model Improvements Samantha Robertson Zijie J. Wang Dominik Moritz Mary Beth Kery Fred Hohman 35 15 0 12 Apr 2023
Large Language Model Instruction Following: A Survey of Progresses and Challenges Renze Lou Kai Zhang Wenpeng Yin ALM LRM 29 20 0 18 Mar 2023
A Mixed-Methods Approach to Understanding User Trust after Voice Assistant Failures Amanda Baughan Allison Mercurio Ariel Liu Xuezhi Wang Jilin Chen Xiao Ma 22 15 0 01 Mar 2023
SMoA: Sparse Mixture of Adapters to Mitigate Multiple Dataset Biases Yanchen Liu Jing Yang Yan Chen Jing Liu Huaqin Wu MoE 47 2 0 28 Feb 2023
On Robustness of Prompt-based Semantic Parsing with Large Pre-trained Language Model: An Empirical Study on Codex Terry Yue Zhuo Zhuang Li Yujin Huang Fatemeh Shiri Weiqing Wang Gholamreza Haffari Yuan-Fang Li AAML 26 53 0 30 Jan 2023
Analyzing Semantic Faithfulness of Language Models via Input Intervention on Question Answering Akshay Chaturvedi Swarnadeep Bhar Soumadeep Saha Utpal Garain Nicholas Asher 33 4 0 21 Dec 2022
DISCO: Distilling Counterfactuals with Large Language Models Zeming Chen Qiyue Gao Antoine Bosselut Ashish Sabharwal Kyle Richardson 29 25 0 20 Dec 2022
On the Blind Spots of Model-Based Evaluation Metrics for Text Generation Tianxing He Jingyu Zhang Tianle Wang Sachin Kumar Kyunghyun Cho James R. Glass Yulia Tsvetkov 40 44 0 20 Dec 2022
Feature-Level Debiased Natural Language Understanding Yougang Lyu Piji Li Yechang Yang Maarten de Rijke Pengjie Ren Yukun Zhao Dawei Yin Z. Ren 32 10 0 11 Dec 2022
AGRO: Adversarial Discovery of Error-prone groups for Robust Optimization Bhargavi Paranjape Pradeep Dasigi Vivek Srikumar Luke Zettlemoyer Hannaneh Hajishirzi 36 7 0 02 Dec 2022
AutoCAD: Automatically Generating Counterfactuals for Mitigating Shortcut Learning Jiaxin Wen Yeshuang Zhu Jinchao Zhang Jie Zhou Minlie Huang CML AAML 22 8 0 29 Nov 2022
Using Focal Loss to Fight Shallow Heuristics: An Empirical Analysis of Modulated Cross-Entropy in Natural Language Inference Frano Rajic Ivan Stresec Axel Marmet Tim Postuvan 24 3 0 23 Nov 2022
Deep Learning on a Healthy Data Diet: Finding Important Examples for Fairness A. Zayed Prasanna Parthasarathi Gonçalo Mordido Hamid Palangi Samira Shabanian Sarath Chandar 26 21 0 20 Nov 2022
Capabilities for Better ML Engineering Chenyang Yang Rachel A. Brower-Sinning Grace A. Lewis Christian Kastner Tongshuang Wu 24 3 0 11 Nov 2022
Looking at the Overlooked: An Analysis on the Word-Overlap Bias in Natural Language Inference S. Rajaee Yadollah Yaghoobzadeh Mohammad Taher Pilehvar 36 5 0 07 Nov 2022
Probing neural language models for understanding of words of estimative probability Damien Sileo Marie-Francine Moens 19 10 0 07 Nov 2022
Learning to Infer from Unlabeled Data: A Semi-supervised Learning Approach for Robust Natural Language Inference Mobashir Sadat Cornelia Caragea 18 2 0 05 Nov 2022
Robustness of Fusion-based Multimodal Classifiers to Cross-Modal Content Dilutions Gaurav Verma Vishwa Vinay Ryan A. Rossi Srijan Kumar VLM AAML 11 8 0 04 Nov 2022
Overcoming Barriers to Skill Injection in Language Modeling: Case Study in Arithmetic Mandar Sharma Nikhil Muralidhar Naren Ramakrishnan 21 6 0 03 Nov 2022
CONDAQA: A Contrastive Reading Comprehension Dataset for Reasoning about Negation Abhilasha Ravichander Matt Gardner Ana Marasović 33 34 0 01 Nov 2022
Leveraging Affirmative Interpretations from Negation Improves Natural Language Understanding Md Mosharaf Hossain Eduardo Blanco 27 4 0 26 Oct 2022
Realistic Data Augmentation Framework for Enhancing Tabular Reasoning D. K. Santhosh Kumar Vivek Gupta Soumya Sharma Shuo Zhang LMTD 21 3 0 23 Oct 2022
Lexical Generalization Improves with Larger Models and Longer Training Elron Bandel Yoav Goldberg Yanai Elazar 52 6 0 23 Oct 2022
Enhancing Tabular Reasoning with Pattern Exploiting Training Abhilash Shankarampeta Vivek Gupta Shuo Zhang LMTD RALM ReLM 62 6 0 21 Oct 2022
Measures of Information Reflect Memorization Patterns Rachit Bansal Danish Pruthi Yonatan Belinkov 30 8 0 17 Oct 2022
A Survey of Parameters Associated with the Quality of Benchmarks in NLP Swaroop Mishra Anjana Arunkumar Chris Bryan Chitta Baral 31 1 0 14 Oct 2022
Kernel-Whitening: Overcome Dataset Bias with Isotropic Sentence Embedding Songyang Gao Shihan Dou Qi Zhang Xuanjing Huang 17 8 0 14 Oct 2022
Benchmarking Long-tail Generalization with Likelihood Splits Ameya Godbole Robin Jia ALM 24 8 0 13 Oct 2022
GULP: a prediction-based metric between representations Enric Boix Adserà Hannah Lawrence George Stepaniants Philippe Rigollet 46 11 0 12 Oct 2022