Stress Test Evaluation for Natural Language Inference

2 June 2018

Aakanksha Naik

Abhilasha Ravichander

Graham Neubig

Papers citing "Stress Test Evaluation for Natural Language Inference"

50 / 237 papers shown

Short-circuiting Shortcuts: Mechanistic Investigation of Shortcuts in Text Classification Leon Eshuijs Shihan Wang Antske Fokkens 26 0 0 09 May 2025
aiXamine: Simplified LLM Safety and Security Fatih Deniz Dorde Popovic Yazan Boshmaf Euisuh Jeong M. Ahmad Sanjay Chawla Issa M. Khalil ELM 80 0 0 21 Apr 2025
MiMu: Mitigating Multiple Shortcut Learning Behavior of Transformers Lili Zhao Qi Liu Wei-neng Chen Lu Chen R.-H. Sun Min Hou Yang Wang Shijin Wang 28 0 0 14 Apr 2025
Enhancing LLM Robustness to Perturbed Instructions: An Empirical Study Aryan Agrawal Lisa Alazraki Shahin Honarvar Marek Rei 52 0 0 03 Apr 2025
Pay More Attention to the Robustness of Prompt for Instruction Data Mining Qiang Wang Dawei Feng Xu Zhang Ao Shen Yang Xu Bo Ding H. Wang AAML 48 0 0 31 Mar 2025
reWordBench: Benchmarking and Improving the Robustness of Reward Models with Transformed Inputs Zhaofeng Wu Michihiro Yasunaga Andrew Cohen Yoon Kim Asli Celikyilmaz Marjan Ghazvininejad 38 2 0 14 Mar 2025
ESNLIR: A Spanish Multi-Genre Dataset with Causal Relationships Johan R. Portela Nicolás Perez Rubén Manrique 44 0 0 11 Mar 2025
SINdex: Semantic INconsistency Index for Hallucination Detection in LLMs Samir Abdaljalil Hasan Kurban Parichit Sharma Erchin Serpedin Rachad Atat HILM 58 0 0 07 Mar 2025
Mixup Model Merge: Enhancing Model Merging Performance through Randomized Linear Interpolation Yue Zhou Yi-Ju Chang Yuan Wu MoMe 66 2 0 24 Feb 2025
Distributional Scaling Laws for Emergent Capabilities Rosie Zhao Tian Qin David Alvarez-Melis Sham Kakade Naomi Saphra LRM 39 0 0 24 Feb 2025
Pay Attention to Real World Perturbations! Natural Robustness Evaluation in Machine Reading Comprehension Yulong Wu Viktor Schlegel R. Batista-Navarro AAML 36 0 0 23 Feb 2025
From Superficial Patterns to Semantic Understanding: Fine-Tuning Language Models on Contrast Sets Daniel Petrov 28 0 0 05 Jan 2025
Improving the Natural Language Inference robustness to hard dataset by data augmentation and preprocessing Zijiang Yang 68 1 0 10 Dec 2024
Semi-Truths: A Large-Scale Dataset of AI-Augmented Images for Evaluating Robustness of AI-Generated Image detectors Anisha Pal Julia Kruk Mansi Phute Manognya Bhattaram Diyi Yang Duen Horng Chau Judy Hoffman AAML 44 2 0 12 Nov 2024
ALVIN: Active Learning Via INterpolation Michalis Korakakis Andreas Vlachos Adrian Weller 28 0 0 11 Oct 2024
How Hard is this Test Set? NLI Characterization by Exploiting Training Dynamics Adrian Cosma Stefan Ruseti Mihai Dascălu Cornelia Caragea 16 2 0 04 Oct 2024
What Is Wrong with My Model? Identifying Systematic Problems with Semantic Data Slicing Chenyang Yang Yining Hong Grace A. Lewis Tongshuang Wu Christian Kastner 38 1 0 14 Sep 2024
Probing the Robustness of Vision-Language Pretrained Models: A Multimodal Adversarial Attack Approach Jiwei Guan Tianyu Ding Longbing Cao Lei Pan Chen Wang Xi Zheng AAML 33 1 0 24 Aug 2024
Covert Bias: The Severity of Social Views' Unalignment in Language Models Towards Implicit and Explicit Opinion Abeer Aldayel Areej Alokaili Rehab Alahmadi 30 0 0 15 Aug 2024
RUPBench: Benchmarking Reasoning Under Perturbations for Robustness Evaluation in Large Language Models Yuqing Wang Yun Zhao LRM AAML ELM 27 1 0 16 Jun 2024
The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More O. Kitouni Niklas Nolte Diane Bouchacourt Adina Williams Mike Rabbat Mark Ibrahim LRM CLL 48 12 0 07 Jun 2024
DFKI-NLP at SemEval-2024 Task 2: Towards Robust LLMs Using Data Perturbations and MinMax Training Bhuvanesh Verma Lisa Raithel 19 1 0 01 May 2024
Pre-Calc: Learning to Use the Calculator Improves Numeracy in Language Models Vishruth Veerendranath Vishwa Shah Kshitish Ghate 30 0 0 22 Apr 2024
How often are errors in natural language reasoning due to paraphrastic variability? Neha Srikanth Marine Carpuat Rachel Rudinger LRM 35 2 0 17 Apr 2024
Laying Anchors: Semantically Priming Numerals in Language Modeling Mandar Sharma Rutuja Murlidhar Taware Pravesh Koirala Nikhil Muralidhar Naren Ramakrishnan 31 2 0 02 Apr 2024
Specification Overfitting in Artificial Intelligence Benjamin Roth Pedro Henrique Luz de Araujo Yuxi Xia Saskia Kaltenbrunner Christoph Korab 58 0 0 13 Mar 2024
PPTC-R benchmark: Towards Evaluating the Robustness of Large Language Models for PowerPoint Task Completion Zekai Zhang Yiduo Guo Yaobo Liang Dongyan Zhao Nan Duan 38 1 0 06 Mar 2024
GPT-HateCheck: Can LLMs Write Better Functional Tests for Hate Speech Detection? Yiping Jin Leo Wanner A. Shvets 21 2 0 23 Feb 2024
RITFIS: Robust input testing framework for LLMs-based intelligent software Ming-Ming Xiao Yan Xiao Hai Dong Shunhui Ji Pengcheng Zhang AAML 42 5 0 21 Feb 2024
Semantic Sensitivities and Inconsistent Predictions: Measuring the Fragility of NLI Models Erik Arakelyan Zhaoqi Liu Isabelle Augenstein AAML 45 9 0 25 Jan 2024
The Case for Scalable, Data-Driven Theory: A Paradigm for Scientific Progress in NLP Julian Michael 13 1 0 01 Dec 2023
(Why) Is My Prompt Getting Worse? Rethinking Regression Testing for Evolving LLM APIs Wanqin Ma Chenyang Yang Christian Kastner 19 20 0 18 Nov 2023
Latent Feature-based Data Splits to Improve Generalisation Evaluation: A Hate Speech Detection Case Study Maike Zufle Verna Dankers Ivan Titov 42 0 0 16 Nov 2023
Whispers of Doubt Amidst Echoes of Triumph in NLP Robustness Ashim Gupta Rishanth Rajendhran Nathan Stringham Vivek Srikumar Ana Marasović AAML 31 3 0 16 Nov 2023
Measuring and Improving Attentiveness to Partial Inputs with Counterfactuals Yanai Elazar Bhargavi Paranjape Hao Peng Sarah Wiegreffe Khyathi Raghavi Vivek Srikumar Sameer Singh Noah A. Smith AAML OOD 31 0 0 16 Nov 2023
Using Natural Language Explanations to Improve Robustness of In-context Learning Xuanli He Yuxiang Wu Oana-Maria Camburu Pasquale Minervini Pontus Stenetorp AAML 31 1 0 13 Nov 2023
Pragmatic Reasoning Unlocks Quantifier Semantics for Foundation Models Yiyuan Li Rakesh R Menon Sayan Ghosh Shashank Srivastava LRM 16 2 0 08 Nov 2023
Recursion in Recursion: Two-Level Nested Recursion for Length Generalization with Scalability Jishnu Ray Chowdhury Cornelia Caragea 37 5 0 08 Nov 2023
BERTwich: Extending BERT's Capabilities to Model Dialectal and Noisy Text Aarohi Srivastava David Chiang 30 6 0 31 Oct 2023
REFER: An End-to-end Rationale Extraction Framework for Explanation Regularization Mohammad Reza Ghasemi Madani Pasquale Minervini 32 4 0 22 Oct 2023
An LLM can Fool Itself: A Prompt-Based Adversarial Attack Xilie Xu Keyi Kong Ning Liu Li-zhen Cui Di Wang Jingfeng Zhang Mohan S. Kankanhalli AAML SILM 25 68 0 20 Oct 2023
Beyond Testers' Biases: Guiding Model Testing with Knowledge Bases using LLMs Chenyang Yang Rishabh Rustogi Rachel A. Brower-Sinning Grace A. Lewis Christian Kastner Tongshuang Wu KELM 32 11 0 14 Oct 2023
VLATTACK: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models Ziyi Yin Muchao Ye Tianrong Zhang Tianyu Du Jinguo Zhu Han Liu Jinghui Chen Ting Wang Fenglong Ma AAML VLM CoGe 33 36 0 07 Oct 2023
On the Relationship between Skill Neurons and Robustness in Prompt Tuning Leon Ackermann Xenia Ohmer AAML 21 0 0 21 Sep 2023
GLS-CSC: A Simple but Effective Strategy to Mitigate Chinese STM Models' Over-Reliance on Superficial Clue Yanrui Du Sendong Zhao Yuhan Chen Rai Bai Jing Liu Huaqin Wu Haifeng Wang Bing Qin 42 2 0 08 Sep 2023
Robustness Over Time: Understanding Adversarial Examples' Effectiveness on Longitudinal Versions of Large Language Models Yugeng Liu Tianshuo Cong Zhengyu Zhao Michael Backes Yun Shen Yang Zhang AAML 41 6 0 15 Aug 2023
Learning Deductive Reasoning from Synthetic Corpus based on Formal Logic Terufumi Morishita Gaku Morio Atsuki Yamaguchi Yasuhiro Sogawa ReLM LRM AI4CE ELM 27 22 0 11 Aug 2023
Efficient Beam Tree Recursion Jishnu Ray Chowdhury Cornelia Caragea 32 3 0 20 Jul 2023
Empowering Cross-lingual Behavioral Testing of NLP Models with Typological Features Ester Hlavnova Sebastian Ruder 30 5 0 11 Jul 2023
SpaceNLI: Evaluating the Consistency of Predicting Inferences in Space Lasha Abzianidze J. Zwarts Yoad Winter 19 2 0 05 Jul 2023