Stress Test Evaluation for Natural Language Inference

2 June 2018
Aakanksha Naik
Abhilasha Ravichander
Norman M. Sadeh
Carolyn Rose
Graham Neubig
    ELM

Papers citing "Stress Test Evaluation for Natural Language Inference"

50 / 237 papers shown
Short-circuiting Shortcuts: Mechanistic Investigation of Shortcuts in Text Classification
Leon Eshuijs
Shihan Wang
Antske Fokkens
26
0
0
09 May 2025
aiXamine: Simplified LLM Safety and Security
Fatih Deniz
Dorde Popovic
Yazan Boshmaf
Euisuh Jeong
M. Ahmad
Sanjay Chawla
Issa M. Khalil
ELM
80
0
0
21 Apr 2025
MiMu: Mitigating Multiple Shortcut Learning Behavior of Transformers
Lili Zhao
Qi Liu
Wei-neng Chen
Lu Chen
R.-H. Sun
Min Hou
Yang Wang
Shijin Wang
28
0
0
14 Apr 2025
Enhancing LLM Robustness to Perturbed Instructions: An Empirical Study
Aryan Agrawal
Lisa Alazraki
Shahin Honarvar
Marek Rei
52
0
0
03 Apr 2025
Pay More Attention to the Robustness of Prompt for Instruction Data Mining
Qiang Wang
Dawei Feng
Xu Zhang
Ao Shen
Yang Xu
Bo Ding
H. Wang
AAML
48
0
0
31 Mar 2025
reWordBench: Benchmarking and Improving the Robustness of Reward Models with Transformed Inputs
Zhaofeng Wu
Michihiro Yasunaga
Andrew Cohen
Yoon Kim
Asli Celikyilmaz
Marjan Ghazvininejad
38
2
0
14 Mar 2025
ESNLIR: A Spanish Multi-Genre Dataset with Causal Relationships
Johan R. Portela
Nicolás Perez
Rubén Manrique
44
0
0
11 Mar 2025
SINdex: Semantic INconsistency Index for Hallucination Detection in LLMs
Samir Abdaljalil
Hasan Kurban
Parichit Sharma
Erchin Serpedin
Rachad Atat
HILM
58
0
0
07 Mar 2025
Mixup Model Merge: Enhancing Model Merging Performance through Randomized Linear Interpolation
Yue Zhou
Yi-Ju Chang
Yuan Wu
MoMe
66
2
0
24 Feb 2025
Distributional Scaling Laws for Emergent Capabilities
Rosie Zhao
Tian Qin
David Alvarez-Melis
Sham Kakade
Naomi Saphra
LRM
39
0
0
24 Feb 2025
Pay Attention to Real World Perturbations! Natural Robustness Evaluation in Machine Reading Comprehension
Yulong Wu
Viktor Schlegel
R. Batista-Navarro
AAML
36
0
0
23 Feb 2025
From Superficial Patterns to Semantic Understanding: Fine-Tuning Language Models on Contrast Sets
Daniel Petrov
28
0
0
05 Jan 2025
Improving the Natural Language Inference robustness to hard dataset by data augmentation and preprocessing
Zijiang Yang
68
1
0
10 Dec 2024
Semi-Truths: A Large-Scale Dataset of AI-Augmented Images for Evaluating Robustness of AI-Generated Image detectors
Anisha Pal
Julia Kruk
Mansi Phute
Manognya Bhattaram
Diyi Yang
Duen Horng Chau
Judy Hoffman
AAML
44
2
0
12 Nov 2024
ALVIN: Active Learning Via INterpolation
Michalis Korakakis
Andreas Vlachos
Adrian Weller
28
0
0
11 Oct 2024
How Hard is this Test Set? NLI Characterization by Exploiting Training Dynamics
Adrian Cosma
Stefan Ruseti
Mihai Dascălu
Cornelia Caragea
16
2
0
04 Oct 2024
What Is Wrong with My Model? Identifying Systematic Problems with Semantic Data Slicing
Chenyang Yang
Yining Hong
Grace A. Lewis
Tongshuang Wu
Christian Kastner
38
1
0
14 Sep 2024
Probing the Robustness of Vision-Language Pretrained Models: A Multimodal Adversarial Attack Approach
Jiwei Guan
Tianyu Ding
Longbing Cao
Lei Pan
Chen Wang
Xi Zheng
AAML
33
1
0
24 Aug 2024
Covert Bias: The Severity of Social Views' Unalignment in Language Models Towards Implicit and Explicit Opinion
Abeer Aldayel
Areej Alokaili
Rehab Alahmadi
30
0
0
15 Aug 2024
RUPBench: Benchmarking Reasoning Under Perturbations for Robustness Evaluation in Large Language Models
Yuqing Wang
Yun Zhao
LRM
AAML
ELM
27
1
0
16 Jun 2024
The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More
O. Kitouni
Niklas Nolte
Diane Bouchacourt
Adina Williams
Mike Rabbat
Mark Ibrahim
LRM
CLL
48
12
0
07 Jun 2024
DFKI-NLP at SemEval-2024 Task 2: Towards Robust LLMs Using Data Perturbations and MinMax Training
Bhuvanesh Verma
Lisa Raithel
19
1
0
01 May 2024
Pre-Calc: Learning to Use the Calculator Improves Numeracy in Language Models
Vishruth Veerendranath
Vishwa Shah
Kshitish Ghate
30
0
0
22 Apr 2024
How often are errors in natural language reasoning due to paraphrastic variability?
Neha Srikanth
Marine Carpuat
Rachel Rudinger
LRM
35
2
0
17 Apr 2024
Laying Anchors: Semantically Priming Numerals in Language Modeling
Mandar Sharma
Rutuja Murlidhar Taware
Pravesh Koirala
Nikhil Muralidhar
Naren Ramakrishnan
31
2
0
02 Apr 2024
Specification Overfitting in Artificial Intelligence
Benjamin Roth
Pedro Henrique Luz de Araujo
Yuxi Xia
Saskia Kaltenbrunner
Christoph Korab
58
0
0
13 Mar 2024
PPTC-R benchmark: Towards Evaluating the Robustness of Large Language Models for PowerPoint Task Completion
Zekai Zhang
Yiduo Guo
Yaobo Liang
Dongyan Zhao
Nan Duan
38
1
0
06 Mar 2024
GPT-HateCheck: Can LLMs Write Better Functional Tests for Hate Speech Detection?
Yiping Jin
Leo Wanner
A. Shvets
21
2
0
23 Feb 2024
RITFIS: Robust input testing framework for LLMs-based intelligent software
Ming-Ming Xiao
Yan Xiao
Hai Dong
Shunhui Ji
Pengcheng Zhang
AAML
42
5
0
21 Feb 2024
Semantic Sensitivities and Inconsistent Predictions: Measuring the Fragility of NLI Models
Erik Arakelyan
Zhaoqi Liu
Isabelle Augenstein
AAML
45
9
0
25 Jan 2024
The Case for Scalable, Data-Driven Theory: A Paradigm for Scientific Progress in NLP
Julian Michael
13
1
0
01 Dec 2023
(Why) Is My Prompt Getting Worse? Rethinking Regression Testing for Evolving LLM APIs
Wanqin Ma
Chenyang Yang
Christian Kastner
19
20
0
18 Nov 2023
Latent Feature-based Data Splits to Improve Generalisation Evaluation: A Hate Speech Detection Case Study
Maike Zufle
Verna Dankers
Ivan Titov
42
0
0
16 Nov 2023
Whispers of Doubt Amidst Echoes of Triumph in NLP Robustness
Ashim Gupta
Rishanth Rajendhran
Nathan Stringham
Vivek Srikumar
Ana Marasović
AAML
31
3
0
16 Nov 2023
Measuring and Improving Attentiveness to Partial Inputs with Counterfactuals
Yanai Elazar
Bhargavi Paranjape
Hao Peng
Sarah Wiegreffe
Khyathi Raghavi
Vivek Srikumar
Sameer Singh
Noah A. Smith
AAML
OOD
31
0
0
16 Nov 2023
Using Natural Language Explanations to Improve Robustness of In-context Learning
Xuanli He
Yuxiang Wu
Oana-Maria Camburu
Pasquale Minervini
Pontus Stenetorp
AAML
31
1
0
13 Nov 2023
Pragmatic Reasoning Unlocks Quantifier Semantics for Foundation Models
Yiyuan Li
Rakesh R Menon
Sayan Ghosh
Shashank Srivastava
LRM
16
2
0
08 Nov 2023
Recursion in Recursion: Two-Level Nested Recursion for Length Generalization with Scalability
Jishnu Ray Chowdhury
Cornelia Caragea
37
5
0
08 Nov 2023
BERTwich: Extending BERT's Capabilities to Model Dialectal and Noisy Text
Aarohi Srivastava
David Chiang
30
6
0
31 Oct 2023
REFER: An End-to-end Rationale Extraction Framework for Explanation Regularization
Mohammad Reza Ghasemi Madani
Pasquale Minervini
32
4
0
22 Oct 2023
An LLM can Fool Itself: A Prompt-Based Adversarial Attack
Xilie Xu
Keyi Kong
Ning Liu
Li-zhen Cui
Di Wang
Jingfeng Zhang
Mohan S. Kankanhalli
AAML
SILM
25
68
0
20 Oct 2023
Beyond Testers' Biases: Guiding Model Testing with Knowledge Bases using LLMs
Chenyang Yang
Rishabh Rustogi
Rachel A. Brower-Sinning
Grace A. Lewis
Christian Kastner
Tongshuang Wu
KELM
32
11
0
14 Oct 2023
VLATTACK: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models
Ziyi Yin
Muchao Ye
Tianrong Zhang
Tianyu Du
Jinguo Zhu
Han Liu
Jinghui Chen
Ting Wang
Fenglong Ma
AAML
VLM
CoGe
33
36
0
07 Oct 2023
On the Relationship between Skill Neurons and Robustness in Prompt Tuning
Leon Ackermann
Xenia Ohmer
AAML
21
0
0
21 Sep 2023
GLS-CSC: A Simple but Effective Strategy to Mitigate Chinese STM Models' Over-Reliance on Superficial Clue
Yanrui Du
Sendong Zhao
Yuhan Chen
Rai Bai
Jing Liu
Huaqin Wu
Haifeng Wang
Bing Qin
42
2
0
08 Sep 2023
Robustness Over Time: Understanding Adversarial Examples' Effectiveness on Longitudinal Versions of Large Language Models
Yugeng Liu
Tianshuo Cong
Zhengyu Zhao
Michael Backes
Yun Shen
Yang Zhang
AAML
41
6
0
15 Aug 2023
Learning Deductive Reasoning from Synthetic Corpus based on Formal Logic
Terufumi Morishita
Gaku Morio
Atsuki Yamaguchi
Yasuhiro Sogawa
ReLM
LRM
AI4CE
ELM
27
22
0
11 Aug 2023
Efficient Beam Tree Recursion
Jishnu Ray Chowdhury
Cornelia Caragea
32
3
0
20 Jul 2023
Empowering Cross-lingual Behavioral Testing of NLP Models with Typological Features
Ester Hlavnova
Sebastian Ruder
30
5
0
11 Jul 2023
SpaceNLI: Evaluating the Consistency of Predicting Inferences in Space
Lasha Abzianidze
J. Zwarts
Yoad Winter
19
2
0
05 Jul 2023