Adversarial Examples for Evaluating Reading Comprehension Systems

23 July 2017

Robin Jia

Papers citing "Adversarial Examples for Evaluating Reading Comprehension Systems"

Analyzing and Mitigating Negation Artifacts using Data Augmentation for Improving ELECTRA-Small Model Accuracy

Mojtaba Noghabaei

09 Nov 2025

Cache Mechanism for Agent RAG Systems

130

04 Nov 2025

FPT-Noise: Dynamic Scene-Aware Counterattack for Test-Time Adversarial Defense in Vision-Language Models

156

22 Oct 2025

CMT-Bench: Cricket Multi-Table Generation Benchmark for Probing Robustness in Large Language Models

162

20 Oct 2025

Can Transformer Memory Be Corrupted? Investigating Cache-Side Vulnerabilities in Large Language Models

177

20 Oct 2025

Who's Asking? Evaluating LLM Robustness to Inquiry Personas in Factual Question Answering

115

14 Oct 2025

Adversarial Robustness in One-Stage Learning-to-Defer

112

13 Oct 2025

Rethinking RL Evaluation: Can Benchmarks Truly Reveal Failures of RL Methods?

114

12 Oct 2025

ConDABench: Interactive Evaluation of Language Models for Data Analysis

197

10 Oct 2025

Evaluating the Robustness of a Production Malware Detection System to Transferable Adversarial Attacks

131

02 Oct 2025

Probing Pre-trained Language Models on Code Changes: Insights from ReDef, a High-Confidence Just-in-Time Defect Prediction Dataset

11 Sep 2025

MultiWikiQA: A Reading Comprehension Benchmark in 300+ Languages

Dan Saattrup Smart

RALM

345

04 Sep 2025

Can Out-of-Distribution Evaluations Uncover Reliance on Shortcuts? A Case Study in Question Answering

104

25 Aug 2025

SALMAN: Stability Analysis of Language Models Through the Maps Between Graph-based Manifolds

110

23 Aug 2025

How Causal Abstraction Underpins Computational Explanation

Atticus Geiger

Jacqueline Harding

Thomas Icard

145

15 Aug 2025

Special-Character Adversarial Attacks on Open-Source Language Model

Ephraiem Sarabamoun

128

12 Aug 2025

HeQ: a Large and Diverse Hebrew Reading Comprehension Benchmark

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025

117

03 Aug 2025

Adversarial Defence without Adversarial Defence: Enhancing Language Model Robustness via Instance-level Principal Component Removal

304

29 Jul 2025

Small Edits, Big Consequences: Telling Good from Bad Robustness in Large Language Models

Altynbek Ismailov

Salia Asanova

KELM

117

15 Jul 2025

Attention-based Adversarial Robust Distillation in Radio Signal Classifications for Low-Power IoT Devices

IEEE Internet of Things Journal (IEEE IoT J.), 2023

180

13 Jun 2025

SUCEA: Reasoning-Intensive Retrieval for Adversarial Fact-checking through Claim Decomposition and Editing

275

05 Jun 2025

Normative Conflicts and Shallow AI Alignment

Philosophical Studies (Philos. Stud.), 2025

Raphaël Millière

251

05 Jun 2025

TRAPDOC: Deceiving LLM Users by Injecting Imperceptible Phantom Tokens into Documents

259

30 May 2025

Spurious Correlations and Beyond: Understanding and Mitigating Shortcut Learning in SDOH Extraction with Large Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

216

30 May 2025

Evaluating the Retrieval Robustness of Large Language Models

Shuyang Cao

Karthik Radhakrishnan

201

28 May 2025

Look Within or Look Beyond? A Theoretical Comparison Between Parameter-Efficient and Full Fine-Tuning

192

28 May 2025

YESciEval: Robust LLM-as-a-Judge for Scientific Question Answering

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

461

20 May 2025

Investigating the Vulnerability of LLM-as-a-Judge Architectures to Prompt-Injection Attacks

241

19 May 2025

SPIRIT: Patching Speech Language Models against Jailbreak Attacks

297

18 May 2025

Beyond Single-Point Judgment: Distribution Alignment for LLM-as-a-Judge

213

18 May 2025

Internal Causal Mechanisms Robustly Predict Language Model Out-of-Distribution Behaviors

446

17 May 2025

FLUKE: A Linguistically-Driven and Task-Agnostic Framework for Robustness Evaluation

486

24 Apr 2025

aiXamine: Simplified LLM Safety and Security

648

21 Apr 2025

QAVA: Query-Agnostic Visual Attack to Large Vision-Language Models

North American Chapter of the Association for Computational Linguistics (NAACL), 2025

243

15 Apr 2025

Cancer-Myth: Evaluating Large Language Models on Patient Questions with False Presuppositions

381

15 Apr 2025

On the Robustness of GUI Grounding Models Against Image Attacks

289

07 Apr 2025

When is dataset cartography ineffective? Using training dynamics does not improve robustness against Adversarial SQuAD

Paul K. Mandal

AAML

195

24 Mar 2025

Words or Vision: Do Vision-Language Models Have Blind Faith in Text?

Computer Vision and Pattern Recognition (CVPR), 2025

344

04 Mar 2025

Shh, don't say that! Domain Certification in LLMs

International Conference on Learning Representations (ICLR), 2025

348

26 Feb 2025

MAGE: Multi-Head Attention Guided Embeddings for Low Resource Sentiment Classification

199

25 Feb 2025

Pay Attention to Real World Perturbations! Natural Robustness Evaluation in Machine Reading Comprehension

417

23 Feb 2025

Wrong Answers Can Also Be Useful: PlausibleQA -- A Large-Scale QA Dataset with Answer Plausibility Scores

Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2025

322

22 Feb 2025

A Template Is All You Meme

North American Chapter of the Association for Computational Linguistics (NAACL), 2023

Luke Bates

Peter Ebert Christensen

Preslav Nakov

Iryna Gurevych

VLM

268

20 Feb 2025

ParetoRAG: Leveraging Sentence-Context Attention for Robust and Efficient Retrieval-Augmented Generation

147

12 Feb 2025

Mass-Editing Memory with Attention in Transformers: A cross-lingual exploration of knowledge

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

Daniel Tamayo

Aitor Gonzalez-Agirre

Javier Hernando

Marta Villegas

KELM

457

04 Feb 2025

Plan-Then-Execute: An Empirical Study of User Trust and Team Performance When Using LLM Agents As A Daily Assistant

International Conference on Human Factors in Computing Systems (CHI), 2025

497

03 Feb 2025

"I am bad": Interpreting Stealthy, Universal and Robust Audio Jailbreaks in Audio-Language Models

556

02 Feb 2025

Assessing and Enhancing the Robustness of Large Language Models with Task Structure Variations for Logical Reasoning

International Conference on Neural Information Processing (ICONIP), 2023

478

20 Jan 2025

Differentiable Adversarial Attacks for Marked Temporal Point Processes

AAAI Conference on Artificial Intelligence (AAAI), 2025

1.0K

17 Jan 2025

On the uncertainty principle of neural networks

iScience (iScience), 2022

464

17 Jan 2025