1707.07328
Cited By
Adversarial Examples for Evaluating Reading Comprehension Systems
23 July 2017
Robin Jia
Percy Liang
AAML
ELM
Papers citing
"Adversarial Examples for Evaluating Reading Comprehension Systems"
50 / 925 papers shown
Analyzing and Mitigating Negation Artifacts using Data Augmentation for Improving ELECTRA-Small Model Accuracy
Mojtaba Noghabaei
60
0
0
09 Nov 2025
Cache Mechanism for Agent RAG Systems
Shuhang Lin
Zhencan Peng
Lingyao Li
Xiao Lin
Xi Zhu
Yongfeng Zhang
113
0
0
04 Nov 2025
FPT-Noise: Dynamic Scene-Aware Counterattack for Test-Time Adversarial Defense in Vision-Language Models
Jia Deng
Jin Li
Zhenhua Zhao
Shaowei Wang
AAML
VLM
144
1
0
22 Oct 2025
CMT-Bench: Cricket Multi-Table Generation Benchmark for Probing Robustness in Large Language Models
Ritam Upadhyay
Naman Ahuja
Rishabh Baral
Aparna Garimella
Vivek Gupta
LMTD
138
0
0
20 Oct 2025
Can Transformer Memory Be Corrupted? Investigating Cache-Side Vulnerabilities in Large Language Models
Elias Hossain
Swayamjit Saha
Somshubhra Roy
Ravi Prasad
150
2
0
20 Oct 2025
Who's Asking? Evaluating LLM Robustness to Inquiry Personas in Factual Question Answering
Nil-Jana Akpinar
Chia-Jung Lee
Vanessa Murdock
Pietro Perona
108
0
0
14 Oct 2025
Adversarial Robustness in One-Stage Learning-to-Defer
Yannis Montreuil
Letian Yu
Axel Carlier
Lai Xing Ng
Wei Tsang Ooi
AAML
98
1
0
13 Oct 2025
Rethinking RL Evaluation: Can Benchmarks Truly Reveal Failures of RL Methods?
Z. Chen
Yiming Zhang
Hengguang Zhou
Zenghui Ding
Yining Sun
Cho-Jui Hsieh
OffRL
ALM
ELM
87
0
0
12 Oct 2025
ConDABench: Interactive Evaluation of Language Models for Data Analysis
Avik Dutta
Priyanshu Gupta
Hosein Hasanbeig
Rahul Pratap Singh
Harshit Nigam
Sumit Gulwani
Arjun Radhakrishna
Gustavo Soares
A. Tiwari
LMTD
180
0
0
10 Oct 2025
Evaluating the Robustness of a Production Malware Detection System to Transferable Adversarial Attacks
Milad Nasr
Yanick Fratantonio
Luca Invernizzi
Ange Albertini
Loua Farah
Alex Petit-Bianco
Seth Neel
Kurt Thomas
Elie Bursztein
Nicholas Carlini
AAML
112
1
0
02 Oct 2025
Probing Pre-trained Language Models on Code Changes: Insights from ReDef, a High-Confidence Just-in-Time Defect Prediction Dataset
Doha Nam
Taehyoun Kim
Duksan Ryu
Jongmoon Baik
AAML
80
0
0
11 Sep 2025
MultiWikiQA: A Reading Comprehension Benchmark in 300+ Languages
Dan Saattrup Smart
RALM
337
1
0
04 Sep 2025
Can Out-of-Distribution Evaluations Uncover Reliance on Shortcuts? A Case Study in Question Answering
Michal Štefánik
Timothee Mickus
Marek Kadlčík
Michal Spiegel
Josef Kuchař
80
0
0
25 Aug 2025
SALMAN: Stability Analysis of Language Models Through the Maps Between Graph-based Manifolds
Wuxinlin Cheng
Yun Feng
Jinwen Wu
K. P. Subbalakshmi
Tian Han
Zhuo Feng
AAML
92
0
0
23 Aug 2025
How Causal Abstraction Underpins Computational Explanation
Atticus Geiger
Jacqueline Harding
Thomas Icard
117
2
0
15 Aug 2025
Special-Character Adversarial Attacks on Open-Source Language Model
Ephraiem Sarabamoun
84
1
0
12 Aug 2025
HeQ: a Large and Diverse Hebrew Reading Comprehension Benchmark
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
Amir D. N. Cohen
Hilla Merhav
Yoav Goldberg
Reut Tsarfaty
92
11
0
03 Aug 2025
Adversarial Defence without Adversarial Defence: Enhancing Language Model Robustness via Instance-level Principal Component Removal
Yang Wang
Chenghao Xiao
Yi Zhou
Stuart E. Middleton
Noura Al Moubayed
C. D. Lin
AAML
299
1
0
29 Jul 2025
Small Edits, Big Consequences: Telling Good from Bad Robustness in Large Language Models
Altynbek Ismailov
Salia Asanova
KELM
79
0
0
15 Jul 2025
Attention-based Adversarial Robust Distillation in Radio Signal Classifications for Low-Power IoT Devices
IEEE Internet of Things Journal (IEEE IoT J.), 2023
Lu Zhang
S. Lambotharan
G. Zheng
G. Liao
Basil AsSadhan
Fabio Roli
AAML
164
13
0
13 Jun 2025
SUCEA: Reasoning-Intensive Retrieval for Adversarial Fact-checking through Claim Decomposition and Editing
Hongjun Liu
Yilun Zhao
Arman Cohan
Chen Zhao
AAML
LRM
264
0
0
05 Jun 2025
Normative Conflicts and Shallow AI Alignment
Philosophical Studies (Philos. Stud.), 2025
Raphaël Millière
222
3
0
05 Jun 2025
TRAPDOC: Deceiving LLM Users by Injecting Imperceptible Phantom Tokens into Documents
Hyundong Jin
Sicheol Sung
Shinwoo Park
SeungYeop Baik
Yo-Sub Han
236
1
0
30 May 2025
Spurious Correlations and Beyond: Understanding and Mitigating Shortcut Learning in SDOH Extraction with Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Fardin Ahsan Sakib
Ziwei Zhu
Karen Trister Grace
Meliha Yetisgen
Özlem Uzuner
201
0
0
30 May 2025
Evaluating the Retrieval Robustness of Large Language Models
Shuyang Cao
Karthik Radhakrishnan
David S. Rosenberg
Steven Lu
Pengxiang Cheng
Lu Wang
Shiyue Zhang
RALM
183
2
0
28 May 2025
Look Within or Look Beyond? A Theoretical Comparison Between Parameter-Efficient and Full Fine-Tuning
Yongkang Liu
Xingle Xu
Ercong Nie
Zijing Wang
Shi Feng
Daling Wang
Qian Li
Hinrich Schütze
182
0
0
28 May 2025
YESciEval: Robust LLM-as-a-Judge for Scientific Question Answering
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Jennifer D'Souza
Hamed Babaei Giglou
Quentin Münch
ELM
409
5
0
20 May 2025
Investigating the Vulnerability of LLM-as-a-Judge Architectures to Prompt-Injection Attacks
Narek Maloyan
Bislan Ashinov
Dmitry Namiot
AAML
ELM
210
7
0
19 May 2025
Beyond Single-Point Judgment: Distribution Alignment for LLM-as-a-Judge
Luyu Chen
Zeyu Zhang
Haoran Tan
Quanyu Dai
Hao-ran Yang
Zhenhua Dong
Xu Chen
175
0
0
18 May 2025
SPIRIT: Patching Speech Language Models against Jailbreak Attacks
Amirbek Djanibekov
Nurdaulet Mukhituly
Kentaro Inui
Hanan Aldarmaki
Nils Lukas
AAML
280
1
0
18 May 2025
Internal Causal Mechanisms Robustly Predict Language Model Out-of-Distribution Behaviors
Jing Huang
Junyi Tao
Thomas Icard
Diyi Yang
Christopher Potts
OODD
410
3
0
17 May 2025
FLUKE: A Linguistically-Driven and Task-Agnostic Framework for Robustness Evaluation
Yulia Otmakhova
Hung Thinh Truong
Rahmad Mahendra
Zenan Zhai
Rongxin Zhu
Daniel Beck
Jey Han Lau
ELM
461
0
0
24 Apr 2025
aiXamine: Simplified LLM Safety and Security
Fatih Deniz
Dorde Popovic
Yazan Boshmaf
Euisuh Jeong
M. Ahmad
Sanjay Chawla
Issa M. Khalil
ELM
619
1
0
21 Apr 2025
QAVA: Query-Agnostic Visual Attack to Large Vision-Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Yudong Zhang
Ruobing Xie
Jiansheng Chen
Xingwu Sun
Zhanhui Kang
Yu Wang
AAML
215
3
0
15 Apr 2025
Cancer-Myth: Evaluating Large Language Models on Patient Questions with False Presuppositions
Peng Guo
Tianqi Chen
Ching Ying Lin
Jade Law
Mazen Jizzini
Jorge J. Nieva
Ruishan Liu
Robin Jia
297
1
0
15 Apr 2025
On the Robustness of GUI Grounding Models Against Image Attacks
Haoren Zhao
Tianyi Chen
Zhen Wang
AAML
281
6
0
07 Apr 2025
When is dataset cartography ineffective? Using training dynamics does not improve robustness against Adversarial SQuAD
Paul K. Mandal
AAML
153
0
0
24 Mar 2025
Words or Vision: Do Vision-Language Models Have Blind Faith in Text?
Computer Vision and Pattern Recognition (CVPR), 2025
Ailin Deng
Tri Cao
Zhirui Chen
Bryan Hooi
VLM
304
27
0
04 Mar 2025
Shh, don't say that! Domain Certification in LLMs
International Conference on Learning Representations (ICLR), 2025
Cornelius Emde
Alasdair Paren
Preetham Arvind
Maxime Kayser
Tom Rainforth
Thomas Lukasiewicz
Guohao Li
Juil Sock
Adel Bibi
331
4
0
26 Feb 2025
MAGE: Multi-Head Attention Guided Embeddings for Low Resource Sentiment Classification
Varun Vashisht
Siyang Song
Mihir Konduskar
Jaskaran Singh Walia
Vukosi Marivate
168
1
0
25 Feb 2025
Pay Attention to Real World Perturbations! Natural Robustness Evaluation in Machine Reading Comprehension
Yulong Wu
Viktor Schlegel
Riza Batista-Navarro
AAML
396
1
0
23 Feb 2025
Wrong Answers Can Also Be Useful: PlausibleQA -- A Large-Scale QA Dataset with Answer Plausibility Scores
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2025
Jamshid Mozafari
Abdelrahman Abdallah
Bhawna Piryani
Adam Jatowt
299
0
0
22 Feb 2025
A Template Is All You Meme
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Luke Bates
Peter Ebert Christensen
Preslav Nakov
Iryna Gurevych
VLM
244
4
0
20 Feb 2025
Mass-Editing Memory with Attention in Transformers: A cross-lingual exploration of knowledge
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Daniel Tamayo
Aitor Gonzalez-Agirre
Javier Hernando
Marta Villegas
KELM
423
8
0
04 Feb 2025
Plan-Then-Execute: An Empirical Study of User Trust and Team Performance When Using LLM Agents As A Daily Assistant
International Conference on Human Factors in Computing Systems (CHI), 2025
Gaole He
Gianluca Demartini
U. Gadiraju
LLMAG
449
25
0
03 Feb 2025
"I am bad": Interpreting Stealthy, Universal and Robust Audio Jailbreaks in Audio-Language Models
Isha Gupta
David Khachaturov
Robert D. Mullins
AAML
AuLLM
531
5
0
02 Feb 2025
Assessing and Enhancing the Robustness of Large Language Models with Task Structure Variations for Logical Reasoning
International Conference on Neural Information Processing (ICONIP), 2023
Qiming Bao
Gaël Gendron
A. Peng
Wanjun Zhong
N. Tan
Yang Chen
Michael Witbrock
Qingbin Liu
LRM
ELM
446
6
0
20 Jan 2025
Differentiable Adversarial Attacks for Marked Temporal Point Processes
AAAI Conference on Artificial Intelligence (AAAI), 2025
Pritish Chakraborty
Vinayak Gupta
R. Raj
Srikanta J. Bedathur
A. De
AAML
973
1
0
17 Jan 2025
On the uncertainty principle of neural networks
iScience (iScience), 2022
Jun-Jie Zhang
Dong-xiao Zhang
Jian-Nan Chen
L. Pang
Deyu Meng
418
6
0
17 Jan 2025
FlippedRAG: Black-Box Opinion Manipulation Adversarial Attacks to Retrieval-Augmented Generation Models
Zhuo Chen
Jiawei Liu
Miaokun Chen
Haotan Liu
Qikai Cheng
Fan Zhang
Wei Lu
Jing Liu
AAML
348
1
0
06 Jan 2025