Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment
arXiv:1907.11932 (v6, latest) · 27 July 2019
Di Jin
Zhijing Jin
Joey Tianyi Zhou
Peter Szolovits
SILM
AAML
ArXiv (abs)
PDF
HTML
GitHub (511★)
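The attack proposed in this paper (TextFooler) is most easily reproduced today through the open-source TextAttack framework's reimplementation of the recipe. The sketch below is a minimal example under that assumption; TextAttack and the Hugging Face checkpoint textattack/bert-base-uncased-imdb are not part of this page (only the authors' own GitHub repository is linked above) and are used here purely for illustration.

```python
# Minimal sketch (assumption: TextAttack's reimplementation of the TextFooler
# recipe, not the authors' original repository linked above).
import transformers
from textattack import Attacker, AttackArgs
from textattack.attack_recipes import TextFoolerJin2019
from textattack.datasets import HuggingFaceDataset
from textattack.models.wrappers import HuggingFaceModelWrapper

# Victim model: a BERT classifier fine-tuned on IMDB (illustrative checkpoint choice).
checkpoint = "textattack/bert-base-uncased-imdb"
model = transformers.AutoModelForSequenceClassification.from_pretrained(checkpoint)
tokenizer = transformers.AutoTokenizer.from_pretrained(checkpoint)
model_wrapper = HuggingFaceModelWrapper(model, tokenizer)

# Build the TextFooler attack: word-importance ranking plus counter-fitted
# synonym substitution, as described in the paper.
attack = TextFoolerJin2019.build(model_wrapper)

# Attack a handful of test examples and log the adversarial rewrites.
dataset = HuggingFaceDataset("imdb", split="test")
attack_args = AttackArgs(num_examples=10, log_to_csv="textfooler_imdb.csv")
Attacker(attack, dataset, attack_args).attack_dataset()
```

TextAttack also exposes an equivalent command-line entry point (e.g. `textattack attack --recipe textfooler --model bert-base-uncased-imdb --num-examples 10`), which produces the same kind of per-example attack log.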
Papers citing "Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment"
50 / 567 papers shown
GradEscape: A Gradient-Based Evader Against AI-Generated Text Detectors
Wenlong Meng
Shuguo Fan
Chengkun Wei
Min Chen
Yuwei Li
Yuanchao Zhang
Zhikun Zhang
Wenzhi Chen
7
0
0
09 Jun 2025
Breaking the Reviewer: Assessing the Vulnerability of Large Language Models in Automated Peer Review Under Textual Adversarial Attacks
Tzu-Ling Lin
Wei Chen
Teng-Fang Hsiao
Hou-I Liu
Ya-Hsin Yeh
Yu Kai Chan
Wen-Sheng Lien
Po-Yen Kuo
Philip S. Yu
Hong-Han Shuai
AAML
17
0
0
08 Jun 2025
Coordinated Robustness Evaluation Framework for Vision-Language Models
Ashwin Ramesh Babu
Sajad Mousavi
Vineet Gundecha
Sahand Ghorbanpour
Avisek Naug
Antonio Guillen
Ricardo Luna Gutierrez
Soumyendu Sarkar
AAML
25
0
0
05 Jun 2025
Urban Visibility Hotspots: Quantifying Building Vertex Visibility from Connected Vehicle Trajectories using Spatial Indexing
Artur Grigorev
Adriana-Simona Mihaita
28
0
0
03 Jun 2025
T-SHIRT: Token-Selective Hierarchical Data Selection for Instruction Tuning
Yanjun Fu
Faisal Hamman
Sanghamitra Dutta
ALM
58
0
0
02 Jun 2025
TRAPDOC: Deceiving LLM Users by Injecting Imperceptible Phantom Tokens into Documents
Hyundong Jin
Sicheol Sung
Shinwoo Park
SeungYeop Baik
Yo-Sub Han
15
0
0
30 May 2025
Diagnosing and Mitigating Modality Interference in Multimodal Large Language Models
Rui Cai
Bangzheng Li
Xiaofei Wen
Muhao Chen
Zhe Zhao
11
0
0
26 May 2025
GhostPrompt: Jailbreaking Text-to-image Generative Models based on Dynamic Optimization
Zixuan Chen
Hao Lin
Ke Xu
Xinghao Jiang
Tanfeng Sun
39
0
0
25 May 2025
What You Read Isn't What You Hear: Linguistic Sensitivity in Deepfake Speech Detection
Binh Nguyen
Shuji Shi
Ryan Ofman
Thai Le
AAML
190
0
0
23 May 2025
Adversarially Pretrained Transformers may be Universally Robust In-Context Learners
Soichiro Kumano
Hiroshi Kera
Toshihiko Yamasaki
AAML
125
0
0
20 May 2025
SDLog: A Deep Learning Framework for Detecting Sensitive Information in Software Logs
Roozbeh Aghili
Xingfang Wu
Foutse Khomh
Heng Li
101
0
0
20 May 2025
YESciEval: Robust LLM-as-a-Judge for Scientific Question Answering
Jennifer D'Souza
Hamed Babaei Giglou
Quentin Münch
ELM
109
0
0
20 May 2025
TokenProber: Jailbreaking Text-to-image Models via Fine-grained Word Impact Analysis
Longtian Wang
Xiaofei Xie
Tianlin Li
Yuhan Zhi
Chao Shen
60
0
0
11 May 2025
IM-BERT: Enhancing Robustness of BERT through the Implicit Euler Method
Mihyeon Kim
Juhyoung Park
Youngbin Kim
210
0
0
11 May 2025
Adversarial Attacks in Multimodal Systems: A Practitioner's Survey
Shashank Kapoor
Sanjay Surendranath Girija
Lakshit Arora
Dipen Pradhan
Ankit Shetgaonkar
Aman Raj
AAML
170
0
0
06 May 2025
CAMOUFLAGE: Exploiting Misinformation Detection Systems Through LLM-driven Adversarial Claim Transformation
Mazal Bethany
Nishant Vishwamitra
Cho-Yu Chiang
Peyman Najafirad
AAML
56
0
0
03 May 2025
MatMMFuse: Multi-Modal Fusion model for Material Property Prediction
Abhiroop Bhattacharya
Sylvain G. Cloutier
AI4CE
55
0
0
30 Apr 2025
aiXamine: Simplified LLM Safety and Security
Fatih Deniz
Dorde Popovic
Yazan Boshmaf
Euisuh Jeong
M. Ahmad
Sanjay Chawla
Issa M. Khalil
ELM
328
0
0
21 Apr 2025
Q-FAKER: Query-free Hard Black-box Attack via Controlled Generation
CheolWon Na
YunSeok Choi
Jee-Hyong Lee
AAML
71
0
0
18 Apr 2025
Bypassing Prompt Injection and Jailbreak Detection in LLM Guardrails
William Hackett
Lewis Birch
Stefan Trawicki
N. Suri
Peter Garraghan
66
4
0
15 Apr 2025
Token-Level Constraint Boundary Search for Jailbreaking Text-to-Image Models
Qingbin Liu
Zhaoxin Wang
Handing Wang
Cong Tian
Yaochu Jin
54
1
0
15 Apr 2025
MiMu: Mitigating Multiple Shortcut Learning Behavior of Transformers
Lili Zhao
Qi Liu
Wei-neng Chen
Lu Chen
R.-H. Sun
Min Hou
Yang Wang
Shijin Wang
140
0
0
14 Apr 2025
CheatAgent: Attacking LLM-Empowered Recommender Systems via LLM Agent
Liang-bo Ning
Shijie Wang
Wenqi Fan
Qing Li
Xin Xu
Hao Chen
Feiran Huang
AAML
107
21
0
13 Apr 2025
Exploring Gradient-Guided Masked Language Model to Detect Textual Adversarial Attacks
Xiaomei Zhang
Zhaoxi Zhang
Yanjun Zhang
Xufei Zheng
L. Zhang
Shengshan Hu
Shirui Pan
AAML
58
0
0
08 Apr 2025
Adversarial Training of Reward Models
Alexander Bukharin
Haifeng Qian
Shengyang Sun
Adithya Renduchintala
Soumye Singhal
Ziyi Wang
Oleksii Kuchaiev
Olivier Delalleau
T. Zhao
AAML
169
2
0
08 Apr 2025
On the Connection Between Diffusion Models and Molecular Dynamics
Liam Harcombe
Timothy T. Duignan
DiffM
107
0
0
04 Apr 2025
Enhancing LLM Robustness to Perturbed Instructions: An Empirical Study
Aryan Agrawal
Lisa Alazraki
Shahin Honarvar
Marek Rei
123
2
0
03 Apr 2025
DrunkAgent: Stealthy Memory Corruption in LLM-Powered Recommender Agents
Shiyi Yang
Zhibo Hu
Xinshu Li
Chen Wang
Tong Yu
Xiwei Xu
Liming Zhu
Lina Yao
AAML
103
0
0
31 Mar 2025
Pay More Attention to the Robustness of Prompt for Instruction Data Mining
Qiang Wang
Dawei Feng
Xu Zhang
Ao Shen
Yang Xu
Bo Ding
H. Wang
AAML
87
0
0
31 Mar 2025
FLEX: A Benchmark for Evaluating Robustness of Fairness in Large Language Models
Dahyun Jung
Seungyoon Lee
Hyeonseok Moon
Chanjun Park
Heuiseok Lim
AAML
ALM
ELM
106
3
0
25 Mar 2025
ZeroLM: Data-Free Transformer Architecture Search for Language Models
Zhen-Song Chen
Hong-Wei Ding
Xian-Jia Wang
Witold Pedrycz
94
0
0
24 Mar 2025
reWordBench: Benchmarking and Improving the Robustness of Reward Models with Transformed Inputs
Zhaofeng Wu
Michihiro Yasunaga
Andrew Cohen
Yoon Kim
Asli Celikyilmaz
Marjan Ghazvininejad
90
3
0
14 Mar 2025
Prompt Injection Detection and Mitigation via AI Multi-Agent NLP Frameworks
Diego Gosmar
Deborah A. Dahl
Dario Gosmar
AAML
85
1
0
14 Mar 2025
Reasoning-Grounded Natural Language Explanations for Language Models
Vojtech Cahlik
Rodrigo Alves
Pavel Kordík
LRM
93
2
0
14 Mar 2025
TH-Bench: Evaluating Evading Attacks via Humanizing AI Text on Machine-Generated Text Detectors
Jingyi Zheng
Junfeng Wang
Zhen Sun
Wenhan Dong
Yule Liu
Xinlei He
AAML
103
0
0
10 Mar 2025
CtrlRAG: Black-box Adversarial Attacks Based on Masked Language Models in Retrieval-Augmented Language Generation
Runqi Sui
AAML
83
1
0
10 Mar 2025
Conceptual Contrastive Edits in Textual and Vision-Language Retrieval
Maria Lymperaiou
Giorgos Stamou
VLM
80
0
0
01 Mar 2025
Single-pass Detection of Jailbreaking Input in Large Language Models
Leyla Naz Candogan
Yongtao Wu
Elias Abad Rocamora
Grigorios G. Chrysos
Volkan Cevher
AAML
107
0
0
24 Feb 2025
Unified Prompt Attack Against Text-to-Image Generation Models
Duo Peng
Qiuhong Ke
Mark He Huang
Ping Hu
Jing Liu
89
1
0
23 Feb 2025
SEA: Shareable and Explainable Attribution for Query-based Black-box Attacks
Yue Gao
Ilia Shumailov
Kassem Fawaz
AAML
219
0
0
21 Feb 2025
The underlying structures of self-attention: symmetry, directionality, and emergent dynamics in Transformer training
Matteo Saponati
Pascal Sager
Pau Vilimelis Aceituno
Thilo Stadelmann
Benjamin Grewe
7
1
0
15 Feb 2025
Confidence Elicitation: A New Attack Vector for Large Language Models
Brian Formento
Chuan-Sheng Foo
See-Kiong Ng
AAML
262
0
0
07 Feb 2025
HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns
Xinyue Shen
Yixin Wu
Y. Qu
Michael Backes
Savvas Zannettou
Yang Zhang
115
7
0
28 Jan 2025
Tougher Text, Smarter Models: Raising the Bar for Adversarial Defence Benchmarks
Yang Wang
Chenghua Lin
ELM
193
0
0
05 Jan 2025
Generalizing Trust: Weak-to-Strong Trustworthiness in Language Models
Martin Pawelczyk
Lillian Sun
Zhenting Qi
Aounon Kumar
Himabindu Lakkaraju
153
2
0
03 Jan 2025
B-AVIBench: Towards Evaluating the Robustness of Large Vision-Language Model on Black-box Adversarial Visual-Instructions
Hao Zhang
Wenqi Shao
Hong Liu
Yongqiang Ma
Ping Luo
Yu Qiao
Kaipeng Zhang
Kai Zhang
VLM
AAML
42
10
0
31 Dec 2024
On Adversarial Robustness of Language Models in Transfer Learning
Bohdan Turbal
Anastasiia Mazur
Jiaxu Zhao
Mykola Pechenizkiy
AAML
102
0
0
29 Dec 2024
Learning from Mistakes: Self-correct Adversarial Training for Chinese Unnatural Text Correction
Xuan Feng
T. Gu
Xiaoli Liu
L. Chang
91
1
0
23 Dec 2024
Adversarial Robustness through Dynamic Ensemble Learning
Hetvi Waghela
Jaydip Sen
Sneha Rakshit
AAML
129
0
0
20 Dec 2024
Human-Readable Adversarial Prompts: An Investigation into LLM Vulnerabilities Using Situational Context
Nilanjana Das
Edward Raff
Aman Chadha
Manas Gaur
AAML
224
1
0
20 Dec 2024