arXiv: 1907.11932 (latest version: v6)
Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment
27 July 2019
Di Jin, Zhijing Jin, Joey Tianyi Zhou, Peter Szolovits
Tags: SILM, AAML
Links: arXiv (abs) · PDF · HTML · GitHub (511★)
Papers citing "Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment" (50 of 567 shown)
To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Unsafe Images ... For Now (18 Oct 2023)
Yimeng Zhang, Jinghan Jia, Xin Chen, Aochuan Chen, Yihua Zhang, Jiancheng Liu, Ke Ding, Sijia Liu [DiffM]

BufferSearch: Generating Black-Box Adversarial Texts With Lower Queries (14 Oct 2023)
Wenjie Lv, Zhen Wang, Yitao Zheng, Zhehua Zhong, Qi Xuan, Tianyi Chen [AAML]

Effects of Human Adversarial and Affable Samples on BERT Generalization (12 Oct 2023)
Aparna Elangovan, Jiayuan He, Yuan Li, Karin Verspoor

Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations (10 Oct 2023)
Zeming Wei, Yifei Wang, Ang Li, Yichuan Mo, Yisen Wang

VLATTACK: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models (07 Oct 2023)
Ziyi Yin, Muchao Ye, Tianrong Zhang, Tianyu Du, Jinguo Zhu, Han Liu, Jinghui Chen, Ting Wang, Fenglong Ma [AAML, VLM, CoGe]

Fooling the Textual Fooler via Randomizing Latent Representations (02 Oct 2023)
Duy C. Hoang, Quang H. Nguyen, Saurav Manchanda, MinLong Peng, Kok-Seng Wong, Khoa D. Doan [SILM, AAML]

DyVal: Dynamic Evaluation of Large Language Models for Reasoning Tasks (29 Sep 2023)
A. Maritan, Jiaao Chen, S. Dey, Luca Schenato, Diyi Yang, Xing Xie [ELM, LRM]

The Trickle-down Impact of Reward (In-)consistency on RLHF (28 Sep 2023)
Lingfeng Shen, Sihao Chen, Linfeng Song, Lifeng Jin, Baolin Peng, Haitao Mi, Daniel Khashabi, Dong Yu

SurrogatePrompt: Bypassing the Safety Filter of Text-To-Image Models via Substitution (25 Sep 2023)
Zhongjie Ba, Jieming Zhong, Jiachen Lei, Pengyu Cheng, Qinglong Wang, Zhan Qin, Peng Kuang, Kui Ren

On the Relationship between Skill Neurons and Robustness in Prompt Tuning (21 Sep 2023)
Leon Ackermann, Xenia Ohmer [AAML]

Are Large Language Models Really Robust to Word-Level Perturbations? (20 Sep 2023)
Haoyu Wang, Guozheng Ma, Cong Yu, Ning Gui, Linrui Zhang, ..., Sen Zhang, Li Shen, Xueqian Wang, Peilin Zhao, Dacheng Tao [KELM]

What Learned Representations and Influence Functions Can Tell Us About Adversarial Examples (19 Sep 2023)
Shakila Mahjabin Tonni, Mark Dras [TDI, AAML, GAN]

Defending Against Alignment-Breaking Attacks via Robustly Aligned LLM (18 Sep 2023)
Bochuan Cao, Yu Cao, Lu Lin, Jinghui Chen [AAML]

How to Handle Different Types of Out-of-Distribution Scenarios in Computational Argumentation? A Comprehensive and Fine-Grained Field Study (15 Sep 2023)
Andreas Waldis, Yufang Hou, Iryna Gurevych

MathAttack: Attacking Large Language Models Towards Math Solving Ability (04 Sep 2023)
Zihao Zhou, Qiufeng Wang, Mingyu Jin, Jie Yao, Jianan Ye, Wei Liu, Wei Wang, Xiaowei Huang, Kaizhu Huang [AAML]

Open Sesame! Universal Black Box Jailbreaking of Large Language Models (04 Sep 2023)
Raz Lapid, Ron Langberg, Moshe Sipper [AAML]

Explainability for Large Language Models: A Survey (02 Sep 2023)
Haiyan Zhao, Hanjie Chen, Fan Yang, Ninghao Liu, Huiqi Deng, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, Jundong Li [LRM]

A Classification-Guided Approach for Adversarial Attacks against Neural Machine Translation (29 Aug 2023)
Sahar Sadrizadeh, Ljiljana Dolamic, P. Frossard [AAML, SILM]

Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities (24 Aug 2023)
Maximilian Mozes, Xuanli He, Bennett Kleinberg, Lewis D. Griffin

LEAP: Efficient and Automated Test Method for NLP Software (22 Aug 2023)
Ming-Ming Xiao, Yan Xiao, Hai Dong, Shunhui Ji, Pengcheng Zhang [AAML]

An Image is Worth a Thousand Toxic Words: A Metamorphic Testing Framework for Content Moderation Software (18 Aug 2023)
Wenxuan Wang, Jingyuan Huang, Jen-tse Huang, Chang Chen, Jiazhen Gu, Pinjia He, Michael R. Lyu [VLM]

Robustness Over Time: Understanding Adversarial Examples' Effectiveness on Longitudinal Versions of Large Language Models (15 Aug 2023)
Yugeng Liu, Tianshuo Cong, Zhengyu Zhao, Michael Backes, Yun Shen, Yang Zhang [AAML]

Robust Infidelity: When Faithfulness Measures on Masked Language Models Are Misleading (13 Aug 2023)
Evan Crothers, H. Viktor, Nathalie Japkowicz [AAML]

"Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models (07 Aug 2023)
Xinyue Shen, Zhenpeng Chen, Michael Backes, Yun Shen, Yang Zhang [SILM]

Mondrian: Prompt Abstraction Attack Against Large Language Models for Cheaper API Pricing (07 Aug 2023)
Waiman Si, Michael Backes, Yang Zhang

LimeAttack: Local Explainable Method for Textual Hard-Label Adversarial Attack (01 Aug 2023)
HaiXiang Zhu, Zhaoqing Yang, Weiwei Shang, Yuren Wu [AAML, FAtt]

Adversarially Robust Neural Legal Judgement Systems (31 Jul 2023)
R. Raj, V. Devi [AILaw, ELM, AAML]

On the Trustworthiness Landscape of State-of-the-art Generative Models: A Survey and Outlook (31 Jul 2023)
Mingyuan Fan, Chengyu Wang, Cen Chen, Yang Liu, Jun Huang [HILM]

Text-CRS: A Generalized Certified Robustness Framework against Textual Adversarial Attacks (31 Jul 2023)
Xinyu Zhang, Hanbin Hong, Yuan Hong, Peng Huang, Binghui Wang, Zhongjie Ba, Kui Ren [SILM]

When Measures are Unreliable: Imperceptible Adversarial Perturbations toward Top-k Multi-Label Learning (27 Jul 2023)
Yuchen Sun, Qianqian Xu, Zitai Wang, Qingming Huang [AAML]

Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models (26 Jul 2023)
Dong Lu, Zhiqiang Wang, Teng Wang, Weili Guan, Hongchang Gao, Feng Zheng [AAML]

Lost In Translation: Generating Adversarial Examples Robust to Round-Trip Translation (24 Jul 2023)
Neel Bhandari, Pin-Yu Chen [AAML, SILM]

Gradient-Based Word Substitution for Obstinate Adversarial Examples Generation in Language Models (24 Jul 2023)
Yimu Wang, Peng Shi, Hongyang Zhang [SILM]

NatLogAttack: A Framework for Attacking Natural Language Inference Models with Natural Logic (06 Jul 2023)
Zi'ou Zheng, Xiao-Dan Zhu [AAML, LRM]

SCAT: Robust Self-supervised Contrastive Learning via Adversarial Training for Text Classification (04 Jul 2023)
J. Wu, Dit-Yan Yeung [SILM]

Interpretability and Transparency-Driven Detection and Transformation of Textual Adversarial Examples (IT-DT) (03 Jul 2023)
Bushra Sabir, Muhammad Ali Babar, Sharif Abuadbba [SILM]

MAT: Mixed-Strategy Game of Adversarial Training in Fine-tuning (27 Jun 2023)
Zhehua Zhong, Tianyi Chen, Zhen Wang [AAML]

A Survey on Out-of-Distribution Evaluation of Neural NLP Models (27 Jun 2023)
Xinzhe Li, Ming Liu, Shang Gao, Wray Buntine

On the Universal Adversarial Perturbations for Efficient Data-free Adversarial Detection (27 Jun 2023)
Songyang Gao, Shihan Dou, Qi Zhang, Xuanjing Huang, Jin Ma, Yingchun Shan [AAML]

DSRM: Boost Textual Adversarial Training with Distribution Shift Risk Minimization (27 Jun 2023)
Songyang Gao, Shihan Dou, Yan Liu, Xiao Wang, Qi Zhang, Zhongyu Wei, Jin Ma, Yingchun Shan [OOD]

Towards Understanding What Code Language Models Learned (20 Jun 2023)
Toufique Ahmed, Dian Yu, Chen Huang, Cathy Wang, Prem Devanbu, Kenji Sagae [ELM]

Exploring New Frontiers in Agricultural NLP: Investigating the Potential of Large Language Models for Food Applications (20 Jun 2023)
Saed Rezayi, Zheng Liu, Zihao Wu, Chandra Dhakal, Bao Ge, ..., Gengchen Mai, Ninghao Liu, Chen Zhen, Tianming Liu, Sheng Li

Investigating Masking-based Data Generation in Language Models (16 Jun 2023)
Edward Ma

A Relaxed Optimization Approach for Adversarial Attacks against Neural Machine Translation Models (14 Jun 2023)
Sahar Sadrizadeh, C. Barbier, Ljiljana Dolamic, P. Frossard [AAML]

Can Large Language Models Infer Causation from Correlation? (09 Jun 2023)
Zhijing Jin, Jiarui Liu, Zhiheng Lyu, Spencer Poff, Mrinmaya Sachan, Rada Mihalcea, Mona T. Diab, Bernhard Schölkopf [LRM]

Enhancing Robustness of AI Offensive Code Generators via Data Augmentation (08 Jun 2023)
Cristina Improta, Pietro Liguori, R. Natella, B. Cukic, Domenico Cotroneo [AAML]

PromptAttack: Probing Dialogue State Trackers with Adversarial Prompts (07 Jun 2023)
Xiangjue Dong, Yun He, Ziwei Zhu, James Caverlee [AAML]

VoteTRANS: Detecting Adversarial Text without Training by Voting on Hard Labels of Transformations (02 Jun 2023)
Hoang-Quoc Nguyen-Son, Seira Hidano, Kazuhide Fukushima, S. Kiyomoto, Isao Echizen

AMR4NLI: Interpretable and robust NLI measures from semantic graphs (01 Jun 2023)
Juri Opitz, Shira Wein, Julius Steen, Anette Frank, Nathan Schneider

Measuring the Robustness of NLP Models to Domain Shifts (31 May 2023)
Nitay Calderon, Naveh Porat, Eyal Ben-David, Alexander Chapanin, Zorik Gekhman, Nadav Oved, Vitaly Shalumov, Roi Reichart