Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2008.00312
Cited By
Trojaning Language Models for Fun and Profit
1 August 2020
Xinyang Zhang
Zheng-Wei Zhang
Shouling Ji
Ting Wang
SILM
AAML
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Trojaning Language Models for Fun and Profit"
28 / 28 papers shown
Title
LLM Security: Vulnerabilities, Attacks, Defenses, and Countermeasures
Francisco Aguilera-Martínez
Fernando Berzal
PILM
52
0
0
02 May 2025
BadMoE: Backdooring Mixture-of-Experts LLMs via Optimizing Routing Triggers and Infecting Dormant Experts
Qingyue Wang
Qi Pang
Xixun Lin
Shuai Wang
Daoyuan Wu
MoE
59
0
0
24 Apr 2025
A Practical Memory Injection Attack against LLM Agents
Shen Dong
Shaocheng Xu
Pengfei He
Y. Li
Jiliang Tang
Tianming Liu
Hui Liu
Zhen Xiang
LLMAG
AAML
43
2
0
05 Mar 2025
Models Are Codes: Towards Measuring Malicious Code Poisoning Attacks on Pre-trained Model Hubs
Jian Zhao
Shenao Wang
Yanjie Zhao
Xinyi Hou
Kailong Wang
Peiming Gao
Yuanchao Zhang
Chen Wei
Haoyu Wang
31
10
0
14 Sep 2024
Physical Backdoor Attack can Jeopardize Driving with Vision-Large-Language Models
Zhenyang Ni
Rui Ye
Yuxian Wei
Zhen Xiang
Yanfeng Wang
Siheng Chen
AAML
36
9
0
19 Apr 2024
Punctuation Matters! Stealthy Backdoor Attack for Language Models
Xuan Sheng
Zhicheng Li
Zhaoyang Han
Xiangmao Chang
Piji Li
35
3
0
26 Dec 2023
Efficient Trigger Word Insertion
Yueqi Zeng
Ziqiang Li
Pengfei Xia
Lei Liu
Bin Li
AAML
21
5
0
23 Nov 2023
A Comprehensive Overview of Backdoor Attacks in Large Language Models within Communication Networks
Haomiao Yang
Kunlan Xiang
Mengyu Ge
Hongwei Li
Rongxing Lu
Shui Yu
SILM
30
42
0
28 Aug 2023
From Shortcuts to Triggers: Backdoor Defense with Denoised PoE
Qin Liu
Fei Wang
Chaowei Xiao
Muhao Chen
AAML
34
21
0
24 May 2023
Attacks in Adversarial Machine Learning: A Systematic Survey from the Life-cycle Perspective
Baoyuan Wu
Zihao Zhu
Li Liu
Qingshan Liu
Zhaofeng He
Siwei Lyu
AAML
44
21
0
19 Feb 2023
Backdoor Learning for NLP: Recent Advances, Challenges, and Future Research Directions
Marwan Omar
SILM
AAML
31
20
0
14 Feb 2023
BDMMT: Backdoor Sample Detection for Language Models through Model Mutation Testing
Jiali Wei
Ming Fan
Wenjing Jiao
Wuxia Jin
Ting Liu
AAML
29
10
0
25 Jan 2023
TrojanPuzzle: Covertly Poisoning Code-Suggestion Models
H. Aghakhani
Wei Dai
Andre Manoel
Xavier Fernandes
Anant Kharkar
Christopher Kruegel
Giovanni Vigna
David E. Evans
B. Zorn
Robert Sim
SILM
21
33
0
06 Jan 2023
Detecting Backdoors in Deep Text Classifiers
Youyan Guo
Jun Wang
Trevor Cohn
SILM
30
1
0
11 Oct 2022
Private, Efficient, and Accurate: Protecting Models Trained by Multi-party Learning with Differential Privacy
Wenqiang Ruan
Ming Xu
Wenjing Fang
Li Wang
Lei Wang
Wei Han
32
12
0
18 Aug 2022
DECK: Model Hardening for Defending Pervasive Backdoors
Guanhong Tao
Yingqi Liu
Shuyang Cheng
Shengwei An
Zhuo Zhang
Qiuling Xu
Guangyu Shen
Xiangyu Zhang
AAML
20
7
0
18 Jun 2022
Watermarking Pre-trained Encoders in Contrastive Learning
Yutong Wu
Han Qiu
Tianwei Zhang
L. Jiwei
M. Qiu
23
9
0
20 Jan 2022
Spinning Language Models: Risks of Propaganda-As-A-Service and Countermeasures
Eugene Bagdasaryan
Vitaly Shmatikov
SILM
AAML
22
75
0
09 Dec 2021
Backdoor Pre-trained Models Can Transfer to All
Lujia Shen
S. Ji
Xuhong Zhang
Jinfeng Li
Jing Chen
Jie Shi
Chengfang Fang
Jianwei Yin
Ting Wang
AAML
SILM
31
117
0
30 Oct 2021
BadPre: Task-agnostic Backdoor Attacks to Pre-trained NLP Foundation Models
Kangjie Chen
Yuxian Meng
Xiaofei Sun
Shangwei Guo
Tianwei Zhang
Jiwei Li
Chun Fan
SILM
23
105
0
06 Oct 2021
BadEncoder: Backdoor Attacks to Pre-trained Encoders in Self-Supervised Learning
Jinyuan Jia
Yupei Liu
Neil Zhenqiang Gong
SILM
SSL
24
151
0
01 Aug 2021
EX-RAY: Distinguishing Injected Backdoor from Natural Features in Neural Networks by Examining Differential Feature Symmetry
Yingqi Liu
Guangyu Shen
Guanhong Tao
Zhenting Wang
Shiqing Ma
X. Zhang
AAML
22
8
0
16 Mar 2021
TrojanZoo: Towards Unified, Holistic, and Practical Evaluation of Neural Backdoors
Ren Pang
Zheng-Wei Zhang
Xiangshan Gao
Zhaohan Xi
S. Ji
Peng Cheng
Xiapu Luo
Ting Wang
AAML
27
31
0
16 Dec 2020
ONION: A Simple and Effective Defense Against Textual Backdoor Attacks
Fanchao Qi
Yangyi Chen
Mukai Li
Yuan Yao
Zhiyuan Liu
Maosong Sun
AAML
28
261
0
20 Nov 2020
Certified Robustness to Adversarial Word Substitutions
Robin Jia
Aditi Raghunathan
Kerem Göksel
Percy Liang
AAML
183
290
0
03 Sep 2019
Model-Reuse Attacks on Deep Learning Systems
Yujie Ji
Xinyang Zhang
S. Ji
Xiapu Luo
Ting Wang
SILM
AAML
134
186
0
02 Dec 2018
Generating Natural Language Adversarial Examples
M. Alzantot
Yash Sharma
Ahmed Elgohary
Bo-Jhang Ho
Mani B. Srivastava
Kai-Wei Chang
AAML
245
914
0
21 Apr 2018
Adversarial Machine Learning at Scale
Alexey Kurakin
Ian Goodfellow
Samy Bengio
AAML
261
3,109
0
04 Nov 2016
1