Backdoor Attacks on Pre-trained Models by Layerwise Weight Poisoning
arXiv 2108.13888 · 31 August 2021
Linyang Li, Demin Song, Xiaonan Li, Jiehang Zeng, Ruotian Ma, Xipeng Qiu
Papers citing "Backdoor Attacks on Pre-trained Models by Layerwise Weight Poisoning" (50 of 88 papers shown)
Threat Modeling for AI: The Case for an Asset-Centric Approach
Jose Sanchez Vicarte, Marcin Spoczynski, Mostafa Elsaid · 08 May 2025

A Survey on Privacy Risks and Protection in Large Language Models
Kang Chen, Xiuze Zhou, Yuanguo Lin, Shibo Feng, Li Shen, Pengcheng Wu · 04 May 2025 · Tags: AILaw, PILM

LLM Security: Vulnerabilities, Attacks, Defenses, and Countermeasures
Francisco Aguilera-Martínez, Fernando Berzal · 02 May 2025 · Tags: PILM

BadMoE: Backdooring Mixture-of-Experts LLMs via Optimizing Routing Triggers and Infecting Dormant Experts
Qingyue Wang, Qi Pang, Xixun Lin, Shuai Wang, Daoyuan Wu · 24 Apr 2025 · Tags: MoE

NLP Security and Ethics, in the Wild
Heather Lent, Erick Galinkin, Yiyi Chen, Jens Myrup Pedersen, Leon Derczynski, Johannes Bjerva · 09 Apr 2025 · Tags: SILM
Large Language Models Can Verbatim Reproduce Long Malicious Sequences
Sharon Lin, Krishnamurthy Dvijotham, Jamie Hayes, Chongyang Shi, Ilia Shumailov, Shuang Song · 21 Mar 2025 · Tags: AAML
Char-mander Use mBackdoor! A Study of Cross-lingual Backdoor Attacks in Multilingual LLMs
Himanshu Beniwal, Sailesh Panda, Mayank Singh · 24 Feb 2025

MergePrint: Merge-Resistant Fingerprints for Robust Black-box Ownership Verification of Large Language Models
Shojiro Yamabe, Tsubasa Takahashi, Futa Waseda, Koki Wataoka · 21 Feb 2025 · Tags: MoMe

Cut the Deadwood Out: Post-Training Model Purification with Selective Module Substitution
Yao Tong, Weijun Li, Xuanli He, Haolan Zhan, Qiongkai Xu · 31 Dec 2024 · Tags: AAML

Concept-ROT: Poisoning Concepts in Large Language Models with Model Editing
Keltin Grimes, Marco Christiani, David Shriver, Marissa Connor · 17 Dec 2024 · Tags: KELM

Neutralizing Backdoors through Information Conflicts for Large Language Models
Chen Chen, Yuchen Sun, Xueluan Gong, Jiaxin Gao, K. Lam · 27 Nov 2024 · Tags: KELM, AAML

BackdoorMBTI: A Backdoor Learning Multimodal Benchmark Tool Kit for Backdoor Defense Evaluation
Haiyang Yu, Tian Xie, Jiaping Gui, Pengyang Wang, P. Yi, Yue Wu · 17 Nov 2024

Expose Before You Defend: Unifying and Enhancing Backdoor Defenses via Exposed Models
Yige Li, Hanxun Huang, Jiaming Zhang, Xingjun Ma, Yu-Gang Jiang · 25 Oct 2024 · Tags: AAML

Advancing NLP Security by Leveraging LLMs as Adversarial Engines
Sudarshan Srinivasan, Maria Mahbub, Amir Sadovnik · 23 Oct 2024 · Tags: AAML

Backdoored Retrievers for Prompt Injection Attacks on Retrieval Augmented Generation of Large Language Models
Cody Clop, Yannick Teglia · 18 Oct 2024 · Tags: AAML, SILM, RALM
ASPIRER: Bypassing System Prompts With Permutation-based Backdoors in LLMs
Lu Yan, Siyuan Cheng, Xuan Chen, Kaiyuan Zhang, Guangyu Shen, Zhuo Zhang, Xiangyu Zhang · 05 Oct 2024 · Tags: AAML, SILM

Obliviate: Neutralizing Task-agnostic Backdoors within the Parameter-efficient Fine-tuning Paradigm
Jaehan Kim, Minkyoo Song, S. Na, Seungwon Shin · 21 Sep 2024 · Tags: AAML

MEGen: Generative Backdoor in Large Language Models via Model Editing
Jiyang Qiu, Xinbei Ma, Zhuosheng Zhang, Hai Zhao · 20 Aug 2024 · Tags: AAML, KELM, SILM

Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)
Apurv Verma, Satyapriya Krishna, Sebastian Gehrmann, Madhavan Seshadri, Anu Pradhan, Tom Ault, Leslie Barrett, David Rabinowitz, John Doucette, Nhathai Phan · 20 Jul 2024

Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models
Qingcheng Zeng, Mingyu Jin, Qinkai Yu, Zhenting Wang, Wenyue Hua, ..., Felix Juefei Xu, Kaize Ding, Fan Yang, Ruixiang Tang, Yongfeng Zhang · 15 Jul 2024 · Tags: AAML

Chain-of-Scrutiny: Detecting Backdoor Attacks for Large Language Models
Xi Li, Yusen Zhang, Renze Lou, Chen Wu, Jiaqi Wang · 10 Jun 2024 · Tags: LRM, AAML

PromptFix: Few-shot Backdoor Removal via Adversarial Prompt Tuning
Tianrong Zhang, Zhaohan Xi, Ting Wang, Prasenjit Mitra, Jinghui Chen · 06 Jun 2024 · Tags: AAML, SILM

Exploring Vulnerabilities and Protections in Large Language Models: A Survey
Frank Weizhen Liu, Chenhui Hu · 01 Jun 2024 · Tags: AAML
TrojFM: Resource-efficient Backdoor Attacks against Very Large Foundation Models
Yuzhou Nie, Yanting Wang, Jinyuan Jia, Michael J. De Lucia, Nathaniel D. Bastian, Wenbo Guo, Dawn Song · 27 May 2024 · Tags: SILM, AAML

TrojanRAG: Retrieval-Augmented Generation Can Be Backdoor Driver in Large Language Models
Pengzhou Cheng, Yidong Ding, Tianjie Ju, Zongru Wu, Wei Du, Ping Yi, Zhuosheng Zhang, Gongshen Liu · 22 May 2024 · Tags: SILM, AAML

SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks
Xuanli He, Qiongkai Xu, Jun Wang, Benjamin I. P. Rubinstein, Trevor Cohn · 19 May 2024 · Tags: AAML

Backdoor Attack on Multilingual Machine Translation
Jun Wang, Qiongkai Xu, Xuanli He, Benjamin I. P. Rubinstein, Trevor Cohn · 03 Apr 2024

Two Heads are Better than One: Nested PoE for Robust Defense Against Multi-Backdoors
Victoria Graf, Qin Liu, Muhao Chen · 02 Apr 2024 · Tags: AAML

Shortcuts Arising from Contrast: Effective and Covert Clean-Label Attacks in Prompt-Based Learning
Xiaopeng Xie, Ming Yan, Xiwen Zhou, Chenlong Zhao, Suli Wang, Yong Zhang, Joey Tianyi Zhou · 30 Mar 2024 · Tags: AAML

BadEdit: Backdooring large language models by model editing
Yanzhou Li, Tianlin Li, Kangjie Chen, Jian Zhang, Shangqing Liu, Wenhan Wang, Tianwei Zhang, Yang Liu · 20 Mar 2024 · Tags: SyDa, AAML, KELM

On Protecting the Data Privacy of Large Language Models (LLMs): A Survey
Biwei Yan, Kun Li, Minghui Xu, Yueyan Dong, Yue Zhang, Zhaochun Ren, Xiuzhen Cheng · 08 Mar 2024 · Tags: AILaw, PILM

Here's a Free Lunch: Sanitizing Backdoored Models with Model Merge
Ansh Arora, Xuanli He, Maximilian Mozes, Srinibas Swain, Mark Dras, Qiongkai Xu · 29 Feb 2024 · Tags: SILM, MoMe, AAML
Learning to Poison Large Language Models During Instruction Tuning
Yao Qiang, Xiangyu Zhou, Saleh Zare Zade, Mohammad Amin Roshani, Douglas Zytko, Dongxiao Zhu · 21 Feb 2024 · Tags: AAML, SILM

Defending Against Weight-Poisoning Backdoor Attacks for Parameter-Efficient Fine-Tuning
Shuai Zhao, Leilei Gan, Anh Tuan Luu, Jie Fu, Lingjuan Lyu, Meihuizi Jia, Jinming Wen · 19 Feb 2024 · Tags: AAML

Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents
Wenkai Yang, Xiaohan Bi, Yankai Lin, Sishuo Chen, Jie Zhou, Xu Sun · 17 Feb 2024 · Tags: LLMAG, AAML

Single Word Change is All You Need: Designing Attacks and Defenses for Text Classifiers
Lei Xu, Sarah Alnegheimish, Laure Berti-Equille, Alfredo Cuesta-Infante, K. Veeramachaneni · 30 Jan 2024 · Tags: AAML

Security and Privacy Challenges of Large Language Models: A Survey
B. Das, M. H. Amini, Yanzhao Wu · 30 Jan 2024 · Tags: PILM, ELM

BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models
Zhen Xiang, Fengqing Jiang, Zidi Xiong, Bhaskar Ramasubramanian, Radha Poovendran, Bo Li · 20 Jan 2024 · Tags: LRM, SILM

Universal Vulnerabilities in Large Language Models: Backdoor Attacks for In-context Learning
Shuai Zhao, Meihuizi Jia, Anh Tuan Luu, Fengjun Pan, Jinming Wen · 11 Jan 2024 · Tags: AAML

Punctuation Matters! Stealthy Backdoor Attack for Language Models
Xuan Sheng, Zhicheng Li, Zhaoyang Han, Xiangmao Chang, Piji Li · 26 Dec 2023
Robust Backdoor Detection for Deep Learning via Topological Evolution Dynamics
Xiaoxing Mo, Yechao Zhang, Leo Yu Zhang, Wei Luo, Nan Sun, Shengshan Hu, Shang Gao, Yang Xiang · 05 Dec 2023 · Tags: AAML

Exploring the Robustness of Decentralized Training for Large Language Models
Lin Lu, Chenxi Dai, Wangcheng Tao, Binhang Yuan, Yanan Sun, Pan Zhou · 01 Dec 2023

Unveiling Backdoor Risks Brought by Foundation Models in Heterogeneous Federated Learning
Xi Li, Chen Henry Wu, Jiaqi Wang · 30 Nov 2023 · Tags: AAML

TARGET: Template-Transferable Backdoor Attack Against Prompt-based NLP Models via GPT4
Zihao Tan, Qingliang Chen, Yongjian Huang, Chen Liang · 29 Nov 2023 · Tags: SILM, AAML

TextGuard: Provable Defense against Backdoor Attacks on Text Classification
Hengzhi Pei, Jinyuan Jia, Wenbo Guo, Bo-wen Li, Dawn Song · 19 Nov 2023 · Tags: SILM

Backdoor Threats from Compromised Foundation Models to Federated Learning
Xi Li, Songhe Wang, Chen Henry Wu, Hao Zhou, Jiaqi Wang · 31 Oct 2023

Attention-Enhancing Backdoor Attacks Against BERT-based Models
Weimin Lyu, Songzhu Zheng, Lu Pang, Haibin Ling, Chao Chen · 23 Oct 2023

Watermarking LLMs with Weight Quantization
Linyang Li, Botian Jiang, Pengyu Wang, Ke Ren, Hang Yan, Xipeng Qiu · 17 Oct 2023 · Tags: MQ, WaLM

SeqXGPT: Sentence-Level AI-Generated Text Detection
Pengyu Wang, Linyang Li, Ke Ren, Botian Jiang, Dong Zhang, Xipeng Qiu · 13 Oct 2023 · Tags: DeLMO

PETA: Parameter-Efficient Trojan Attacks
Lauren Hong, Ting Wang · 01 Oct 2023 · Tags: AAML