Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
arXiv: 2307.16888 (v3, latest) · 31 July 2023
Jun Yan, Vikas Yadav, Shiyang Li, Lichang Chen, Zheng Tang, Hai Wang, Vijay Srinivasan, Xiang Ren, Hongxia Jin
SILM
Papers citing "Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection" (50 of 106 papers shown)
Data Poisoning in Deep Learning: A Survey
Pinlong Zhao, Weiyao Zhu, Pengfei Jiao, Di Gao, Ou Wu
AAML · 27 Mar 2025

PoisonedParrot: Subtle Data Poisoning Attacks to Elicit Copyright-Infringing Content from Large Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Michael-Andrei Panaitescu-Liess, Pankayaraj Pathmanathan, Yigitcan Kaya, Zora Che, Bang An, Sicheng Zhu, Aakriti Agrawal, Furong Huang
AAML · 10 Mar 2025

Towards Autonomous Reinforcement Learning for Real-World Robotic Manipulation with Large Language Models
IEEE Robotics and Automation Letters (IEEE RA-L), 2025
Niccolò Turcato, Matteo Iovino, Aris Synodinos, Alberto Dalla Libera, R. Carli, Pietro Falco
LM&Ro · 06 Mar 2025

BadJudge: Backdoor Vulnerabilities of LLM-as-a-Judge
International Conference on Learning Representations (ICLR), 2025
Terry Tong, Haiwei Yang, Zhe Zhao, Mengzhao Chen
AAML, ELM · 01 Mar 2025

ELBA-Bench: An Efficient Learning Backdoor Attacks Benchmark for Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Xianglong Liu, Yaning Tan, M. Han, Yong Luo, Aishan Liu, Xiantao Cai, Zheng He, Dacheng Tao
AAML, SILM, ELM · 22 Feb 2025

Robustness and Cybersecurity in the EU Artificial Intelligence Act
Conference on Fairness, Accountability and Transparency (FAccT), 2025
Henrik Nolte, Miriam Rateike, Michèle Finck
22 Feb 2025

UniGuardian: A Unified Defense for Detecting Prompt Injection, Backdoor Attacks and Adversarial Attacks in Large Language Models
Huawei Lin, Yingjie Lao, Tong Geng, Tan Yu, Weijie Zhao
AAML, SILM · 18 Feb 2025

Improving Your Model Ranking on Chatbot Arena by Vote Rigging
Rui Min, Tianyu Pang, Chao Du, Qian Liu, Minhao Cheng, Min Lin
AAML · 29 Jan 2025

Privacy in Fine-tuning Large Language Models: Attacks, Defenses, and Future Directions
Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2024
Hao Du, Shang Liu, Lele Zheng, Yang Cao, Atsuyoshi Nakamura, Lei Chen
AAML · 21 Dec 2024

Quantized Delta Weight Is Safety Keeper
Yule Liu, Zhen Sun, Xinlei He, Xinyi Huang
29 Nov 2024

Neutralizing Backdoors through Information Conflicts for Large Language Models
Chen Chen, Yuchen Sun, Xueluan Gong, Jiaxin Gao, K. Lam
KELM, AAML · 27 Nov 2024

PEFTGuard: Detecting Backdoor Attacks Against Parameter-Efficient Fine-Tuning
IEEE Symposium on Security and Privacy (S&P), 2024
Zhen Sun, Tianshuo Cong, Yule Liu, Chenhao Lin, Xinlei He, Rongmao Chen, Xingshuo Han, Xinyi Huang
AAML · 26 Nov 2024

Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning
Neural Information Processing Systems (NeurIPS), 2024
Hang Zhou, Yehui Tang, Haochen Qin, Yujie Yang, Renren Jin, Deyi Xiong, Kai Han, Yunhe Wang
21 Nov 2024

CROW: Eliminating Backdoors from Large Language Models via Internal Consistency Regularization
Nay Myat Min, Long H. Pham, Yige Li, Jun Sun
AAML · 18 Nov 2024

A Survey on Adversarial Machine Learning for Code Data: Realistic Threats, Countermeasures, and Interpretations
Yulong Yang, Haoran Fan, Chenhao Lin, Qian Li, Subrat Kishore Dutta, Chao Shen, Xiaohong Guan
AAML · 12 Nov 2024

InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models
Haoyang Li, Xiaogeng Liu
SILM · 30 Oct 2024

LLMScan: Causal Scan for LLM Misbehavior Detection
Mengdi Zhang, Kai Kiat Goh, Peixin Zhang, Jun Sun, Rose Lin Xin, Hongyu Zhang
22 Oct 2024

AdvBDGen: Adversarially Fortified Prompt-Specific Fuzzy Backdoor Generator Against LLM Alignment
Pankayaraj Pathmanathan, Udari Madhushani Sehwag, Michael-Andrei Panaitescu-Liess, Furong Huang
SILM, AAML · 15 Oct 2024

SplitLLM: Collaborative Inference of LLMs for Model Placement and Throughput Optimization
Akrit Mudvari, Yuang Jiang, Leandros Tassiulas
14 Oct 2024

PoisonBench: Assessing Large Language Model Vulnerability to Data Poisoning
Tingchen Fu, Mrinank Sharma, Juil Sock, Shay B. Cohen, David M. Krueger, Fazl Barez
AAML · 11 Oct 2024

Mitigating Backdoor Threats to Large Language Models: Advancement and Challenges
Qin Liu, Wenjie Mo, Terry Tong, Lyne Tchapmi, Fei Wang, Chaowei Xiao, Muhao Chen
AAML · 30 Sep 2024

Harmful Fine-tuning Attacks and Defenses for Large Language Models: A Survey
Tiansheng Huang, Sihao Hu, Fatih Ilhan, Selim Furkan Tekin, Ling Liu
AAML · 26 Sep 2024

PROMPTFUZZ: Harnessing Fuzzing Techniques for Robust Testing of Prompt Injection in LLMs
Jiahao Yu, Yangguang Shao, Hanwen Miao, Junzheng Shi
SILM, AAML · 23 Sep 2024

Data-centric NLP Backdoor Defense from the Lens of Memorization
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Zhenting Wang, Zhizhi Wang, Haoyang Ling, Mengnan Du, Juan Zhai, Shiqing Ma
21 Sep 2024

Recent Advances in Attack and Defense Approaches of Large Language Models
Jing Cui, Yishi Xu, Zhewei Huang, Shuchang Zhou, Jianbin Jiao, Junge Zhang
PILM, AAML · 05 Sep 2024

Rethinking Backdoor Detection Evaluation for Language Models
Jun Yan, Wenjie Jacky Mo, Xiang Ren, Robin Jia
ELM · 31 Aug 2024

DefectTwin: When LLM Meets Digital Twin for Railway Defect Inspection
Rahatara Ferdousi, M. Anwar Hossain, Chunsheng Yang, Abdulmotaleb El Saddik
26 Aug 2024

BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks and Defenses on Large Language Models
Yige Li, Hanxun Huang, Yunhan Zhao, Jiabo He, Jun Sun
AAML, SILM · 23 Aug 2024
BaThe: Defense against the Jailbreak Attack in Multimodal Large Language Models by Treating Harmful Instruction as Backdoor Trigger
Yulin Chen, Haoran Li, Zihao Zheng, Yangqiu Song, Bryan Hooi
17 Aug 2024
Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification
Boyang Zhang, Yicong Tan, Yun Shen, Ahmed Salem, Michael Backes, Savvas Zannettou, Yang Zhang
LLMAG, AAML · 30 Jul 2024

Can Editing LLMs Inject Harm?
Canyu Chen, Baixiang Huang, Zekun Li, Zhaorun Chen, Shiyang Lai, ..., Xifeng Yan, William Wang, Juil Sock, Dawn Song, Kai Shu
KELM · 29 Jul 2024

LocalValueBench: A Collaboratively Built and Extensible Benchmark for Evaluating Localized Value Alignment and Ethical Safety in Large Language Models
Achintya Gopal, Nicholas Wai Long Lau, Eva Adelina Susanto, Chi Lok Yu, Aditya Paul
ELM · 27 Jul 2024

Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)
Apurv Verma, Satyapriya Krishna, Sebastian Gehrmann, Madhavan Seshadri, Anu Pradhan, Tom Ault, Leslie Barrett, David Rabinowitz, John Doucette, Nhathai Phan
20 Jul 2024

SOS! Soft Prompt Attack Against Open-Source Large Language Models
Ziqing Yang, Michael Backes, Yang Zhang, Ahmed Salem
AAML · 03 Jul 2024

CleanGen: Mitigating Backdoor Attacks for Generation Tasks in Large Language Models
Yuetai Li, Zhangchen Xu, Fengqing Jiang, Luyao Niu, D. Sahabandu, Bhaskar Ramasubramanian, Radha Poovendran
SILM, AAML · 18 Jun 2024

BadRAG: Identifying Vulnerabilities in Retrieval Augmented Generation of Large Language Models
Jiaqi Xue, Meng Zheng, Yebowen Hu, Fei Liu, Xun Chen, Qian Lou
AAML, SILM · 03 Jun 2024

AI Risk Management Should Incorporate Both Safety and Security
Xiangyu Qi, Yangsibo Huang, Yi Zeng, Edoardo Debenedetti, Jonas Geiping, ..., Chaowei Xiao, Yue Liu, Dawn Song, Peter Henderson, Prateek Mittal
AAML · 29 May 2024

Cross-Modal Safety Alignment: Is textual unlearning all you need?
Trishna Chakraborty, Erfan Shayegani, Zikui Cai, Nael B. Abu-Ghazaleh, M. Salman Asif, Yue Dong, Amit K. Roy-Chowdhury, Chengyu Song
27 May 2024

TrojanRAG: Retrieval-Augmented Generation Can Be Backdoor Driver in Large Language Models
Pengzhou Cheng, Yidong Ding, Tianjie Ju, Zongru Wu, Wei Du, Ping Yi, Zhuosheng Zhang, Gongshen Liu
SILM, AAML · 22 May 2024

When LLMs Meet Cybersecurity: A Systematic Literature Review
Jie Zhang, Haoyu Bu, Hui Wen, Yu Chen, Lun Li, Hongsong Zhu
06 May 2024

Best-of-Venom: Attacking RLHF by Injecting Poisoned Preference Data
Tim Baumgärtner, Yang Gao, Dana Alon, Donald Metzler
AAML · 08 Apr 2024

Large language models in 6G security: challenges and opportunities
Tri Nguyen, Huong Nguyen, Ahmad Ijaz, Saeid Sheikhi, Athanasios V. Vasilakos, Panos Kostakos
ELM · 18 Mar 2024

On Protecting the Data Privacy of Large Language Models (LLMs): A Survey
International Conference on Mathematics and Computing (ICMC), 2024
Biwei Yan, Kun Li, Minghui Xu, Yueyan Dong, Yue Zhang, Zhaochun Ren, Xiuzhen Cheng
AILaw, PILM · 08 Mar 2024

Automatic and Universal Prompt Injection Attacks against Large Language Models
Xiaogeng Liu, Zhiyuan Yu, Yizhe Zhang, Ning Zhang, Chaowei Xiao
SILM, AAML · 07 Mar 2024

How (un)ethical are instruction-centric responses of LLMs? Unveiling the vulnerabilities of safety guardrails to harmful queries
Somnath Banerjee, Sayan Layek, Rima Hazra, Animesh Mukherjee
23 Feb 2024

Learning to Poison Large Language Models for Downstream Manipulation
Yao Qiang, Xiangyu Zhou, Saleh Zare Zade, Mohammad Amin Roshani, Prashant Khanduri, Douglas Zytko, Dongxiao Zhu
AAML, SILM · 21 Feb 2024

Defending Against Weight-Poisoning Backdoor Attacks for Parameter-Efficient Fine-Tuning
Shuai Zhao, Yaoyao Yu, Anh Tuan Luu, Jie Fu, Lingjuan Lyu, Meihuizi Jia, Jinming Wen
AAML · 19 Feb 2024

How Susceptible are Large Language Models to Ideological Manipulation?
Kai Chen, Zihao He, Jun Yan, Taiwei Shi, Kristina Lerman
18 Feb 2024

Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents
Wenkai Yang, Xiaohan Bi, Yankai Lin, Sishuo Chen, Jie Zhou, Xu Sun
LLMAG, AAML · 17 Feb 2024

A Trembling House of Cards? Mapping Adversarial Attacks against Language Agents
Lingbo Mo, Zeyi Liao, Boyuan Zheng, Yu-Chuan Su, Chaowei Xiao, Huan Sun
AAML, LLMAG · 15 Feb 2024