Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection

North American Chapter of the Association for Computational Linguistics (NAACL), 2024
31 July 2023
Jun Yan, Vikas Yadav, Shiyang Li, Lichang Chen, Zheng Tang, Hai Wang, Vijay Srinivasan, Xiang Ren, Hongxia Jin
SILM
ArXiv (abs) | PDF | HTML | HuggingFace (7 upvotes)

Papers citing "Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection"

50 / 106 papers shown

Data Poisoning in Deep Learning: A Survey
Pinlong Zhao, Weiyao Zhu, Pengfei Jiao, Di Gao, Ou Wu
AAML | 27 Mar 2025

PoisonedParrot: Subtle Data Poisoning Attacks to Elicit Copyright-Infringing Content from Large Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Michael-Andrei Panaitescu-Liess, Pankayaraj Pathmanathan, Yigitcan Kaya, Zora Che, Bang An, Sicheng Zhu, Aakriti Agrawal, Furong Huang
AAML | 10 Mar 2025

Towards Autonomous Reinforcement Learning for Real-World Robotic Manipulation with Large Language Models
IEEE Robotics and Automation Letters (IEEE RA-L), 2025
Niccolò Turcato, Matteo Iovino, Aris Synodinos, Alberto Dalla Libera, R. Carli, Pietro Falco
LM&Ro | 06 Mar 2025

BadJudge: Backdoor Vulnerabilities of LLM-as-a-Judge
International Conference on Learning Representations (ICLR), 2025
Terry Tong, Haiwei Yang, Zhe Zhao, Mengzhao Chen
AAML, ELM | 01 Mar 2025

ELBA-Bench: An Efficient Learning Backdoor Attacks Benchmark for Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Xianglong Liu, Yaning Tan, M. Han, Yong Luo, Aishan Liu, Xiantao Cai, Zheng He, Dacheng Tao
AAML, SILM, ELM | 22 Feb 2025

Robustness and Cybersecurity in the EU Artificial Intelligence Act
Conference on Fairness, Accountability and Transparency (FAccT), 2025
Henrik Nolte, Miriam Rateike, Michèle Finck
22 Feb 2025

UniGuardian: A Unified Defense for Detecting Prompt Injection, Backdoor Attacks and Adversarial Attacks in Large Language Models
Huawei Lin, Yingjie Lao, Tong Geng, Tan Yu, Weijie Zhao
AAML, SILM | 18 Feb 2025

Improving Your Model Ranking on Chatbot Arena by Vote Rigging
Rui Min, Tianyu Pang, Chao Du, Qian Liu, Minhao Cheng, Min Lin
AAML | 29 Jan 2025

Privacy in Fine-tuning Large Language Models: Attacks, Defenses, and Future Directions
Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2024
Hao Du, Shang Liu, Lele Zheng, Yang Cao, Atsuyoshi Nakamura, Lei Chen
AAML | 21 Dec 2024

Quantized Delta Weight Is Safety Keeper
Yule Liu, Zhen Sun, Xinlei He, Xinyi Huang
29 Nov 2024

Neutralizing Backdoors through Information Conflicts for Large Language Models
Chen Chen, Yuchen Sun, Xueluan Gong, Jiaxin Gao, K. Lam
KELM, AAML | 27 Nov 2024

PEFTGuard: Detecting Backdoor Attacks Against Parameter-Efficient Fine-Tuning
IEEE Symposium on Security and Privacy (S&P), 2024
Zhen Sun, Tianshuo Cong, Yule Liu, Chenhao Lin, Xinlei He, Rongmao Chen, Xingshuo Han, Xinyi Huang
AAML | 26 Nov 2024

Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning
Neural Information Processing Systems (NeurIPS), 2024
Hang Zhou, Yehui Tang, Haochen Qin, Yujie Yang, Renren Jin, Deyi Xiong, Kai Han, Yunhe Wang
21 Nov 2024

CROW: Eliminating Backdoors from Large Language Models via Internal Consistency Regularization
Nay Myat Min, Long H. Pham, Yige Li, Jun Sun
AAML | 18 Nov 2024

A Survey on Adversarial Machine Learning for Code Data: Realistic Threats, Countermeasures, and Interpretations
Yulong Yang, Haoran Fan, Chenhao Lin, Qian Li, Subrat Kishore Dutta, Chao Shen, Xiaohong Guan
AAML | 12 Nov 2024

InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models
Haoyang Li, Xiaogeng Liu
SILM | 30 Oct 2024

LLMScan: Causal Scan for LLM Misbehavior Detection
Mengdi Zhang, Kai Kiat Goh, Peixin Zhang, Jun Sun, Rose Lin Xin, Hongyu Zhang
22 Oct 2024

AdvBDGen: Adversarially Fortified Prompt-Specific Fuzzy Backdoor Generator Against LLM Alignment
Pankayaraj Pathmanathan, Udari Madhushani Sehwag, Michael-Andrei Panaitescu-Liess, Furong Huang
SILM, AAML | 15 Oct 2024

SplitLLM: Collaborative Inference of LLMs for Model Placement and Throughput Optimization
Akrit Mudvari, Yuang Jiang, Leandros Tassiulas
14 Oct 2024

PoisonBench: Assessing Large Language Model Vulnerability to Data Poisoning
Tingchen Fu, Mrinank Sharma, Juil Sock, Shay B. Cohen, David M. Krueger, Fazl Barez
AAML | 11 Oct 2024

Mitigating Backdoor Threats to Large Language Models: Advancement and Challenges
Qin Liu, Wenjie Mo, Terry Tong, Lyne Tchapmi, Fei Wang, Chaowei Xiao, Muhao Chen
AAML | 30 Sep 2024

Harmful Fine-tuning Attacks and Defenses for Large Language Models: A Survey
Tiansheng Huang, Sihao Hu, Fatih Ilhan, Selim Furkan Tekin, Ling Liu
AAML | 26 Sep 2024

PROMPTFUZZ: Harnessing Fuzzing Techniques for Robust Testing of Prompt Injection in LLMs
Jiahao Yu, Yangguang Shao, Hanwen Miao, Junzheng Shi
SILM, AAML | 23 Sep 2024

Data-centric NLP Backdoor Defense from the Lens of Memorization
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Zhenting Wang, Zhizhi Wang, Haoyang Ling, Mengnan Du, Juan Zhai, Shiqing Ma
21 Sep 2024

Recent Advances in Attack and Defense Approaches of Large Language Models
Jing Cui, Yishi Xu, Zhewei Huang, Shuchang Zhou, Jianbin Jiao, Junge Zhang
PILM, AAML | 05 Sep 2024

Rethinking Backdoor Detection Evaluation for Language Models
Jun Yan, Wenjie Jacky Mo, Xiang Ren, Robin Jia
ELM | 31 Aug 2024

DefectTwin: When LLM Meets Digital Twin for Railway Defect Inspection
Rahatara Ferdousi, M. Anwar Hossain, Chunsheng Yang, Abdulmotaleb El Saddik
26 Aug 2024

BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks and Defenses on Large Language Models
Yige Li, Hanxun Huang, Yunhan Zhao, Jiabo He, Jun Sun
AAML, SILM | 23 Aug 2024

BaThe: Defense against the Jailbreak Attack in Multimodal Large Language Models by Treating Harmful Instruction as Backdoor Trigger
Yulin Chen, Haoran Li, Zihao Zheng, Yangqiu Song, Bryan Hooi
17 Aug 2024

Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification
Boyang Zhang, Yicong Tan, Yun Shen, Ahmed Salem, Michael Backes, Savvas Zannettou, Yang Zhang
LLMAG, AAML | 30 Jul 2024

Can Editing LLMs Inject Harm?
Canyu Chen, Baixiang Huang, Zekun Li, Zhaorun Chen, Shiyang Lai, ..., Xifeng Yan, William Wang, Juil Sock, Dawn Song, Kai Shu
KELM | 29 Jul 2024

LocalValueBench: A Collaboratively Built and Extensible Benchmark for Evaluating Localized Value Alignment and Ethical Safety in Large Language Models
Achintya Gopal, Nicholas Wai Long Lau, Eva Adelina Susanto, Chi Lok Yu, Aditya Paul
ELM | 27 Jul 2024

Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)
Apurv Verma, Satyapriya Krishna, Sebastian Gehrmann, Madhavan Seshadri, Anu Pradhan, Tom Ault, Leslie Barrett, David Rabinowitz, John Doucette, Nhathai Phan
20 Jul 2024

SOS! Soft Prompt Attack Against Open-Source Large Language Models
Ziqing Yang, Michael Backes, Yang Zhang, Ahmed Salem
AAML | 03 Jul 2024

CleanGen: Mitigating Backdoor Attacks for Generation Tasks in Large Language Models
Yuetai Li, Zhangchen Xu, Fengqing Jiang, Luyao Niu, D. Sahabandu, Bhaskar Ramasubramanian, Radha Poovendran
SILM, AAML | 18 Jun 2024

BadRAG: Identifying Vulnerabilities in Retrieval Augmented Generation of Large Language Models
Jiaqi Xue, Meng Zheng, Yebowen Hu, Fei Liu, Xun Chen, Qian Lou
AAML, SILM | 03 Jun 2024

AI Risk Management Should Incorporate Both Safety and Security
Xiangyu Qi, Yangsibo Huang, Yi Zeng, Edoardo Debenedetti, Jonas Geiping, ..., Chaowei Xiao, Yue Liu, Dawn Song, Peter Henderson, Prateek Mittal
AAML | 29 May 2024

Cross-Modal Safety Alignment: Is textual unlearning all you need?
Trishna Chakraborty, Erfan Shayegani, Zikui Cai, Nael B. Abu-Ghazaleh, M. Salman Asif, Yue Dong, Amit K. Roy-Chowdhury, Chengyu Song
27 May 2024

TrojanRAG: Retrieval-Augmented Generation Can Be Backdoor Driver in Large Language Models
Pengzhou Cheng, Yidong Ding, Tianjie Ju, Zongru Wu, Wei Du, Ping Yi, Zhuosheng Zhang, Gongshen Liu
SILM, AAML | 22 May 2024

When LLMs Meet Cybersecurity: A Systematic Literature Review
Jie Zhang, Haoyu Bu, Hui Wen, Yu Chen, Lun Li, Hongsong Zhu
06 May 2024

Best-of-Venom: Attacking RLHF by Injecting Poisoned Preference Data
Tim Baumgärtner, Yang Gao, Dana Alon, Donald Metzler
AAML | 08 Apr 2024

Large language models in 6G security: challenges and opportunities
Tri Nguyen, Huong Nguyen, Ahmad Ijaz, Saeid Sheikhi, Athanasios V. Vasilakos, Panos Kostakos
ELM | 18 Mar 2024

On Protecting the Data Privacy of Large Language Models (LLMs): A Survey
International Conference on Mathematics and Computing (ICMC), 2024
Biwei Yan, Kun Li, Minghui Xu, Yueyan Dong, Yue Zhang, Zhaochun Ren, Xiuzhen Cheng
AILaw, PILM | 08 Mar 2024

Automatic and Universal Prompt Injection Attacks against Large Language Models
Xiaogeng Liu, Zhiyuan Yu, Yizhe Zhang, Ning Zhang, Chaowei Xiao
SILM, AAML | 07 Mar 2024

How (un)ethical are instruction-centric responses of LLMs? Unveiling the vulnerabilities of safety guardrails to harmful queries
Somnath Banerjee, Sayan Layek, Rima Hazra, Animesh Mukherjee
23 Feb 2024

Learning to Poison Large Language Models for Downstream Manipulation
Yao Qiang, Xiangyu Zhou, Saleh Zare Zade, Mohammad Amin Roshani, Prashant Khanduri, Douglas Zytko, Dongxiao Zhu
AAML, SILM | 21 Feb 2024

Defending Against Weight-Poisoning Backdoor Attacks for Parameter-Efficient Fine-Tuning
Shuai Zhao, Yaoyao Yu, Anh Tuan Luu, Jie Fu, Lingjuan Lyu, Meihuizi Jia, Jinming Wen
AAML | 19 Feb 2024

How Susceptible are Large Language Models to Ideological Manipulation?
Kai Chen, Zihao He, Jun Yan, Taiwei Shi, Kristina Lerman
18 Feb 2024

Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents
Wenkai Yang, Xiaohan Bi, Yankai Lin, Sishuo Chen, Jie Zhou, Xu Sun
LLMAG, AAML | 17 Feb 2024

A Trembling House of Cards? Mapping Adversarial Attacks against Language Agents
Lingbo Mo, Zeyi Liao, Boyuan Zheng, Yu-Chuan Su, Chaowei Xiao, Huan Sun
AAML, LLMAG | 15 Feb 2024