Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection

North American Chapter of the Association for Computational Linguistics (NAACL), 2024
31 July 2023
Jun Yan, Vikas Yadav, Shiyang Li, Lichang Chen, Zheng Tang, Hai Wang, Vijay Srinivasan, Xiang Ren, Hongxia Jin
    SILM
ArXiv (abs) · PDF · HTML · HuggingFace (7 upvotes)

Papers citing "Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection"

50 / 106 papers shown
AutoBackdoor: Automating Backdoor Attacks via LLM Agents
Y. Li, Z. Li, Wei Zhao, Nay Myat Min, Hanxun Huang, Xingjun Ma, Jun Sun
AAML, LLMAG, SILM
358 · 0 · 0
20 Nov 2025
Visual Backdoor Attacks on MLLM Embodied Decision Making via Contrastive Trigger Learning
Qiusi Zhan, Hyeonjeong Ha, Rui Yang, Sirui Xu, Hanyang Chen, Liang-Yan Gui, Yu Wang, Huan Zhang, Mengyue Yang, Daniel Kang
AAML
88 · 0 · 0
31 Oct 2025
Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges
Shrestha Datta, Shahriar Kabir Nahin, Anshuman Chhabra, P. Mohapatra
LLMAG, LM&Ro
256 · 2 · 0
27 Oct 2025
Pay Attention to the Triggers: Constructing Backdoors That Survive Distillation
Giovanni De Muri, Mark Vero, Robin Staab, Martin Vechev
151 · 0 · 0
21 Oct 2025
Toward Understanding Security Issues in the Model Context Protocol Ecosystem
Xiaofan Li, Xing Gao
140 · 1 · 0
18 Oct 2025
f-INE: A Hypothesis Testing Framework for Estimating Influence under Training Randomness
Subhodip Panda, Dhruv Tarsadiya, S. Sourav, Prathosh A.P., Sai Praneeth Karimireddy
TDI
190 · 0 · 0
12 Oct 2025
Automatic Text Box Placement for Supporting Typographic Design
Jun Muraoka, Daichi Haraguchi, Naoto Inoue, Wataru Shimoda, Kota Yamaguchi, Seiichi Uchida
102 · 0 · 0
09 Oct 2025
Spectral Graph Clustering under Differential Privacy: Balancing Privacy, Accuracy, and Efficiency
Mohamed Seif, Antti Koskela, H. Poor, Andrea J. Goldsmith
133 · 4 · 0
08 Oct 2025
Backdoor-Powered Prompt Injection Attacks Nullify Defense Methods
Yulin Chen, Haoran Li, Yuan Sui, Yangqiu Song, Bryan Hooi
SILM, AAML
199 · 0 · 0
04 Oct 2025
Microsaccade-Inspired Probing: Positional Encoding Perturbations Reveal LLM Misbehaviours
Rui Melo, Rui Abreu, C. Păsăreanu
134 · 0 · 0
01 Oct 2025
Backdoor Attacks Against Speech Language Models
Alexandrine Fortier, Thomas Thebaud, Jesus Villalba, Najim Dehak, P. Cardinal
AuLLM
269 · 0 · 0
01 Oct 2025
GSPR: Aligning LLM Safeguards as Generalizable Safety Policy Reasoners
Xue Yang, Yulin Chen, Jingru Zeng, Hao Peng, Huihao Jing, Wenbin Hu, Xi Yang, Ziqian Zeng, Sirui Han, Yangqiu Song
LRM
101 · 1 · 0
29 Sep 2025
Virus Infection Attack on LLMs: Your Poisoning Can Spread "VIA" Synthetic Data
Zi Liang, Qingqing Ye, Xuan Liu, Yanyun Wang, Jianliang Xu, Haibo Hu
201 · 1 · 0
27 Sep 2025
Backdoor Attribution: Elucidating and Controlling Backdoor in Language Models
Miao Yu, Zhenhong Zhou, Moayad Aloqaily, Kun Wang, Biwei Huang, S. Wang, Yueming Jin, Qingsong Wen
AAML, LLMSV
260 · 0 · 0
26 Sep 2025
bi-GRPO: Bidirectional Optimization for Jailbreak Backdoor Injection on LLMs
Wence Ji, Jiancan Wu, Aiying Li, Shuyi Zhang, Junkang Wu, An Zhang, Xiang-Bin Wang, Xiangnan He
AAML
131 · 0 · 0
24 Sep 2025
A Comprehensive Survey on Trustworthiness in Reasoning with Large Language Models
Yanbo Wang, Yongcan Yu, Jian Liang, Ran He
HILM, LRM
197 · 4 · 0
04 Sep 2025
Backdoor Samples Detection Based on Perturbation Discrepancy Consistency in Pre-trained Language Models
Neural Networks (NN), 2025
Zuquan Peng, Jianming Fu, Lixin Zou, Li Zheng, Yanzhen Ren, Guojun Peng
AAML
100 · 0 · 0
30 Aug 2025
Lethe: Purifying Backdoored Large Language Models with Knowledge Dilution
Chen Chen, Yuchen Sun, Jiaxin Gao, Xueluan Gong, Qian-Wei Wang, Ziyao Wang, Yongsen Zheng, K. Lam
AAML, KELM
148 · 0 · 0
28 Aug 2025
Rethinking Testing for LLM Applications: Characteristics, Challenges, and a Lightweight Interaction Protocol
Wei Ma, Y. Yang, Q. Hu, Shi Ying, Zhi Jin, ..., Zhenchang Xing, Tianlin Li, Junjie Shi, Yang Liu, Linxiao Jiang
112 · 0 · 0
28 Aug 2025
Poison Once, Refuse Forever: Weaponizing Alignment for Injecting Bias in LLMs
Md Abdullah Al Mamun, Ihsen Alouani, Nael B. Abu-Ghazaleh
94 · 0 · 0
28 Aug 2025
Pruning Strategies for Backdoor Defense in LLMs
Santosh Chapagain, S. M. Hamdi, S. F. Boubrahimi
AAML
88 · 3 · 0
27 Aug 2025
An Investigation on Group Query Hallucination Attacks
Kehao Miao, Xiaolong Jin
AAML, LRM
80 · 0 · 0
26 Aug 2025
Multi-Target Backdoor Attacks Against Speaker Recognition
Alexandrine Fortier, Sonal Joshi, Thomas Thebaud, Jesus Villalba Lopez, Najim Dehak, P. Cardinal
AAML
256 · 1 · 0
12 Aug 2025
Understanding and Mitigating Political Stance Cross-topic Generalization in Large Language Models
J. Zhang, Shu Yang, Junchao Wu, Yang Li, Haiyan Zhao
200 · 1 · 0
04 Aug 2025
A Survey on Data Security in Large Language Models
Kang Chen, Xiuze Zhou, Y. Lin, Jinhe Su, Yuanhui Yu, Li Shen, F. Lin
PILM, ELM
197 · 1 · 1
04 Aug 2025
Watch the Weights: Unsupervised monitoring and control of fine-tuned LLMs
Ziqian Zhong, Aditi Raghunathan
195 · 3 · 0
31 Jul 2025
BadReasoner: Planting Tunable Overthinking Backdoors into Large Reasoning Models for Fun or Profit
Biao Yi, Zekun Fei, Jianing Geng, Tong Li, Lihai Nie, Zheli Liu, Yiming Li
LRM
189 · 2 · 0
24 Jul 2025
The Dark Side of LLMs: Agent-based Attacks for Complete Computer Takeover
Matteo Lupinacci, Francesco Aurelio Pironti, Francesco Blefari, Francesco Romeo, Luigi Arena, Angelo Furfaro
LLMAG, AAML
423 · 8 · 0
09 Jul 2025
Tuning without Peeking: Provable Generalization Bounds and Robust LLM Post-Training
Ismail Labiad, Mathurin Videau, Matthieu Kowalski, Marc Schoenauer, Alessandro Leite, Julia Kempe, O. Teytaud
AAML
264 · 0 · 0
02 Jul 2025
Mitigating Safety Fallback in Editing-based Backdoor Injection on LLMs
Houcheng Jiang, Zetong Zhao, Junfeng Fang, Haokai Ma, Ruipeng Wang, Yang Deng, Xiang Wang, Xiangnan He
KELM, AAML
233 · 0 · 0
16 Jun 2025
Textual Bayes: Quantifying Uncertainty in LLM-Based Systems
Brendan Leigh Ross, Noël Vouitsis, Atiyeh Ashari Ghomi, Rasa Hosseinzadeh, Ji Xin, ..., Yi Sui, Shiyi Hou, Kin Kwan Leung, Gabriel Loaiza-Ganem, Jesse C. Cresswell
336 · 3 · 0
11 Jun 2025
Your Agent Can Defend Itself against Backdoor Attacks
Li Changjiang, Liang Jiacheng, Cao Bochuan, Chen Jinghui, Wang Ting
AAML, LLMAG
329 · 4 · 0
10 Jun 2025
A Systematic Review of Poisoning Attacks Against Large Language Models
Neil Fendley, Edward W. Staley, Joshua Carney, William Redman, Marie Chau, Nathan G. Drenkow
AAML, PILM
207 · 5 · 0
06 Jun 2025
XAI-Units: Benchmarking Explainability Methods with Unit Tests
Conference on Fairness, Accountability and Transparency (FAccT), 2025
Jun Rui Lee, Sadegh Emami, Michael David Hollins, Timothy C. H. Wong, Carlos Ignacio Villalobos Sánchez, Francesca Toni, Dekai Zhang, Adam Dejl
205 · 3 · 0
01 Jun 2025
Detecting Stealthy Backdoor Samples based on Intra-class Distance for Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
Jinwen Chen, Hainan Zhang, Fei Sun, Qinnan Zhang, Sijia Wen, Ziwei Wang, Zhiming Zheng
AAML
200 · 0 · 0
29 May 2025
Evaluating Robustness of Large Audio Language Models to Audio Injection: An Empirical Study
Guanyu Hou, Jiaming He, Yinhang Zhou, Ji Guo, Yitong Qiao, Rui Zhang, Wenbo Jiang
AAML
249 · 1 · 0
26 May 2025
Security Concerns for Large Language Models: A Survey
Miles Q. Li, Benjamin C. M. Fung
PILM, ELM
746 · 14 · 0
24 May 2025
Revisiting Backdoor Attacks on LLMs: A Stealthy and Practical Poisoning Framework via Harmless Inputs
Jiawei Kong, Hao Fang, Xiaochen Yang, Kuofeng Gao, Bin Chen, Shu-Tao Xia, Yaowei Wang, Min Zhang
AAML
332 · 3 · 0
23 May 2025
Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen!
Zhexin Zhang, Yuhao Sun, Junxiao Yang, Shiyao Cui, Hongning Wang, Shiyu Huang
AAML
300 · 1 · 0
21 May 2025
Is Your Prompt Safe? Investigating Prompt Injection Attacks Against Open-Source LLMs
Jiawen Wang, Pritha Gupta, Ivan Habernal, Eyke Hüllermeier
SILM, AAML
247 · 6 · 0
20 May 2025
From Assistants to Adversaries: Exploring the Security Risks of Mobile LLM Agents
Liangxuan Wu, Chao Wang, Tianming Liu, Yanjie Zhao, Haoyu Wang
AAML
433 · 9 · 0
19 May 2025
Does Low Rank Adaptation Lead to Lower Robustness against Training-Time Attacks?
Zi Liang, Haibo Hu, Qingqing Ye, Yaxin Xiao, Ronghua Li
AAML
401 · 3 · 0
19 May 2025
A Survey of Attacks on Large Language Models
Wenrui Xu, Keshab K. Parhi
AAML, ELM
220 · 7 · 0
18 May 2025
The Ripple Effect: On Unforeseen Complications of Backdoor Attacks
Rui Zhang, Yun Shen, Hongwei Li, Wenbo Jiang, Hanxiao Chen, Yuan Zhang, Guowen Xu, Yang Zhang
SILM, AAML
224 · 0 · 0
16 May 2025
LM-Scout: Analyzing the Security of Language Model Integration in Android Apps
Muhammad Ibrahim, Güliz Seray Tuncay, Z. Berkay Celik, Aravind Machiry, Antonio Bianchi
262 · 0 · 0
13 May 2025
BadLingual: A Novel Lingual-Backdoor Attack against Large Language Models
Liang Luo, Hongwei Li, Rui Zhang, Wenbo Jiang, Kangjie Chen, Tianwei Zhang, Qingchuan Zhao, Guowen Xu
AAML
220 · 0 · 0
06 May 2025
Adversarial Attacks on LLM-as-a-Judge Systems: Insights from Prompt Injections
Narek Maloyan, Dmitry Namiot
SILM, AAML, ELM
250 · 4 · 0
25 Apr 2025
Information Leakage of Sentence Embeddings via Generative Embedding Inversion Attacks
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2025
Antonios Tragoudaras, Theofanis Aslanidis, Emmanouil Georgios Lionis, Marina Orozco González, Panagiotis Eustratiadis
MIACV, SILM
320 · 2 · 0
23 Apr 2025
Propaganda via AI? A Study on Semantic Backdoors in Large Language Models
Nay Myat Min, Long H. Pham, Yige Li, Jun Sun
AAML
246 · 2 · 0
15 Apr 2025
Exposing the Ghost in the Transformer: Abnormal Detection for Large Language Models via Hidden State Forensics
Shide Zhou, Kaidi Wang, Ling Shi, Han Wang
254 · 1 · 0
01 Apr 2025