Formalizing and Benchmarking Prompt Injection Attacks and Defenses
arXiv 2310.12815, v3 (latest) · 19 October 2023
Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, Neil Zhenqiang Gong
SILM, LLMAG
ArXiv (abs) · PDF · HTML · HuggingFace (1 upvote) · GitHub (205★)

Papers citing "Formalizing and Benchmarking Prompt Injection Attacks and Defenses"

50 / 58 papers shown
Who Taught the Lie? Responsibility Attribution for Poisoned Knowledge in Retrieval-Augmented Generation
Baolei Zhang, Haoran Xin, Yuxi Chen, Zhuqing Liu, Biao Yi, Tong Li, Lihai Nie, Zheli Liu, Minghong Fang
SILM · 17 Sep 2025

A Multi-Agent LLM Defense Pipeline Against Prompt Injection Attacks
S M Asif Hossain, Ruksat Khan Shayoni, Mohd Ruhul Ameen, Akif Islam, M. F. Mridha, Jungpil Shin
LLMAG, SILM, AAML · 16 Sep 2025

Free-MAD: Consensus-Free Multi-Agent Debate
Yu Cui, Hang Fu, Haibin Zhang, Licheng Wang, Cong Zuo
LRM · 14 Sep 2025

LLM in the Middle: A Systematic Review of Threats and Mitigations to Real-World LLM-based Systems
Vitor Hugo Galhardo Moia, Igor Jochem Sanz, Gabriel Antonio Fontes Rebello, Rodrigo Duarte de Meneses, Briland Hitaj, Ulf Lindqvist
12 Sep 2025

On the Security of Tool-Invocation Prompts for LLM-Based Agentic Systems: An Empirical Risk Assessment
Yuchong Xie, Mingyu Luo, Zesen Liu, Z. Zhang, Kaikai Zhang, Yu Liu, Zongjie Li, Ping Chen, Shuai Wang, Dongdong She
LLMAG · 06 Sep 2025

A Comprehensive Survey on Trustworthiness in Reasoning with Large Language Models
Yanbo Wang, Yongcan Yu, Jian Liang, Ran He
HILM, LRM · 04 Sep 2025

PromptSleuth: Detecting Prompt Injection via Semantic Intent Invariance
Mengxiao Wang, Yuxuan Zhang, Guofei Gu
AAML, SILM · 28 Aug 2025

Disabling Self-Correction in Retrieval-Augmented Generation via Stealthy Retriever Poisoning
Yanbo Dai, Zhenlan Ji, Zongjie Li, Kuan Li, Shuai Wang
SILM, AAML, KELM · 27 Aug 2025

SoK: Large Language Model Copyright Auditing via Fingerprinting
Shuo Shao, Yiming Li, Yexiao He, Hongwei Yao, Wenyuan Yang, D. Tao, Zhan Qin
27 Aug 2025

UniC-RAG: Universal Knowledge Corruption Attacks to Retrieval-Augmented Generation
Runpeng Geng, Yanting Wang, Ying Chen, Jinyuan Jia
AAML · 26 Aug 2025

Prompt-in-Content Attacks: Exploiting Uploaded Inputs to Hijack LLM Behavior
Zhuotao Lian, Weiyu Wang, Qingkui Zeng, Toru Nakanishi, Teruaki Kitasuka, Chunhua Su
SILM · 25 Aug 2025

When AIOps Become "AI Oops": Subverting LLM-driven IT Operations via Telemetry Manipulation
Dario Pasquini, Evgenios M. Kornaropoulos, G. Ateniese, Omer Akgul, Athanasios Theocharis, Petros Efstathopoulos
AAML · 08 Aug 2025

AttnTrace: Attention-based Context Traceback for Long-Context LLMs
Yanting Wang, Runpeng Geng, Ying Chen, Jinyuan Jia
LLMAG · 05 Aug 2025

Provably Secure Retrieval-Augmented Generation
Pengcheng Zhou, Yinglun Feng, Zhongliang Yang
SILM · 01 Aug 2025

Understanding the Supply Chain and Risks of Large Language Model Applications
Yujie Ma, Lili Quan, Xiaofei Xie, Qiang Hu, Jiongchi Yu, Y. Zhang, S. Chen
ELM · 24 Jul 2025

Defending Against Prompt Injection With a Few DefensiveTokens
Sizhe Chen, Yizhu Wang, Nicholas Carlini, Chawin Sitawarin, David Wagner
LLMAG, AAML, SILM · 10 Jul 2025

The Dark Side of LLMs: Agent-based Attacks for Complete Computer Takeover
Matteo Lupinacci, Francesco Aurelio Pironti, Francesco Blefari, Francesco Romeo, Luigi Arena, Angelo Furfaro
LLMAG, AAML · 09 Jul 2025

Design Patterns for Securing LLM Agents against Prompt Injections
Luca Beurer-Kellner, Beat Buesser, Ana-Maria Creţu, Edoardo Debenedetti, Daniel Dobos, Daniel Fabian, ..., Daniel Naeff, Ezinwanne Ozoani, Andrew Paverd, F. Tramèr, Václav Volhejn
LLMAG, SILM, AAML · 10 Jun 2025

JavelinGuard: Low-Cost Transformer Architectures for LLM Security
Yash Datta, Sharath Rajasekar
09 Jun 2025

Detection Method for Prompt Injection by Integrating Pre-trained Model and Heuristic Feature Engineering
Yi Ji, Runzhi Li, Baolei Mao
AAML · 05 Jun 2025

TracLLM: A Generic Framework for Attributing Long Context LLMs
Yanting Wang, Wei Zou, Runpeng Geng, Jinyuan Jia
LLMAG · 04 Jun 2025

ETDI: Mitigating Tool Squatting and Rug Pull Attacks in Model Context Protocol (MCP) by using OAuth-Enhanced Tool Definitions and Policy-Based Access Control
Manish Bhatt, Vineeth Sai Narajala, Idan Habler
02 Jun 2025

Simple Prompt Injection Attacks Can Leak Personal Data Observed by LLM Agents During Task Execution
Meysam Alizadeh, Zeynab Samei, Daria Stetsenko, Fabrizio Gilardi
SILM · 01 Jun 2025

When GPT Spills the Tea: Comprehensive Assessment of Knowledge File Leakage in GPTs
Xinyue Shen, Yun Shen, Michael Backes, Yang Zhang
30 May 2025

LLM Agents Should Employ Security Principles
Kaiyuan Zhang, Zian Su, Pin-Yu Chen, E. Bertino, Xiangyu Zhang, Ninghui Li
LLMAG · 29 May 2025

Securing AI Agents with Information-Flow Control
Manuel Costa, Boris Köpf, Aashish Kolluri, Andrew Paverd, M. Russinovich, Ahmed Salem, Shruti Tople, Lukas Wutschitz, Santiago Zanella Béguelin
29 May 2025

Spa-VLM: Stealthy Poisoning Attacks on RAG-based VLM
Lei Yu, Yechao Zhang, Ziqi Zhou, Yang Wu, Wei Wan, Minghui Li, Shengshan Hu, Pei Xiaobing, Jing Wang
AAML · 28 May 2025

Security Concerns for Large Language Models: A Survey
Miles Q. Li, Benjamin C. M. Fung
PILM, ELM · 24 May 2025

From Assistants to Adversaries: Exploring the Security Risks of Mobile LLM Agents
Liangxuan Wu, Chao Wang, Tianming Liu, Yanjie Zhao, Haoyu Wang
AAML · 19 May 2025

WebInject: Prompt Injection Attack to Web Agents
Xilong Wang, John Bloch, Zedian Shao, Yuepeng Hu, Shuyan Zhou, Neil Zhenqiang Gong
AAML, LLMAG · 16 May 2025

Practical Reasoning Interruption Attacks on Reasoning Large Language Models
Yu Cui, Cong Zuo
SILM, AAML, LRM · 10 May 2025

Defending against Indirect Prompt Injection by Instruction Detection
Tongyu Wen, Chenglong Wang, Xiyuan Yang, Haoyu Tang, Yueqi Xie, Lingjuan Lyu, Zhicheng Dou, Fangzhao Wu
AAML · 08 May 2025

Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs
Chetan Pathade
AAML, SILM · 07 May 2025

OET: Optimization-based prompt injection Evaluation Toolkit
Jinsheng Pan, Xiaogeng Liu, Chaowei Xiao
AAML · 01 May 2025

Prompt Injection Attack to Tool Selection in LLM Agents
Jiawen Shi, Zenghui Yuan, Guiyao Tie, Pan Zhou, Neil Zhenqiang Gong, Lichao Sun
LLMAG · 28 Apr 2025

Exploring the Role of Large Language Models in Cybersecurity: A Systematic Survey
Shuang Tian, Tao Zhang, Qingbin Liu, Jiacheng Wang, Xuangou Wu, ..., Ruichen Zhang, Wentao Zhang, Zhenhui Yuan, Shiwen Mao, Dong In Kim
22 Apr 2025

WASP: Benchmarking Web Agent Security Against Prompt Injection Attacks
Ivan Evtimov, Arman Zharmagambetov, Aaron Grattafiori, Chuan Guo, Kamalika Chaudhuri
AAML · 22 Apr 2025

Manipulating Multimodal Agents via Cross-Modal Prompt Injection
Le Wang, Zonghao Ying, Tianyuan Zhang, Yaning Tan, Shengshan Hu, Mingchuan Zhang, A. Liu, Xianglong Liu
AAML · 19 Apr 2025

DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks
Yupei Liu, Yuqi Jia, Jinyuan Jia, Dawn Song, Neil Zhenqiang Gong
AAML · 15 Apr 2025

Understanding Users' Security and Privacy Concerns and Attitudes Towards Conversational AI Platforms
Mutahar Ali, Arjun Arunasalam, Habiba Farrukh
SILM · 09 Apr 2025

Frontier AI's Impact on the Cybersecurity Landscape
Wenbo Guo, Yujin Potter, Tianneng Shi, Yu Yang, Andy Zhang, Dawn Song
07 Apr 2025

Practical Poisoning Attacks against Retrieval-Augmented Generation
Baolei Zhang, Yuxiao Chen, Minghong Fang, Zhuqing Liu, Lihai Nie, Tong Li, Zheli Liu
SILM, AAML · 04 Apr 2025

RTBAS: Defending LLM Agents Against Prompt Injection and Privacy Leakage
Peter Yong Zhong, Siyuan Chen, Ruiqi Wang, McKenna McCall, Ben L. Titzer, Heather Miller, Phillip B. Gibbons
LLMAG · 17 Feb 2025

OverThink: Slowdown Attacks on Reasoning LLMs
A. Kumar, Jaechul Roh, A. Naseh, Marzena Karpinska, Mohit Iyyer, Amir Houmansadr, Eugene Bagdasarian
LRM · 04 Feb 2025

Peering Behind the Shield: Guardrail Identification in Large Language Models
Ziqing Yang, Yixin Wu, Rui Wen, Michael Backes, Yang Zhang
03 Feb 2025

Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models
Jingwei Yi, Yueqi Xie, Bin Zhu, Emre Kiciman, Guangzhong Sun, Xing Xie, Fangzhao Wu
AAML · 28 Jan 2025

An Empirically-grounded tool for Automatic Prompt Linting and Repair: A Case Study on Bias, Vulnerability, and Optimization in Developer Prompts
Dhia Elhaq Rzig, Dhruba Jyoti Paul, Kaiser Pister, Jordan Henkel, Foyzul Hassan
21 Jan 2025

Non-Halting Queries: Exploiting Fixed Points in LLMs
Ghaith Hammouri, Kemal Derya, B. Sunar
08 Oct 2024

Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents
H. Zhang, Jingyuan Huang, Kai Mei, Yifei Yao, Zhenting Wang, Chenlu Zhan, Hongwei Wang, Yongfeng Zhang
AAML, LLMAG, ELM · 03 Oct 2024

Recent Advances in Attack and Defense Approaches of Large Language Models
Jing Cui, Yishi Xu, Zhewei Huang, Shuchang Zhou, Jianbin Jiao, Junge Zhang
PILM, AAML · 05 Sep 2024