ResearchTrend.AI

arXiv:2312.14197 (Cited By)
Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models


28 January 2025
Jingwei Yi, Yueqi Xie, Bin Zhu, Emre Kiciman, Guangzhong Sun, Xing Xie, Fangzhao Wu
AAML

Papers citing "Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models"

50 / 57 papers shown

1. Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs · Chetan Pathade · AAML, SILM · 07 May 2025
2. OET: Optimization-based prompt injection Evaluation Toolkit · Jinsheng Pan, Xiaogeng Liu, Chaowei Xiao · AAML · 01 May 2025
3. Robustness via Referencing: Defending against Prompt Injection Attacks by Referencing the Executed Instruction · Y. Chen, Haoran Li, Yuan Sui, Y. Liu, Yufei He, Y. Song, Bryan Hooi · AAML, SILM · 29 Apr 2025
4. AI Awareness · X. Li, Haoyuan Shi, Rongwu Xu, Wei Xu · 25 Apr 2025
5. WASP: Benchmarking Web Agent Security Against Prompt Injection Attacks · Ivan Evtimov, Arman Zharmagambetov, Aaron Grattafiori, Chuan Guo, Kamalika Chaudhuri · AAML · 22 Apr 2025
6. Manipulating Multimodal Agents via Cross-Modal Prompt Injection · Le Wang, Zonghao Ying, Tianyuan Zhang, Siyuan Liang, Shengshan Hu, Mingchuan Zhang, A. Liu, Xianglong Liu · AAML · 19 Apr 2025
7. DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks · Yupei Liu, Yuqi Jia, Jinyuan Jia, Dawn Song, Neil Zhenqiang Gong · AAML · 15 Apr 2025
8. Separator Injection Attack: Uncovering Dialogue Biases in Large Language Models Caused by Role Separators · Xitao Li, H. Wang, Jiang Wu, Ting Liu · AAML · 08 Apr 2025
9. Detecting LLM-Written Peer Reviews · Vishisht Rao, Aounon Kumar, Himabindu Lakkaraju, Nihar B. Shah · DeLMO, AAML · 20 Mar 2025
10. Prompt Flow Integrity to Prevent Privilege Escalation in LLM Agents · Juhee Kim, Woohyuk Choi, Byoungyoung Lee · LLMAG · 17 Mar 2025
11. ASIDE: Architectural Separation of Instructions and Data in Language Models · Egor Zverev, Evgenii Kortukov, Alexander Panfilov, Soroush Tabesh, Alexandra Volkova, Sebastian Lapuschkin, Wojciech Samek, Christoph H. Lampert · AAML · 13 Mar 2025
12. EigenShield: Causal Subspace Filtering via Random Matrix Theory for Adversarially Robust Vision-Language Models · Nastaran Darabi, Devashri Naik, Sina Tayebati, Dinithi Jayasuriya, Ranganath Krishnan, A. R. Trivedi · AAML · 24 Feb 2025
13. Can Indirect Prompt Injection Attacks Be Detected and Removed? · Yulin Chen, Haoran Li, Yuan Sui, Yufei He, Yue Liu, Y. Song, Bryan Hooi · AAML · 23 Feb 2025
14. An Empirically-grounded tool for Automatic Prompt Linting and Repair: A Case Study on Bias, Vulnerability, and Optimization in Developer Prompts · Dhia Elhaq Rzig, Dhruba Jyoti Paul, Kaiser Pister, Jordan Henkel, Foyzul Hassan · 21 Jan 2025
15. Dynamics of Adversarial Attacks on Large Language Model-Based Search Engines · Xiyang Hu · AAML · 03 Jan 2025
16. Towards Action Hijacking of Large Language Model-based Agent · Yuyang Zhang, Kangjie Chen, Xudong Jiang, Yuxiang Sun, Run Wang, Lina Wang · LLMAG, AAML · 14 Dec 2024
17. RAG-Thief: Scalable Extraction of Private Data from Retrieval-Augmented Generation Applications with Agent-based Attacks · Changyue Jiang, Xudong Pan, Geng Hong, Chenfu Bao, Min Yang · SILM · 21 Nov 2024
18. Defense Against Prompt Injection Attack by Leveraging Attack Techniques · Yulin Chen, Haoran Li, Zihao Zheng, Y. Song, Dekai Wu, Bryan Hooi · SILM, AAML · 01 Nov 2024
19. FATH: Authentication-based Test-time Defense against Indirect Prompt Injection Attacks · Jiongxiao Wang, Fangzhou Wu, Wendi Li, Jinsheng Pan, Edward Suh, Zhuoqing Mao, Muhao Chen, Chaowei Xiao · AAML · 28 Oct 2024
20. Palisade -- Prompt Injection Detection Framework · Sahasra Kokkula, Somanathan R, Nandavardhan R, Aashishkumar, G Divya · AAML · 28 Oct 2024
21. Breaking ReAct Agents: Foot-in-the-Door Attack Will Get You In · Itay Nakash, George Kour, Guy Uziel, Ateret Anaby-Tavor · AAML, LLMAG · 22 Oct 2024
22. Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements · Jingyu Zhang, Ahmed Elgohary, Ahmed Magooda, Daniel Khashabi, Benjamin Van Durme · 11 Oct 2024
23. Recent advancements in LLM Red-Teaming: Techniques, Defenses, and Ethical Considerations · Tarun Raheja, Nilay Pochhi · AAML · 09 Oct 2024
24. Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents · Hanrong Zhang, Jingyuan Huang, Kai Mei, Yifei Yao, Zhenting Wang, Chenlu Zhan, Hongwei Wang, Yongfeng Zhang · AAML, LLMAG, ELM · 03 Oct 2024
25. VLMGuard: Defending VLMs against Malicious Prompts via Unlabeled Data · Xuefeng Du, Reshmi Ghosh, Robert Sim, Ahmed Salem, Vitor Carvalho, Emily Lawton, Yixuan Li, Jack W. Stokes · VLM, AAML · 01 Oct 2024
26. System-Level Defense against Indirect Prompt Injection Attacks: An Information Flow Control Perspective · Fangzhou Wu, Ethan Cecchetti, Chaowei Xiao · 27 Sep 2024
27. Applying Pre-trained Multilingual BERT in Embeddings for Improved Malicious Prompt Injection Attacks Detection · M. Rahman, Hossain Shahriar, Fan Wu, A. Cuzzocrea · AAML · 20 Sep 2024
28. CoCA: Regaining Safety-awareness of Multimodal Large Language Models with Constitutional Calibration · Jiahui Gao, Renjie Pi, Tianyang Han, Han Wu, Lanqing Hong, Lingpeng Kong, Xin Jiang, Zhenguo Li · 17 Sep 2024
29. AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents · Edoardo Debenedetti, Jie Zhang, Mislav Balunović, Luca Beurer-Kellner, Marc Fischer, Florian Tramèr · LLMAG, AAML · 19 Jun 2024
30. Threat Modelling and Risk Analysis for Large Language Model (LLM)-Powered Applications · Stephen Burabari Tete · 16 Jun 2024
31. Raccoon: Prompt Extraction Benchmark of LLM-Integrated Applications · Junlin Wang, Tianyi Yang, Roy Xie, Bhuwan Dhingra · SILM, AAML · 10 Jun 2024
32. Ranking Manipulation for Conversational Search Engines · Samuel Pfrommer, Yatong Bai, Tanmay Gautam, Somayeh Sojoudi · SILM · 05 Jun 2024
33. Teams of LLM Agents can Exploit Zero-Day Vulnerabilities · Richard Fang, Antony Kellermann, Akul Gupta, Qiusi Zhan, R. Bindu, Daniel Kang · LLMAG · 02 Jun 2024
34. AI Risk Management Should Incorporate Both Safety and Security · Xiangyu Qi, Yangsibo Huang, Yi Zeng, Edoardo Debenedetti, Jonas Geiping, ..., Chaowei Xiao, Bo-wen Li, Dawn Song, Peter Henderson, Prateek Mittal · AAML · 29 May 2024
35. The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions · Eric Wallace, Kai Y. Xiao, R. Leike, Lilian Weng, Johannes Heidecke, Alex Beutel · SILM · 19 Apr 2024
36. LLM Agents can Autonomously Exploit One-day Vulnerabilities · Richard Fang, R. Bindu, Akul Gupta, Daniel Kang · SILM, LLMAG · 11 Apr 2024
37. GoEX: Perspectives and Designs Towards a Runtime for Autonomous LLM Applications · Shishir G. Patil, Tianjun Zhang, Vivian Fang, Noppapon C., Roy Huang, Aaron Hao, Martin Casado, Joseph E. Gonzalez, Raluca Ada Popa, Ion Stoica · ALM · 10 Apr 2024
38. Defending Against Indirect Prompt Injection Attacks With Spotlighting · Keegan Hines, Gary Lopez, Matthew Hall, Federico Zarfati, Yonatan Zunger, Emre Kiciman · AAML, SILM · 20 Mar 2024
39. Can LLMs Separate Instructions From Data? And What Do We Even Mean By That? · Egor Zverev, Sahar Abdelnabi, Soroush Tabesh, Mario Fritz, Christoph H. Lampert · 11 Mar 2024
40. Automatic and Universal Prompt Injection Attacks against Large Language Models · Xiaogeng Liu, Zhiyuan Yu, Yizhe Zhang, Ning Zhang, Chaowei Xiao · SILM, AAML · 07 Mar 2024
41. InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents · Qiusi Zhan, Zhixiang Liang, Zifan Ying, Daniel Kang · LLMAG · 05 Mar 2024
42. A New Era in LLM Security: Exploring Security Concerns in Real-World LLM-based Systems · Fangzhou Wu, Ning Zhang, Somesh Jha, P. McDaniel, Chaowei Xiao · 28 Feb 2024
43. Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems · Zhenting Qi, Hanlin Zhang, Eric Xing, Sham Kakade, Hima Lakkaraju · SILM · 27 Feb 2024
44. WIPI: A New Web Threat for LLM-Driven Web Agents · Fangzhou Wu, Shutong Wu, Yulong Cao, Chaowei Xiao · LLMAG · 26 Feb 2024
45. GradSafe: Detecting Jailbreak Prompts for LLMs via Safety-Critical Gradient Analysis · Yueqi Xie, Minghong Fang, Renjie Pi, Neil Zhenqiang Gong · 21 Feb 2024
46. StruQ: Defending Against Prompt Injection with Structured Queries · Sizhe Chen, Julien Piet, Chawin Sitawarin, David A. Wagner · SILM, AAML · 09 Feb 2024
47. R-Judge: Benchmarking Safety Risk Awareness for LLM Agents · Tongxin Yuan, Zhiwei He, Lingzhong Dong, Yiming Wang, Ruijie Zhao, ..., Binglin Zhou, Fangqi Li, Zhuosheng Zhang, Rui Wang, Gongshen Liu · ELM · 18 Jan 2024
48. MLLM-Protector: Ensuring MLLM's Safety without Hurting Performance · Renjie Pi, Tianyang Han, Jianshu Zhang, Yueqi Xie, Rui Pan, Qing Lian, Hanze Dong, Jipeng Zhang, Tong Zhang · AAML · 05 Jan 2024
49. Formalizing and Benchmarking Prompt Injection Attacks and Defenses · Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, Neil Zhenqiang Gong · SILM, LLMAG · 19 Oct 2023
50. LLM Platform Security: Applying a Systematic Evaluation Framework to OpenAI's ChatGPT Plugins · Umar Iqbal, Tadayoshi Kohno, Franziska Roesner · ELM, SILM · 19 Sep 2023