Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection

23 February 2023
Kai Greshake
Sahar Abdelnabi
Shailesh Mishra
C. Endres
Thorsten Holz
Mario Fritz
    SILM

Papers citing "Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection"

50 / 288 papers shown
ConvCodeWorld: Benchmarking Conversational Code Generation in Reproducible Feedback Environments
Hojae Han
Seung-won Hwang
Rajhans Samdani
Yuxiong He
ALM
65
2
0
27 Feb 2025
Multi-Agent Security Tax: Trading Off Security and Collaboration Capabilities in Multi-Agent Systems
Pierre Peigne-Lefebvre
Mikolaj Kniejski
Filip Sondej
Matthieu David
J. Hoelscher-Obermaier
Christian Schroeder de Witt
Esben Kran
51
4
0
26 Feb 2025
Attention Eclipse: Manipulating Attention to Bypass LLM Safety-Alignment
Pedram Zaree
Md Abdullah Al Mamun
Quazi Mishkatul Alam
Yue Dong
Ihsen Alouani
Nael B. Abu-Ghazaleh
AAML
41
0
0
24 Feb 2025
Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs
Giulio Zizzo
Giandomenico Cornacchia
Kieran Fraser
Muhammad Zaid Hameed
Ambrish Rawat
Beat Buesser
Mark Purcell
Pin-Yu Chen
P. Sattigeri
Kush R. Varshney
AAML
43
1
0
24 Feb 2025
RewardDS: Privacy-Preserving Fine-Tuning for Large Language Models via Reward Driven Data Synthesis
Jianwei Wang
Junyao Yang
Haoran Li
Huiping Zhuang
Cen Chen
Ziqian Zeng
SyDa
38
0
0
23 Feb 2025
Can Indirect Prompt Injection Attacks Be Detected and Removed?
Yulin Chen
Haoran Li
Yuan Sui
Yufei He
Yue Liu
Y. Song
Bryan Hooi
AAML
42
3
0
23 Feb 2025
Scaling Trends in Language Model Robustness
Nikolhaus Howe
Michal Zajac
I. R. McKenzie
Oskar Hollinsworth
Tom Tseng
Aaron David Tucker
Pierre-Luc Bacon
Adam Gleave
106
1
0
21 Feb 2025
SafeEraser: Enhancing Safety in Multimodal Large Language Models through Multimodal Machine Unlearning
Junkai Chen
Zhijie Deng
Kening Zheng
Yibo Yan
Shuliang Liu
PeiJun Wu
Peijie Jiang
J. Liu
Xuming Hu
MU
55
3
0
18 Feb 2025
UniGuardian: A Unified Defense for Detecting Prompt Injection, Backdoor Attacks and Adversarial Attacks in Large Language Models
Huawei Lin
Yingjie Lao
Tong Geng
Tan Yu
Weijie Zhao
AAML
SILM
79
2
0
18 Feb 2025
Detecting Phishing Sites Using ChatGPT
Takashi Koide
Naoki Fukushi
Hiroki Nakano
Daiki Chiba
80
30
0
17 Feb 2025
Generative AI for Internet of Things Security: Challenges and Opportunities
Yan Lin Aung
Ivan Christian
Ye Dong
Xiaodong Ye
Sudipta Chattopadhyay
Jianying Zhou
54
1
0
13 Feb 2025
Adversarial ML Problems Are Getting Harder to Solve and to Evaluate
Javier Rando
Jie Zhang
Nicholas Carlini
F. Tramèr
AAML
ELM
54
3
0
04 Feb 2025
OverThink: Slowdown Attacks on Reasoning LLMs
A. Kumar
Jaechul Roh
A. Naseh
Marzena Karpinska
Mohit Iyyer
Amir Houmansadr
Eugene Bagdasarian
LRM
57
12
0
04 Feb 2025
Harmful Terms and Where to Find Them: Measuring and Modeling Unfavorable Financial Terms and Conditions in Shopping Websites at Scale
Elisa Tsai
Neal Mangaokar
Boyuan Zheng
Haizhong Zheng
A. Prakash
53
0
0
03 Feb 2025
Topic-FlipRAG: Topic-Orientated Adversarial Opinion Manipulation Attacks to Retrieval-Augmented Generation Models
Y. Gong
Zhuo Chen
Miaokun Chen
Fengchang Yu
Wei-Tsung Lu
XiaoFeng Wang
Xiaozhong Liu
J. Liu
AAML
SILM
58
0
0
03 Feb 2025
Trading Inference-Time Compute for Adversarial Robustness
Wojciech Zaremba
Evgenia Nitishinskaya
Boaz Barak
Stephanie Lin
Sam Toyer
...
Rachel Dias
Eric Wallace
Kai Y. Xiao
Johannes Heidecke
Amelia Glaese
LRM
AAML
85
15
0
31 Jan 2025
Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models
Jingwei Yi
Yueqi Xie
Bin Zhu
Emre Kiciman
Guangzhong Sun
Xing Xie
Fangzhao Wu
AAML
51
64
0
28 Jan 2025
An Empirically-grounded tool for Automatic Prompt Linting and Repair: A Case Study on Bias, Vulnerability, and Optimization in Developer Prompts
Dhia Elhaq Rzig
Dhruba Jyoti Paul
Kaiser Pister
Jordan Henkel
Foyzul Hassan
75
0
0
21 Jan 2025
Dynamics of Adversarial Attacks on Large Language Model-Based Search Engines
Xiyang Hu
AAML
31
1
0
03 Jan 2025
LLM-Virus: Evolutionary Jailbreak Attack on Large Language Models
Miao Yu
Junfeng Fang
Yingjie Zhou
Xing Fan
Kun Wang
Shirui Pan
Qingsong Wen
AAML
58
0
0
03 Jan 2025
GASLITEing the Retrieval: Exploring Vulnerabilities in Dense Embedding-based Search
Matan Ben-Tov
Mahmood Sharif
RALM
35
0
0
31 Dec 2024
Diverse and Effective Red Teaming with Auto-generated Rewards and Multi-step Reinforcement Learning
Alex Beutel
Kai Y. Xiao
Johannes Heidecke
Lilian Weng
AAML
43
3
0
24 Dec 2024
The Task Shield: Enforcing Task Alignment to Defend Against Indirect Prompt Injection in LLM Agents
Feiran Jia
Tong Wu
Xin Qin
Anna Squicciarini
LLMAG
AAML
80
4
0
21 Dec 2024
Position: A taxonomy for reporting and describing AI security incidents
L. Bieringer
Kevin Paeth
Andreas Wespi
Kathrin Grosse
Alexandre Alahi
78
0
0
19 Dec 2024
Towards Action Hijacking of Large Language Model-based Agent
Yuyang Zhang
Kangjie Chen
Xudong Jiang
Yuxiang Sun
Run Wang
Lina Wang
LLMAG
AAML
73
2
0
14 Dec 2024
On the Ethical Considerations of Generative Agents
Nýoma Diamond
Soumya Banerjee
67
2
0
28 Nov 2024
SoK: Unifying Cybersecurity and Cybersafety of Multimodal Foundation Models with an Information Theory Approach
Ruoxi Sun
Jiamin Chang
Hammond Pearce
Chaowei Xiao
B. Li
Qi Wu
Surya Nepal
Minhui Xue
30
0
0
17 Nov 2024
New Emerged Security and Privacy of Pre-trained Model: a Survey and Outlook
Meng Yang
Tianqing Zhu
Chi Liu
Wanlei Zhou
Shui Yu
Philip S. Yu
AAML
ELM
PILM
56
1
0
12 Nov 2024
A Survey on Adversarial Machine Learning for Code Data: Realistic Threats, Countermeasures, and Interpretations
Yulong Yang
Haoran Fan
Chenhao Lin
Qian Li
Zhengyu Zhao
Chao Shen
Xiaohong Guan
AAML
43
0
0
12 Nov 2024
Beyond the Safety Bundle: Auditing the Helpful and Harmless Dataset
Khaoula Chehbouni
Jonathan Colaço-Carr
Yash More
Jackie CK Cheung
G. Farnadi
73
0
0
12 Nov 2024
Diversity Helps Jailbreak Large Language Models
Weiliang Zhao
Daniel Ben-Levi
Wei Hao
Junfeng Yang
Chengzhi Mao
AAML
90
0
0
06 Nov 2024
Defense Against Prompt Injection Attack by Leveraging Attack Techniques
Yulin Chen
Haoran Li
Zihao Zheng
Y. Song
Dekai Wu
Bryan Hooi
SILM
AAML
47
4
0
01 Nov 2024
Attention Tracker: Detecting Prompt Injection Attacks in LLMs
Kuo-Han Hung
Ching-Yun Ko
Ambrish Rawat
I-Hsin Chung
Winston H. Hsu
Pin-Yu Chen
46
7
0
01 Nov 2024
Retrieval-Augmented Generation with Estimation of Source Reliability
Jeongyeon Hwang
Junyoung Park
Hyejin Park
Sangdon Park
Jungseul Ok
RALM
42
0
0
30 Oct 2024
HijackRAG: Hijacking Attacks against Retrieval-Augmented Large Language Models
Yucheng Zhang
Qinfeng Li
Tianyu Du
Xuhong Zhang
Xinkui Zhao
Zhengwen Feng
Jianwei Yin
AAML
SILM
36
5
0
30 Oct 2024
InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models
H. Li
Xiaogeng Liu
SILM
37
4
0
30 Oct 2024
Enhancing Adversarial Attacks through Chain of Thought
Jingbo Su
LRM
18
2
0
29 Oct 2024
FATH: Authentication-based Test-time Defense against Indirect Prompt Injection Attacks
Jiongxiao Wang
Fangzhou Wu
Wendi Li
Jinsheng Pan
Edward Suh
Zhuoqing Mao
Muhao Chen
Chaowei Xiao
AAML
34
6
0
28 Oct 2024
Hacking Back the AI-Hacker: Prompt Injection as a Defense Against LLM-driven Cyberattacks
Dario Pasquini
Evgenios M. Kornaropoulos
G. Ateniese
AAML
22
3
0
28 Oct 2024
Fine-tuned Large Language Models (LLMs): Improved Prompt Injection Attacks Detection
M. Rahman
Fan Wu
A. Cuzzocrea
S. Ahamed
AAML
23
3
0
28 Oct 2024
Breaking ReAct Agents: Foot-in-the-Door Attack Will Get You In
Itay Nakash
George Kour
Guy Uziel
Ateret Anaby-Tavor
AAML
LLMAG
32
4
0
22 Oct 2024
Insights and Current Gaps in Open-Source LLM Vulnerability Scanners: A Comparative Analysis
Jonathan Brokman
Omer Hofman
Oren Rachmil
Inderjeet Singh
Vikas Pahuja
Rathina Sabapathy Aishvariya Priya
Amit Giloni
Roman Vainshtein
Hisashi Kojima
31
1
0
21 Oct 2024
The Best Defense is a Good Offense: Countering LLM-Powered Cyberattacks
Daniel Ayzenshteyn
Roy Weiss
Yisroel Mirsky
AAML
26
0
0
20 Oct 2024
Imprompter: Tricking LLM Agents into Improper Tool Use
Xiaohan Fu
Shuheng Li
Zihan Wang
Y. Liu
Rajesh K. Gupta
Taylor Berg-Kirkpatrick
Earlence Fernandes
SILM
LLMAG
54
15
0
19 Oct 2024
Backdoored Retrievers for Prompt Injection Attacks on Retrieval Augmented Generation of Large Language Models
Cody Clop
Yannick Teglia
AAML
SILM
RALM
40
2
0
18 Oct 2024
LLM2Swarm: Robot Swarms that Responsively Reason, Plan, and Collaborate through LLMs
Volker Strobel
Marco Dorigo
Mario Fritz
LRM
24
3
0
15 Oct 2024
Cognitive Overload Attack: Prompt Injection for Long Context
Bibek Upadhayay
Vahid Behzadan
Amin Karbasi
AAML
28
2
0
15 Oct 2024
Are You Human? An Adversarial Benchmark to Expose LLMs
Gilad Gressel
Rahul Pankajakshan
Yisroel Mirsky
DeLMO
38
0
0
12 Oct 2024
DAWN: Designing Distributed Agents in a Worldwide Network
Zahra Aminiranjbar
Jianan Tang
Qiudan Wang
Shubha Pant
Mahesh Viswanathan
LLMAG
AI4CE
23
2
0
11 Oct 2024
Mind Your Questions! Towards Backdoor Attacks on Text-to-Visualization Models
Shuaimin Li
Yuanfeng Song
Xuanang Chen
Anni Peng
Zhuoyue Wan
Chen Jason Zhang
Raymond Chi-Wing Wong
SILM
29
0
0
09 Oct 2024