Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, C. Endres, Thorsten Holz, Mario Fritz
arXiv:2302.12173 · SILM · 23 February 2023

Papers citing "Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection" (showing 50 of 289)

Immunization against harmful fine-tuning attacks
Domenic Rosati, Jan Wehner, Kai Williams, Lukasz Bartoszcze, Jan Batzner, Hassan Sajjad, Frank Rudzicz
AAML · 26 Feb 2024

ASETF: A Novel Method for Jailbreak Attack on LLMs through Translate Suffix Embeddings
Hao Wang, Hao Li, Minlie Huang, Lei Sha
AAML · 25 Feb 2024

A Conversational Brain-Artificial Intelligence Interface
Anja Meunier, Michal Robert Zák, Lucas Munz, Sofiya Garkot, Manuel Eder, Jiachen Xu, Moritz Grosse-Wentrup
22 Feb 2024

Coercing LLMs to do and reveal (almost) anything
Jonas Geiping, Alex Stein, Manli Shu, Khalid Saifullah, Yuxin Wen, Tom Goldstein
AAML · 21 Feb 2024

Generative AI Security: Challenges and Countermeasures
Banghua Zhu, Norman Mu, Jiantao Jiao, David A. Wagner
AAML, SILM · 20 Feb 2024

Query-Based Adversarial Prompt Generation
Jonathan Hayase, Ema Borevkovic, Nicholas Carlini, Florian Tramèr, Milad Nasr
AAML, SILM · 19 Feb 2024

SPML: A DSL for Defending Language Models Against Prompt Attacks
Reshabh K Sharma, Vinayak Gupta, Dan Grossman
AAML · 19 Feb 2024

Proving membership in LLM pretraining data via data watermarks
Johnny Tian-Zheng Wei, Ryan Yixiang Wang, Robin Jia
WaLM · 16 Feb 2024

A StrongREJECT for Empty Jailbreaks
Alexandra Souly, Qingyuan Lu, Dillon Bowen, Tu Trinh, Elvis Hsieh, ..., Pieter Abbeel, Justin Svegliato, Scott Emmons, Olivia Watkins, Sam Toyer
15 Feb 2024

A Trembling House of Cards? Mapping Adversarial Attacks against Language Agents
Lingbo Mo, Zeyi Liao, Boyuan Zheng, Yu-Chuan Su, Chaowei Xiao, Huan Sun
AAML, LLMAG · 15 Feb 2024

Inadequacies of Large Language Model Benchmarks in the Era of Generative Artificial Intelligence
Timothy R. McIntosh, Teo Susnjak, Tong Liu, Paul Watters, Malka N. Halgamuge
ALM, ELM · 15 Feb 2024

PAL: Proxy-Guided Black-Box Attack on Large Language Models
Chawin Sitawarin, Norman Mu, David A. Wagner, Alexandre Araujo
ELM · 15 Feb 2024

Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey
Zhichen Dong, Zhanhui Zhou, Chao Yang, Jing Shao, Yu Qiao
ELM · 14 Feb 2024

COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability
Xingang Guo, Fangxu Yu, Huan Zhang, Lianhui Qin, Bin Hu
AAML · 13 Feb 2024

Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast
Xiangming Gu, Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Ye Wang, Jing Jiang, Min-Bin Lin
LLMAG, LM&Ro · 13 Feb 2024

PoisonedRAG: Knowledge Poisoning Attacks to Retrieval-Augmented Generation of Large Language Models
Wei Zou, Runpeng Geng, Binghui Wang, Jinyuan Jia
SILM · 12 Feb 2024

Whispers in the Machine: Confidentiality in LLM-integrated Systems
Jonathan Evertz, Merlin Chlosta, Lea Schönherr, Thorsten Eisenhofer
10 Feb 2024

StruQ: Defending Against Prompt Injection with Structured Queries
Sizhe Chen, Julien Piet, Chawin Sitawarin, David A. Wagner
SILM, AAML · 09 Feb 2024

Comprehensive Assessment of Jailbreak Attacks Against LLMs
Junjie Chu, Yugeng Liu, Ziqing Yang, Xinyue Shen, Michael Backes, Yang Zhang
AAML · 08 Feb 2024

(A)I Am Not a Lawyer, But...: Engaging Legal Experts towards Responsible LLM Policies for Legal Advice
Inyoung Cheong, King Xia, K. J. Kevin Feng, Quan Ze Chen, Amy X. Zhang
AILaw, ELM · 02 Feb 2024

Reasoning Capacity in Multi-Agent Systems: Limitations, Challenges and Human-Centered Solutions
Pouya Pezeshkpour, Eser Kandogan, Nikita Bhutani, Sajjadur Rahman, Tom Mitchell, Estevam R. Hruschka
LLMAG, LRM · 02 Feb 2024

An Early Categorization of Prompt Injection Attacks on Large Language Models
Sippo Rossi, Alisia Marianne Michel, R. Mukkamala, J. Thatcher
SILM, AAML · 31 Jan 2024

Red-Teaming for Generative AI: Silver Bullet or Security Theater?
Michael Feffer, Anusha Sinha, Wesley Hanwen Deng, Zachary Chase Lipton, Hoda Heidari
AAML · 29 Jan 2024

Signed-Prompt: A New Approach to Prevent Prompt Injection Attacks Against LLM-Integrated Applications
Xuchen Suo
AAML, SILM · 15 Jan 2024

Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems
Tianyu Cui, Yanling Wang, Chuanpu Fu, Yong Xiao, Sijia Li, ..., Junwu Xiong, Xinyu Kong, Zujie Wen, Ke Xu, Qi Li
11 Jan 2024

Malla: Demystifying Real-world Large Language Model Integrated Malicious Services
Zilong Lin, Jian Cui, Xiaojing Liao, XiaoFeng Wang
06 Jan 2024

A Novel Evaluation Framework for Assessing Resilience Against Prompt Injection Attacks in Large Language Models
Daniel Wankit Yip, Aysan Esmradi, C. Chan
AAML · 02 Jan 2024

Jatmo: Prompt Injection Defense by Task-Specific Finetuning
Julien Piet, Maha Alrashed, Chawin Sitawarin, Sizhe Chen, Zeming Wei, Elizabeth Sun, Basel Alomair, David A. Wagner
AAML, SyDa · 29 Dec 2023

Towards Auto-Modeling of Formal Verification for NextG Protocols: A Multimodal cross- and self-attention Large Language Model Approach
Jing-Bing Yang, Ying Wang
28 Dec 2023

Exploiting Novel GPT-4 APIs
Kellin Pelrine, Mohammad Taufeeque, Michał Zając, Euan McLean, Adam Gleave
SILM · 21 Dec 2023

Designing LLM Chains by Adapting Techniques from Crowdsourcing Workflows
Madeleine Grunde-McLaughlin, Michelle S. Lam, Ranjay Krishna, Daniel S. Weld, Jeffrey Heer
AI4CE · 18 Dec 2023

A Comprehensive Survey of Attack Techniques, Implementation, and Mitigation Strategies in Large Language Models
Aysan Esmradi, Daniel Wankit Yip, C. Chan
AAML · 18 Dec 2023

Analyzing the Inherent Response Tendency of LLMs: Real-World Instructions-Driven Jailbreak
Yanrui Du, Sendong Zhao, Ming Ma, Yuhan Chen, Bing Qin
07 Dec 2023

Dr. Jekyll and Mr. Hyde: Two Faces of LLMs
Matteo Gioele Collu, Tom Janssen-Groesbeek, Stefanos Koffas, Mauro Conti, S. Picek
06 Dec 2023

Tree of Attacks: Jailbreaking Black-Box LLMs Automatically
Anay Mehrotra, Manolis Zampetakis, Paul Kassianik, Blaine Nelson, Hyrum Anderson, Yaron Singer, Amin Karbasi
04 Dec 2023

Intrusion Detection System with Machine Learning and Multiple Datasets
Haiyan Xuan, Mohith Manohar
AAML · 04 Dec 2023

Assessing Prompt Injection Risks in 200+ Custom GPTs
Jiahao Yu, Yuhang Wu, Dong Shu, Mingyu Jin, Sabrina Yang, Xinyu Xing
20 Nov 2023

How Trustworthy are Open-Source LLMs? An Assessment under Malicious Demonstrations Shows their Vulnerabilities
Lingbo Mo, Boshi Wang, Muhao Chen, Huan Sun
15 Nov 2023

Alignment is not sufficient to prevent large language models from generating harmful information: A psychoanalytic perspective
Zi Yin, Wei Ding, Jia Liu
14 Nov 2023

MART: Improving LLM Safety with Multi-round Automatic Red-Teaming
Suyu Ge, Chunting Zhou, Rui Hou, Madian Khabsa, Yi-Chia Wang, Qifan Wang, Jiawei Han, Yuning Mao
AAML, LRM · 13 Nov 2023

FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts
Yichen Gong, Delong Ran, Jinyuan Liu, Conglei Wang, Tianshuo Cong, Anyu Wang, Sisi Duan, Xiaoyun Wang
MLLM · 09 Nov 2023

Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation
Rusheb Shah, Quentin Feuillade--Montixi, Soroush Pour, Arush Tagade, Stephen Casper, Javier Rando
06 Nov 2023

Can LLMs Follow Simple Rules?
Norman Mu, Sarah Chen, Zifan Wang, Sizhe Chen, David Karamardian, Lulwa Aljeraisy, Basel Alomair, Dan Hendrycks, David A. Wagner
ALM · 06 Nov 2023

Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game
Sam Toyer, Olivia Watkins, Ethan Mendes, Justin Svegliato, Luke Bailey, ..., Karim Elmaaroufi, Pieter Abbeel, Trevor Darrell, Alan Ritter, Stuart J. Russell
02 Nov 2023

Robust Safety Classifier for Large Language Models: Adversarial Prompt Shield
Jinhwa Kim, Ali Derakhshan, Ian G. Harris
AAML · 31 Oct 2023

From Chatbots to PhishBots? -- Preventing Phishing scams created using ChatGPT, Google Bard and Claude
S. Roy, Poojitha Thota, Krishna Vamsi Naragam, Shirin Nilizadeh
SILM · 29 Oct 2023

Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition
Sander Schulhoff, Jeremy Pinto, Anaum Khan, Louis-François Bouchard, Chenglei Si, Svetlina Anati, Valen Tagliabue, Anson Liu Kost, Christopher Carnahan, Jordan L. Boyd-Graber
SILM · 24 Oct 2023

Guiding LLM to Fool Itself: Automatically Manipulating Machine Reading Comprehension Shortcut Triggers
Mosh Levy, Shauli Ravfogel, Yoav Goldberg
24 Oct 2023

AutoDAN: Interpretable Gradient-Based Adversarial Attacks on Large Language Models
Sicheng Zhu, Ruiyi Zhang, Bang An, Gang Wu, Joe Barrow, Zichao Wang, Furong Huang, A. Nenkova, Tong Sun
SILM, AAML · 23 Oct 2023

Formalizing and Benchmarking Prompt Injection Attacks and Defenses
Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, Neil Zhenqiang Gong
SILM, LLMAG · 19 Oct 2023