Automatic and Universal Prompt Injection Attacks against Large Language Models

7 March 2024

Xiaogeng Liu

Papers citing "Automatic and Universal Prompt Injection Attacks against Large Language Models"


Title
Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs Chetan Pathade AAML SILM 40 0 0 07 May 2025
LLM Security: Vulnerabilities, Attacks, Defenses, and Countermeasures Francisco Aguilera-Martínez Fernando Berzal PILM 45 0 0 02 May 2025
OET: Optimization-based prompt injection Evaluation Toolkit Jinsheng Pan Xiaogeng Liu Chaowei Xiao AAML 67 0 0 01 May 2025
Robustness via Referencing: Defending against Prompt Injection Attacks by Referencing the Executed Instruction Y. Chen Haoran Li Yuan Sui Y. Liu Yufei He Y. Song Bryan Hooi AAML SILM 61 0 0 29 Apr 2025
PICO: Secure Transformers via Robust Prompt Isolation and Cybersecurity Oversight Ben Goertzel Paulos Yibelo SILM 54 0 0 26 Apr 2025
Adversarial Attacks on LLM-as-a-Judge Systems: Insights from Prompt Injections Narek Maloyan Dmitry Namiot SILM AAML ELM 70 0 0 25 Apr 2025
Manipulating Multimodal Agents via Cross-Modal Prompt Injection Le Wang Zonghao Ying Tianyuan Zhang Siyuan Liang Shengshan Hu Mingchuan Zhang A. Liu Xianglong Liu AAML 31 1 0 19 Apr 2025
DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks Yupei Liu Yuqi Jia Jinyuan Jia Dawn Song Neil Zhenqiang Gong AAML 31 0 0 15 Apr 2025
StruPhantom: Evolutionary Injection Attacks on Black-Box Tabular Agents Powered by Large Language Models Yang Feng Xudong Pan AAML 26 0 0 14 Apr 2025
Separator Injection Attack: Uncovering Dialogue Biases in Large Language Models Caused by Role Separators Xitao Li H. Wang Jiang Wu Ting Liu AAML 23 0 0 08 Apr 2025
Iterative Prompting with Persuasion Skills in Jailbreaking Large Language Models Shih-Wen Ke Guan-Yu Lai Guo-Lin Fang Hsi-Yuan Kao SILM 79 0 0 26 Mar 2025
RAG-Thief: Scalable Extraction of Private Data from Retrieval-Augmented Generation Applications with Agent-based Attacks Changyue Jiang Xudong Pan Geng Hong Chenfu Bao Min Yang SILM 69 7 0 21 Nov 2024
Rethinking the Intermediate Features in Adversarial Attacks: Misleading Robotic Models via Adversarial Distillation Ke Zhao Huayang Huang Miao Li Yu Wu AAML 66 0 0 21 Nov 2024
Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems Donghyun Lee Mo Tiwari LLMAG 18 8 0 09 Oct 2024
Functional Homotopy: Smoothing Discrete Optimization via Continuous Parameters for LLM Jailbreak Attacks Zi Wang Divyam Anshumaan Ashish Hooda Yudong Chen Somesh Jha AAML 35 0 0 05 Oct 2024
PROMPTFUZZ: Harnessing Fuzzing Techniques for Robust Testing of Prompt Injection in LLMs Jiahao Yu Yangguang Shao Hanwen Miao Junzheng Shi SILM AAML 53 3 0 23 Sep 2024
Recent Advances in Attack and Defense Approaches of Large Language Models Jing Cui Yishi Xu Zhewei Huang Shuchang Zhou Jianbin Jiao Junge Zhang PILM AAML 45 1 0 05 Sep 2024
SafeEmbodAI: a Safety Framework for Mobile Robots in Embodied AI Systems Wenxiao Zhang Xiangrui Kong Thomas Braunl Jin B. Hong 18 2 0 03 Sep 2024
On Large Language Models in National Security Applications William N. Caballero Phillip R. Jenkins ELM 21 6 0 03 Jul 2024
Semantic-guided Prompt Organization for Universal Goal Hijacking against LLMs Yihao Huang Chong Wang Xiaojun Jia Qing-Wu Guo Felix Juefei-Xu Jian Zhang G. Pu Yang Liu 17 8 0 23 May 2024
Lockpicking LLMs: A Logit-Based Jailbreak Using Token-level Manipulation Yuxi Li Yi Liu Yuekang Li Ling Shi Gelei Deng Shengquan Chen Kailong Wang 24 12 0 20 May 2024
Sociotechnical Implications of Generative Artificial Intelligence for Information Access Bhaskar Mitra Henriette Cramer Olya Gurevich 18 2 0 19 May 2024
Training language models to follow instructions with human feedback Long Ouyang Jeff Wu Xu Jiang Diogo Almeida Carroll L. Wainwright ... Amanda Askell Peter Welinder Paul Christiano Jan Leike Ryan J. Lowe OSLM ALM 301 11,730 0 04 Mar 2022
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding Alex Wang Amanpreet Singh Julian Michael Felix Hill Omer Levy Samuel R. Bowman ELM 294 6,927 0 20 Apr 2018
ShapeShifter: Robust Physical Adversarial Attack on Faster R-CNN Object Detector Shang-Tse Chen Cory Cornelius Jason Martin Duen Horng Chau ObjD 136 422 0 16 Apr 2018