An Early Categorization of Prompt Injection Attacks on Large Language Models

31 January 2024

Alisia Marianne Michel

Papers citing "An Early Categorization of Prompt Injection Attacks on Large Language Models"

Title | Authors | Tags | Counts | Date
Defending against Indirect Prompt Injection by Instruction Detection | Tongyu Wen, Chenglong Wang, Xiyuan Yang, Haoyu Tang, Yueqi Xie, Lingjuan Lyu, Zhicheng Dou, Fangzhao Wu | AAML | 24 0 0 | 08 May 2025
Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs | Chetan Pathade | AAML, SILM | 46 0 0 | 07 May 2025
Attack and defense techniques in large language models: A survey and new perspectives | Zhiyu Liao, Kang Chen, Yuanguo Lin, Kangkang Li, Yunxuan Liu, Hefeng Chen, Xingwang Huang, Yuanhui Yu | AAML | 54 0 0 | 02 May 2025
Jailbreak Detection in Clinical Training LLMs Using Feature-Based Predictive Models | Tri Nguyen, Lohith Srikanth Pentapalli, Magnus Sieverding, Laurah Turner, Seth Overla, ..., Michael Gharib, Matt Kelleher, Michael Shukis, Cameron Pawlik, Kelly Cohen | | 48 0 0 | 21 Apr 2025
You've Changed: Detecting Modification of Black-Box Large Language Models | Alden Dima, James R. Foulds, Shimei Pan, Philip G. Feldman | | 30 0 0 | 14 Apr 2025
Harnessing Chain-of-Thought Metadata for Task Routing and Adversarial Prompt Detection | Ryan Marinelli, Josef Pichlmeier, Tamás Bisztray | LRM | 36 0 0 | 27 Mar 2025
An Empirically-grounded tool for Automatic Prompt Linting and Repair: A Case Study on Bias, Vulnerability, and Optimization in Developer Prompts | Dhia Elhaq Rzig, Dhruba Jyoti Paul, Kaiser Pister, Jordan Henkel, Foyzul Hassan | | 75 0 0 | 21 Jan 2025
CROW: Eliminating Backdoors from Large Language Models via Internal Consistency Regularization | Nay Myat Min, Long H. Pham, Yige Li, Jun Sun | AAML | 64 3 0 | 18 Nov 2024
Palisade -- Prompt Injection Detection Framework | Sahasra Kokkula, Somanathan R, Nandavardhan R, Aashishkumar, G Divya | AAML | 25 1 0 | 28 Oct 2024
VLMGuard: Defending VLMs against Malicious Prompts via Unlabeled Data | Xuefeng Du, Reshmi Ghosh, Robert Sim, Ahmed Salem, Vitor Carvalho, Emily Lawton, Yixuan Li, Jack W. Stokes | VLM, AAML | 32 5 0 | 01 Oct 2024
Sparks of Artificial General Intelligence: Early experiments with GPT-4 | Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, J. Gehrke, Eric Horvitz, ..., Scott M. Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang | ELM, AI4MH, AI4CE, ALM | 203 2,232 0 | 22 Mar 2023
Zero-Shot Text-to-Image Generation | Aditya A. Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever | VLM | 253 4,735 0 | 24 Feb 2021
Extracting Training Data from Large Language Models | Nicholas Carlini, Florian Tramèr, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, ..., Tom B. Brown, D. Song, Ulfar Erlingsson, Alina Oprea, Colin Raffel | MLAU, SILM | 267 1,798 0 | 14 Dec 2020