Attention Tracker: Detecting Prompt Injection Attacks in LLMs
Kuo-Han Hung, Ching-Yun Ko, Ambrish Rawat, I-Hsin Chung, Winston H. Hsu, Pin-Yu Chen
arXiv:2411.00348, 1 November 2024
Papers citing "Attention Tracker: Detecting Prompt Injection Attacks in LLMs" (5 papers):
| Title | Authors | Topics | Date |
| --- | --- | --- | --- |
| Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs | Chetan Pathade | AAML, SILM | 07 May 2025 |
| Attack and defense techniques in large language models: A survey and new perspectives | Zhiyu Liao, Kang Chen, Yuanguo Lin, Kangkang Li, Yunxuan Liu, Hefeng Chen, Xingwang Huang, Yuanhui Yu | AAML | 02 May 2025 |
| RTBAS: Defending LLM Agents Against Prompt Injection and Privacy Leakage | Peter Yong Zhong, Siyuan Chen, Ruiqi Wang, McKenna McCall, Ben L. Titzer, Heather Miller, Phillip B. Gibbons | LLMAG | 17 Feb 2025 |
| Lightweight Safety Classification Using Pruned Language Models | Mason Sawtell, Tula Masterman, Sandi Besen, Jim Brown | | 18 Dec 2024 |
| ModelShield: Adaptive and Robust Watermark against Model Extraction Attack | Kaiyi Pang, Tao Qi, Chuhan Wu, Minhao Bai, Minghu Jiang, Yongfeng Huang | AAML, WaLM | 03 May 2024 |