PromptSleuth: Detecting Prompt Injection via Semantic Intent Invariance

28 August 2025

Mengxiao Wang

Main: 12 pages; Bibliography: 3 pages; Appendix: 2 pages; 8 figures; 11 tables

Abstract

Large Language Models (LLMs) are increasingly integrated into real-world applications, from virtual assistants to autonomous agents. However, their flexibility also introduces new attack vectors, most notably Prompt Injection (PI), in which adversaries manipulate model behavior through crafted inputs. As attackers continually evolve their techniques, adopting paraphrased, obfuscated, and even multi-task injection strategies, existing benchmarks are no longer sufficient to capture the full spectrum of emerging threats.
