Multi-Agent Systems Execute Arbitrary Malicious Code

15 March 2025

Abstract

Multi-agent systems coordinate LLM-based agents to perform tasks on users' behalf. In real-world applications, multi-agent systems will inevitably interact with untrusted inputs, such as malicious Web content, files, and email attachments. Using several recently proposed multi-agent frameworks as concrete examples, we demonstrate that adversarial content can hijack control and communication within the system to invoke unsafe agents and functionalities. This results in a complete security breach, up to execution of arbitrary malicious code on the user's device and/or exfiltration of sensitive data from the user's containerized environment. We show that control-flow hijacking attacks succeed even if the individual agents are not susceptible to direct or indirect prompt injection, and even if they refuse to perform harmful actions.

@article{triedman2025_2503.12188,
  title={Multi-Agent Systems Execute Arbitrary Malicious Code},
  author={Harold Triedman and Rishi Jha and Vitaly Shmatikov},
  journal={arXiv preprint arXiv:2503.12188},
  year={2025}
}
