GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning

The rapid advancement of large language model (LLM) agents has raised new concerns regarding their safety and security, which cannot be addressed by traditional textual-harm-focused LLM guardrails. We propose GuardAgent, the first guardrail agent that protects other target agents by dynamically checking whether their actions satisfy given safety guard requests. Specifically, GuardAgent first analyzes the safety guard requests to generate a task plan, and then maps this plan into guardrail code for execution. By executing this code, GuardAgent deterministically follows the safety guard request and safeguards the target agent. In both steps, an LLM serves as the reasoning component, supplemented by in-context demonstrations retrieved from a memory module that stores experiences from previous tasks. GuardAgent can understand different safety guard requests and provide reliable code-based guardrails with high flexibility and low operational overhead. In addition, we propose two novel benchmarks: EICU-AC, which assesses access control for healthcare agents, and Mind2Web-SC, which evaluates safety policies for web agents. We show that GuardAgent effectively moderates violating actions by different types of agents on these two benchmarks, with guardrail accuracies of over 98% and 83%, respectively. Project page: this https URL
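The abstract describes a two-step plan-then-code pipeline: the LLM drafts a checking plan from the safety guard request, maps it to guardrail code, and the code execution produces the final decision. Below is a minimal sketch of how such a loop might be wired together, assuming a generic LLM client with a `complete(prompt)` method; the class, prompts, and helper names (`GuardAgent`, `retrieve_demonstrations`, `check_action`) are illustrative placeholders, not the authors' implementation.

```python
from dataclasses import dataclass, field

@dataclass
class GuardAgent:
    llm: object                                   # assumed interface: .complete(prompt) -> str
    memory: list = field(default_factory=list)    # (request, plan, code) experiences from prior tasks

    def retrieve_demonstrations(self, request: str, k: int = 2) -> str:
        # Stand-in for the memory module: the paper retrieves demonstrations from
        # similar past tasks; here we simply reuse the k most recent entries.
        demos = self.memory[-k:]
        return "\n\n".join(f"Request: {r}\nPlan: {p}\nCode:\n{c}" for r, p, c in demos)

    def generate_plan(self, guard_request: str, target_agent_log: str) -> str:
        # Step 1: LLM reasoning turns the safety guard request into a checking plan.
        prompt = (
            "You are a guardrail agent. Given the safety guard request and the target "
            "agent's proposed action, write a step-by-step checking plan.\n"
            f"Demonstrations:\n{self.retrieve_demonstrations(guard_request)}\n"
            f"Safety guard request: {guard_request}\n"
            f"Target agent action log: {target_agent_log}\nPlan:"
        )
        return self.llm.complete(prompt)

    def generate_guardrail_code(self, plan: str) -> str:
        # Step 2: the plan is mapped into executable guardrail code.
        prompt = (
            "Translate this checking plan into a Python function "
            f"`check(action_log) -> bool`:\n{plan}"
        )
        return self.llm.complete(prompt)

    def check_action(self, guard_request: str, target_agent_log: str) -> bool:
        plan = self.generate_plan(guard_request, target_agent_log)
        code = self.generate_guardrail_code(plan)
        # Executing the generated code makes the guardrail decision deterministic
        # for a given piece of code (sandboxing of exec omitted in this sketch).
        namespace: dict = {}
        exec(code, namespace)                     # assumes the code defines `check`
        allowed = namespace["check"](target_agent_log)
        self.memory.append((guard_request, plan, code))
        return allowed
```

In a real deployment the generated code would be run in a sandbox and the decision (allow or deny) returned to the target agent before its action is executed; this sketch only illustrates the control flow implied by the abstract.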
@article{xiang2025_2406.09187,
  title={GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning},
  author={Zhen Xiang and Linzhi Zheng and Yanjie Li and Junyuan Hong and Qinbin Li and Han Xie and Jiawei Zhang and Zidi Xiong and Chulin Xie and Carl Yang and Dawn Song and Bo Li},
  journal={arXiv preprint arXiv:2406.09187},
  year={2025}
}