GUARD: Guideline Upholding Test through Adaptive Role-play and Jailbreak Diagnostics for LLMs

28 August 2025

Haibo Jin

Main: 13 pages; Bibliography: 5 pages; Appendix: 35 pages; 15 figures; 15 tables

Abstract

As Large Language Models (LLMs) become increasingly integral to various domains, their potential to generate harmful responses has prompted significant societal and regulatory concerns. In response, governments have issued ethics guidelines to promote the development of trustworthy AI. However, these guidelines are typically high-level demands on developers and testers, leaving a gap in translating them into actionable testing questions that verify LLM compliance.
