SafePro: Evaluating the Safety of Professional-Level AI Agents

Kaiwen Zhou
Shreedhar Jangam
Ashwin Nagarajan
Tejas Polu
Suhas Oruganti
Chengzhi Liu
Ching-Chen Kuo
Yuting Zheng
Sravana Narayanaraju
Xin Eric Wang
Main: 8 pages · 6 figures · 13 tables · Bibliography: 2 pages · Appendix: 6 pages
Abstract

Large language model-based agents are rapidly evolving from simple conversational assistants into autonomous systems capable of performing complex, professional-level tasks across many domains. While these advances promise significant productivity gains, they also introduce critical safety risks that remain under-explored. Existing safety evaluations focus primarily on simple, everyday assistance tasks, failing to capture the intricate decision-making processes and potential consequences of misaligned behavior in professional settings. To address this gap, we introduce SafePro, a comprehensive benchmark designed to evaluate the safety alignment of AI agents performing professional activities. SafePro features a dataset of high-complexity, safety-critical tasks spanning diverse professional domains, developed through a rigorous iterative creation and review process. Our evaluation of state-of-the-art AI models reveals significant safety vulnerabilities and uncovers new unsafe behaviors in professional contexts. We further show that these models exhibit both insufficient safety judgment and weak safety alignment when executing complex professional tasks. In addition, we investigate mitigation strategies for improving agent safety in these scenarios and observe encouraging improvements. Together, our findings highlight the urgent need for robust safety mechanisms tailored to the next generation of professional AI agents.
