SafePro: Evaluating the Safety of Professional-Level AI Agents

Kaiwen Zhou
Shreedhar Jangam
Ashwin Nagarajan
Tejas Polu
Suhas Oruganti
Chengzhi Liu
Ching-Chen Kuo
Yuting Zheng
Sravana Narayanaraju
Xin Eric Wang
Main: 8 pages · 6 figures · 13 tables · Bibliography: 2 pages · Appendix: 6 pages
Abstract

Large language model-based agents are rapidly evolving from simple conversational assistants into autonomous systems capable of performing complex, professional-level tasks across many domains. While these advances promise significant productivity gains, they also introduce critical safety risks that remain under-explored. Existing safety evaluations focus primarily on simple, everyday assistance tasks, failing to capture the intricate decision-making processes and potential consequences of misaligned behavior in professional settings. To address this gap, we introduce SafePro, a comprehensive benchmark designed to evaluate the safety alignment of AI agents performing professional activities. SafePro features a dataset of high-complexity, safety-critical tasks spanning diverse professional domains, developed through a rigorous iterative creation and review process. Our evaluation of state-of-the-art AI models reveals significant safety vulnerabilities and uncovers new unsafe behaviors in professional contexts. We further show that these models exhibit both insufficient safety judgment and weak safety alignment when executing complex professional tasks. In addition, we investigate mitigation strategies for improving agent safety in these scenarios and observe encouraging improvements. Together, our findings highlight the urgent need for robust safety mechanisms tailored to the next generation of professional AI agents.
