Construction and Evaluation of LLM-based agents for Semi-Autonomous penetration testing

Abstract

With the emergence of high-performance large language models (LLMs) such as GPT, Claude, and Gemini, the autonomous and semi-autonomous execution of tasks has advanced significantly across various domains. However, in highly specialized fields such as cybersecurity, full autonomy remains a challenge. This difficulty stems primarily from the limited reasoning capabilities and domain-specific knowledge of LLMs. To address these limitations, we propose a system that semi-autonomously executes complex cybersecurity workflows by employing multiple LLM modules to formulate attack strategies, generate commands, and analyze results. In our experiments using Hack The Box virtual machines, we confirmed that our system can autonomously construct attack strategies, issue appropriate commands, and automate certain processes, thereby reducing the need for manual intervention.
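
The abstract names three LLM roles (strategy formulation, command generation, result analysis) but gives no implementation detail. The following is only a minimal sketch of how such a division of labor could be wired together with an operator approval gate. Every name in it (`SemiAutonomousLoop`, `planner`, `commander`, `analyzer`, `approve`, `execute`) is a hypothetical placeholder and not the authors' design; the LLM modules are plain callables so that no particular vendor API is assumed.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical type: any function that maps a prompt string to a reply string.
LLM = Callable[[str], str]


@dataclass
class Step:
    """One recorded iteration of the loop."""
    strategy: str
    command: str
    analysis: str


@dataclass
class SemiAutonomousLoop:
    planner: LLM                      # assumed role: formulates the next strategy
    commander: LLM                    # assumed role: turns a strategy into one command
    analyzer: LLM                     # assumed role: interprets the command output
    approve: Callable[[str], bool]    # human-in-the-loop gate (the "semi" part)
    execute: Callable[[str], str]     # runs an approved command, returns its output
    history: List[Step] = field(default_factory=list)

    def run_once(self, findings: str) -> str:
        """Plan, generate, gate, execute, and analyze a single step."""
        strategy = self.planner(f"Findings so far:\n{findings}\nPropose the next step.")
        command = self.commander(f"Strategy:\n{strategy}\nGive one command.")
        if self.approve(command):
            output = self.execute(command)
        else:
            # The command is never executed without operator approval.
            output = "(command rejected by operator)"
        analysis = self.analyzer(f"Command: {command}\nOutput:\n{output}\nSummarize.")
        self.history.append(Step(strategy, command, analysis))
        return analysis
```

In this sketch the returned analysis would be fed back as `findings` for the next call, and the `approve` callback is where manual intervention stays in the loop.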

@article{kobayashi2025_2502.15506,
  title={Construction and Evaluation of LLM-based agents for Semi-Autonomous penetration testing},
  author={Masaya Kobayashi and Masane Fuchi and Amar Zanashir and Tomonori Yoneda and Tomohiro Takagi},
  journal={arXiv preprint arXiv:2502.15506},
  year={2025}
}