AutoPentest: Enhancing Vulnerability Management With Autonomous LLM Agents

An area of increasing research interest is the use of Large Language Models (LLMs) in penetration testing, which promises to reduce costs and thus allow tests to be run more frequently. We conduct a review of related work, identifying best practices and common evaluation issues. We then present AutoPentest, an application for performing black-box penetration tests with a high degree of autonomy. AutoPentest is based on OpenAI's GPT-4o and the LLM agent framework LangChain. It can perform complex multi-step tasks, augmented by external tools and knowledge bases. We conduct a study on three capture-the-flag-style Hack The Box (HTB) machines, comparing our implementation AutoPentest with the baseline approach of manually using the ChatGPT-4o user interface. Both approaches complete 15-25% of the subtasks on the HTB machines, with AutoPentest slightly outperforming ChatGPT. We measure a total cost of $96.20 USD when using AutoPentest across all experiments, while a one-month subscription to ChatGPT Plus costs $20. The results show that further implementation efforts and the use of more powerful LLMs released in the future are likely to make this a viable part of vulnerability management.
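To illustrate the kind of tool-augmented agent loop described above, the following is a minimal sketch, assuming LangChain's tool-calling agent API and a single hypothetical shell-command tool; it is not the paper's actual implementation, and the tool, prompt, and target address are placeholders for illustration only.

```python
# Minimal sketch of a GPT-4o-backed LangChain agent with one shell tool.
# Assumption: LangChain >= 0.1 tool-calling agent API; not AutoPentest itself.
import subprocess

from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool


@tool
def run_command(command: str) -> str:
    """Run a shell command (e.g. nmap) in the test environment and return its output."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True, timeout=300)
    return result.stdout + result.stderr


llm = ChatOpenAI(model="gpt-4o", temperature=0)

# System prompt and user input are illustrative placeholders.
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an autonomous penetration-testing assistant. "
               "Plan multi-step attacks and use the available tools."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

tools = [run_command]
agent = create_tool_calling_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

# Example invocation against a hypothetical target address.
executor.invoke({"input": "Enumerate open ports on 10.10.10.1 and suggest next steps."})
```

In a real setup, additional tools (e.g. knowledge-base retrieval) would be registered alongside the shell tool, and the agent loop would iterate over intermediate results until the task is solved or a step limit is reached.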
@article{henke2025_2505.10321,
  title   = {AutoPentest: Enhancing Vulnerability Management With Autonomous LLM Agents},
  author  = {Julius Henke},
  journal = {arXiv preprint arXiv:2505.10321},
  year    = {2025}
}