344
v1v2v3 (latest)

SecRepoBench: Benchmarking Code Agents for Secure Code Completion in Real-World Repositories

Main:6 Pages
2 Figures
Bibliography:2 Pages
3 Tables
Abstract

This paper introduces SecRepoBench, a benchmark to evaluate code agents on secure code completion in real-world repositories. SecRepoBench has 318 code completion tasks in 27 C/C++ repositories, covering 15 CWEs. We evaluate 29 standalone LLMs and 15 code agents across 3 state-of-the-art agent frameworks using our benchmark. We find that state-of-the-art LLMs struggle with generating correct and secure code completions. However, code agents significantly outperform standalone LLMs. We show that SecRepoBench is more difficult than the prior state-of-the-art benchmark. Finally, our comprehensive analysis provides insights into potential directions for enhancing the ability of code agents to write correct and secure code in real-world repositories.

View on arXiv
Comments on this paper