Benchmarking LLMs and LLM-based Agents in Practical Vulnerability Detection for Code Repositories

5 March 2025
Alperen Yildiz, Sin G. Teo, Yiling Lou, Yebo Feng, Chong Wang, Dinil M. Divakaran
Abstract

Large Language Models (LLMs) have shown promise in software vulnerability detection, particularly on function-level benchmarks like Devign and BigVul. However, real-world detection requires interprocedural analysis, as vulnerabilities often emerge through multi-hop function calls rather than isolated functions. While repository-level benchmarks like ReposVul and VulEval introduce interprocedural context, they remain computationally expensive, lack pairwise evaluation of vulnerability fixes, and support only limited context retrieval, limiting their practicality. We introduce JitVul, a just-in-time (JIT) vulnerability detection benchmark linking each function to its vulnerability-introducing and fixing commits. Built from 879 CVEs spanning 91 vulnerability types, JitVul enables comprehensive evaluation of detection capabilities. Our results show that ReAct agents, leveraging thought-action-observation loops and interprocedural context, distinguish vulnerable from benign code better than plain LLMs. While prompting strategies like Chain-of-Thought help LLMs, ReAct agents require further refinement. Both approaches show inconsistencies, either misidentifying vulnerabilities or over-analyzing security guards, indicating significant room for improvement.

@article{yildiz2025_2503.03586,
  title={Benchmarking LLMs and LLM-based Agents in Practical Vulnerability Detection for Code Repositories},
  author={Alperen Yildiz and Sin G. Teo and Yiling Lou and Yebo Feng and Chong Wang and Dinil M. Divakaran},
  journal={arXiv preprint arXiv:2503.03586},
  year={2025}
}