LLMs Cannot Reliably Identify and Reason About Security Vulnerabilities (Yet?): A Comprehensive Evaluation, Framework, and Benchmarks

19 December 2023
Saad Ullah
Mingji Han
Saurabh Pujar
Hammond Pearce
Ayse K. Coskun
Gianluca Stringhini
ELM, LRM

Papers citing "LLMs Cannot Reliably Identify and Reason About Security Vulnerabilities (Yet?): A Comprehensive Evaluation, Framework, and Benchmarks"

50 / 59 papers shown
DUALGUAGE: Automated Joint Security-Functionality Benchmarking for Secure Code Generation
Abhijeet Pathak
Suvadra Barua
Dinesh Gudimetla
Rupam Patir
Jiawei Guo
Hongxin Hu
Haipeng Cai
ELM
60
0
0
24 Nov 2025
LLMs as Firmware Experts: A Runtime-Grown Tree-of-Agents Framework
XiangRui Zhang
Zeyu Chen
Haining Wang
Qiang Li
32
0
0
23 Nov 2025
VULPO: Context-Aware Vulnerability Detection via On-Policy LLM Optimization
Youpeng Li
Fuxun Yu
Xinda Wang
OffRL
88
0
0
14 Nov 2025
One Bug, Hundreds Behind: LLMs for Large-Scale Bug Discovery
Qiushi Wu
Yue Xiao
Dhilung Kirat
Kevin Eykholt
Jiyong Jang
D. Schales
76
0
0
15 Oct 2025
MAVUL: Multi-Agent Vulnerability Detection via Contextual Reasoning and Interactive Refinement
Youpeng Li
Kartik Joshi
Xinda Wang
Eric Wong
61
1
0
30 Sep 2025
LLM-based Vulnerability Discovery through the Lens of Code Metrics
Felix Weissberg
Lukas Pirch
Erik Imgrund
Jonas Moller
Thorsten Eisenhofer
Konrad Rieck
52
1
0
23 Sep 2025
Adversarially Robust Assembly Language Model for Packed Executables Detection
Shijia Li
Jiang Ming
Lanqing Liu
Longwei Yang
Ni Zhang
Chunfu Jia
72
0
0
19 Sep 2025
From CVE Entries to Verifiable Exploits: An Automated Multi-Agent Framework for Reproducing CVEs
Saad Ullah
Praneeth Balasubramanian
Wenbo Guo
Amanda Burnett
Hammond Pearce
Christopher Kruegel
Giovanni Vigna
Gianluca Stringhini
121
3
0
01 Sep 2025
LLM-driven Provenance Forensics for Threat Investigation and Detection
Kunal Mukherjee
Murat Kantarcioglu
48
2
0
29 Aug 2025
LLMs in the SOC: An Empirical Study of Human-AI Collaboration in Security Operations Centres
Ronal Singh
Shahroz Tariq
Fatemeh Jalalvand
Mohan Baruwal Chhetri
Surya Nepal
Cécile Paris
Martin Lochner
153
3
0
26 Aug 2025
A Guide to Stakeholder Analysis for Cybersecurity Researchers
James C. Davis
Sophie Chen
Huiyun Peng
Paschal C. Amusuo
Kelechi G. Kalu
56
3
0
20 Aug 2025
Think Broad, Act Narrow: CWE Identification with Multi-Agent Large Language Models
Mohammed Sayagh
Mohammad Ghafari
AAML
74
0
0
02 Aug 2025
Revisiting Pre-trained Language Models for Vulnerability Detection
Youpeng Li
Weiliang Qi
Xuyu Wang
Fuxun Yu
Xinda Wang
AAML
160
1
0
22 Jul 2025
When Developer Aid Becomes Security Debt: A Systematic Analysis of Insecure Behaviors in LLM Coding Agents
Matous Kozak
Roshanak Zilouchian Moghaddam
Siva Sivaraman
LLMAG, ELM
153
0
0
12 Jul 2025
SAVANT: Vulnerability Detection in Application Dependencies through Semantic-Guided Reachability Analysis
Wang Lingxiang
Quanzhi Fu
Wenjia Song
Gelei Deng
Yi Liu
Dan Williams
Ying Zhang
112
1
0
21 Jun 2025
Growing with Experience: Growing Neural Networks in Deep Reinforcement Learning
Lukas Fehring
Marius Lindauer
Theresa Eimer
OffRL
111
1
0
13 Jun 2025
LLM Embedding-based Attribution (LEA): Quantifying Source Contributions to Generative Model's Response for Vulnerability Analysis
Reza Fayyazi
Michael Zuzak
S. Yang
183
1
0
12 Jun 2025
SCGAgent: Recreating the Benefits of Reasoning Models for Secure Code Generation with Agentic Workflows
Rebecca Saul
Hao Wang
Koushik Sen
David Wagner
LLMAG
150
1
0
08 Jun 2025
VulBinLLM: LLM-powered Vulnerability Detection for Stripped Binaries
Nasir Hussain
Haohan Chen
Chanh Tran
Philip Huang
Zhuohao Li
Pravir Chugh
William Chen
Ashish Kundu
Wei Bai
172
3
0
28 May 2025
SV-TrustEval-C: Evaluating Structure and Semantic Reasoning in Large Language Models for Source Code Vulnerability Analysis
IEEE Symposium on Security and Privacy (S&P), 2025
Yansong Li
Paula Branco
Alexander M. Hoole
Manish Marwah
Hari Manassery Koduvely
Guy-Vincent Jourdan
Stephan Jou
ELM, LRM
126
6
0
27 May 2025
Advancing Software Quality: A Standards-Focused Review of LLM-Based Assurance Techniques
Avinash Patil
291
2
0
19 May 2025
Automated Profile Inference with Language Model Agents
Yuntao Du
Zitao Li
Bolin Ding
Yaliang Li
Hanshen Xiao
Jingren Zhou
Ninghui Li
LLMAG
245
2
0
18 May 2025
Let the Trial Begin: A Mock-Court Approach to Vulnerability Detection using LLM-Based Agents
Ratnadira Widyasari
Martin Weyssow
Ivana Clairine Irsan
Han Wei Ang
Frank Liauw
Eng Lieh Ouh
Lwin Khin Shar
Hong Jin Kang
David Lo
LLMAG
251
6
0
16 May 2025
SecReEvalBench: A Multi-turned Security Resilience Evaluation Benchmark for Large Language Models
Huining Cui
Wei Liu
AAML, ELM
315
0
0
12 May 2025
AutoPatch: Multi-Agent Framework for Patching Real-World CVE Vulnerabilities
Minjae Seo
Wonwoo Choi
Myoungsung You
Seungwon Shin
KELM
149
1
0
07 May 2025
Automatically Generating Rules of Malicious Software Packages via Large Language Model
Dependable Systems and Networks (DSN), 2025
XiangRui Zhang
HaoYu Chen
YongZhong He
Wenjia Niu
Qiang Li
153
1
0
24 Apr 2025
Automated Static Vulnerability Detection via a Holistic Neuro-symbolic Approach
Penghui Li
Songchen Yao
Josef Sarfati Korich
Changhua Luo
Jianjia Yu
Yinzhi Cao
Junfeng Yang
890
5
0
22 Apr 2025
Trace Gadgets: Minimizing Code Context for Machine Learning-Based Vulnerability Prediction
Felix Mächtle
Nils Loose
Tim Schulz
Florian Sieck
Jan-Niclas Serr
Ralf Möller
T. Eisenbarth
199
2
0
18 Apr 2025
The Digital Cybersecurity Expert: How Far Have We Come?
IEEE Symposium on Security and Privacy (S&P), 2025
Dawei Wang
Geng Zhou
Xianglong Li
Yu Bai
Li Chen
Ting Qin
Jian Sun
Didong Li
ELM
211
1
0
16 Apr 2025
Can LLMs Classify CVEs? Investigating LLMs Capabilities in Computing CVSS Vectors
Francesco Marchiori
Denis Donadel
Mauro Conti
218
2
0
14 Apr 2025
R2Vul: Learning to Reason about Software Vulnerabilities with Reinforcement Learning and Structured Reasoning Distillation
Martin Weyssow
Chengran Yang
Junkai Chen
Ratnadira Widyasari
Ting Zhang
...
Ang Han Wei
Frank Liauw
Eng Lieh Ouh
Lwin Khin Shar
David Lo
LRM
376
4
0
07 Apr 2025
Frontier AI's Impact on the Cybersecurity Landscape
Wenbo Guo
Tianneng Shi
Yu Yang
Andy Zhang
Patrick Gage Kelley
Kurt Thomas
Dawn Song
375
18
0
07 Apr 2025
Block Toeplitz Sparse Precision Matrix Estimation for Large-Scale Interval-Valued Time Series Forecasting
Wan Tian
Zhongfeng Qin
AI4TS
181
0
0
04 Apr 2025
Reasoning with LLMs for Zero-Shot Vulnerability Detection
Arastoo Zibaeirad
Marco Vieira
AAML, LRM
176
8
0
22 Mar 2025
Large Language Models (LLMs) for Source Code Analysis: applications, models and datasets
Hamed Jelodar
Mohammad Meymani
Roozbeh Razavi-Far
202
15
0
21 Mar 2025
XOXO: Stealthy Cross-Origin Context Poisoning Attacks against AI Coding Assistants
Adam Štorek
Mukur Gupta
Noopur Bhatt
Aditya Gupta
Janie Kim
Prashast Srivastava
Suman Jana
AAML
473
3
0
18 Mar 2025
Vulnerability Detection: From Formal Verification to Large Language Models and Hybrid Approaches: A Comprehensive Overview
Norbert Tihanyi
Tamás Bisztray
M. Ferrag
Bilel Cherif
Richard A. Dubniczky
Ridhi Jain
Lucas C. Cordeiro
176
7
0
13 Mar 2025
Cyber Defense Reinvented: Large Language Models as Threat Intelligence Copilots
Xiaoqun Liu
Jiacheng Liang
Qiben Yan
Muchao Ye
Jinyuan Jia
Zhaohan Xi
262
0
0
28 Feb 2025
Standard Benchmarks Fail - Auditing LLM Agents in Finance Must Prioritize Risk
Zichen Chen
Jiaao Chen
Jianda Chen
Misha Sra
ELM
363
2
0
21 Feb 2025
Do LLMs Consider Security? An Empirical Study on Responses to Programming Questions
Empirical Software Engineering (EMSE), 2025
Amirali Sajadi
Binh Le
A. Nguyen
Kostadin Damevski
Preetha Chatterjee
252
9
0
20 Feb 2025
LAMD: Context-driven Android Malware Detection and Classification with LLMs
Xingzhi Qian
Xinran Zheng
Yiling He
Shuo Yang
Lorenzo Cavallaro
411
22
0
18 Feb 2025
Large Language Models for In-File Vulnerability Localization Can Be "Lost in the End"
Francesco Sovrano
Adam Bauer
Alberto Bacchelli
234
4
0
09 Feb 2025
Can LLM Generate Regression Tests for Software Commits?
Jing Liu
Seongmin Lee
Eleonora Losiouk
Marcel Böhme
145
4
0
19 Jan 2025
Logic Meets Magic: LLMs Cracking Smart Contract Vulnerabilities
International Conference on Blockchain (ICB), 2025
ZeKe Xiao
Qin Wang
Hammond Pearce
Shiping Chen
208
5
0
13 Jan 2025
ProveRAG: Provenance-Driven Vulnerability Analysis with Automated Retrieval-Augmented LLMs
Reza Fayyazi
Stella Hoyos Trueba
Michael Zuzak
S. Yang
226
4
0
22 Oct 2024
From Solitary Directives to Interactive Encouragement! LLM Secure Code Generation by Natural Language Prompting
Shigang Liu
Bushra Sabir
Seung Ick Jang
Yuval Kansal
Yansong Gao
Kristen Moore
A. Abuadbba
Surya Nepal
215
4
0
18 Oct 2024
SeCodePLT: A Unified Platform for Evaluating the Security of Code GenAI
Yuzhou Nie
Yu Yang
Ruizhe Jiang
Yuheng Tang
Xander Davies
Basel Alomair
Bo Li
Wenbo Guo
Dawn Song
ELM
201
23
0
14 Oct 2024
Code Vulnerability Repair with Large Language Model using Context-Aware Prompt Tuning
Arshiya Khan
Guannan Liu
Xing Gao
KELM
205
4
0
27 Sep 2024
VulnLLMEval: A Framework for Evaluating Large Language Models in Software Vulnerability Detection and Patching
Arastoo Zibaeirad
Marco Vieira
161
14
0
16 Sep 2024
Enhancing Source Code Security with LLMs: Demystifying The Challenges and Generating Reliable Repairs
Nafis Tanveer Islam
Joseph Khoury
Andrew Seong
E. Bou-Harb
Peyman Najafirad
AAML
253
3
0
01 Sep 2024