Detecting Code Vulnerabilities with Heterogeneous GNN Training

24 February 2025

Abstract

Detecting vulnerabilities in source code is a critical task for software security assurance. Graph Neural Network (GNN) machine learning can be a promising approach by modeling source code as graphs. Early approaches treated code elements uniformly, limiting their capacity to model diverse relationships that contribute to various vulnerabilities. Recent research addresses this limitation by considering the heterogeneity of node types and using Gated Graph Neural Networks (GGNN) to aggregate node information through different edge types. However, these edges primarily function as conduits for passing node information and may not capture detailed characteristics of distinct edge types. This paper presents Inter-Procedural Abstract Graphs (IPAGs) as an efficient, language-agnostic representation of source code, complemented by heterogeneous GNN training for vulnerability prediction. IPAGs capture the structural and contextual properties of code elements and their relationships. We also propose a Heterogeneous Attention GNN (HAGNN) model that incorporates multiple subgraphs capturing different features of source code. These subgraphs are learned separately and combined using a global attention mechanism, followed by a fully connected neural network for final classification. The proposed approach has achieved up to 96.6% accuracy on a large C dataset of 108 vulnerability types and 97.8% on a large Java dataset of 114 vulnerability types, outperforming state-of-the-art methods. Its applications to various real-world software projects have also demonstrated low false positive rates.

View on arXiv

@article{luo2025_2502.16835,
  title={ Detecting Code Vulnerabilities with Heterogeneous GNN Training },
  author={ Yu Luo and Weifeng Xu and Dianxiang Xu },
  journal={arXiv preprint arXiv:2502.16835},
  year={ 2025 }
}

Comments on this paper