This project explores the evaluation of deep learning-based cross-project vulnerability detection methods. Methods from several research papers will be replicated to provide baselines for our evaluation framework. The framework will be reproducible and can be adopted by future research for determining the optimal vulnerability detection method. The research papers (scope), source code, and feature extraction tools will be provided within this repository.
This project explores function-level vulnerability discovery within a cross-project scope. AST representations of functions serve as the training data for a bidirectional LSTM (BiLSTM) neural network. Typical recurrent neural networks have difficulty capturing the long-term dependencies among the contiguous and fragmented code elements associated with a vulnerability, so the method combines an RNN with LSTM cells to handle vulnerabilities whose dependencies span multiple lines of code. This function-level representation model has demonstrated significant performance gains.
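As a rough illustration of this setup, here is a minimal PyTorch sketch of a BiLSTM classifier over serialized AST token sequences. The `BiLSTMVulnDetector` name, vocabulary size, and layer dimensions are illustrative assumptions, not details taken from the replicated paper:

```python
import torch
import torch.nn as nn

class BiLSTMVulnDetector(nn.Module):
    """Toy bidirectional LSTM over serialized AST token sequences.

    Assumes each function's AST has already been flattened (e.g. by a
    depth-first traversal) into a padded sequence of integer token IDs;
    all sizes below are illustrative.
    """

    def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=64):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # A bidirectional LSTM sees context on both sides of a token,
        # which helps with dependencies spanning many lines of code.
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, 1)

    def forward(self, token_ids):                   # (batch, seq_len)
        embedded = self.embedding(token_ids)        # (batch, seq_len, embed_dim)
        _, (hidden, _) = self.lstm(embedded)        # hidden: (2, batch, hidden_dim)
        # Concatenate the final forward and backward hidden states.
        summary = torch.cat([hidden[0], hidden[1]], dim=1)
        return self.classifier(summary).squeeze(1)  # vulnerability logit

model = BiLSTMVulnDetector()
logits = model(torch.randint(1, 10000, (4, 200)))   # 4 functions, 200 tokens each
probs = torch.sigmoid(logits)                        # per-function vulnerability score
```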
Understand is a commercial static code analysis tool used here for extracting function-level code metrics.
Source: https://www.scitools.com/
CodeSensor is a robust code-to-Abstract-Syntax-Tree (AST) parser implemented based on the concept of island grammars.
Source: https://github.com/fabsx00/codesensor
Dual-Component Deep Domain Adaptation: A New Approach for Cross Project Software Vulnerability Detection
To address the scarcity of labeled vulnerabilities in the datasets used for software vulnerability detection, a deep domain adaptation solution is proposed. With deep domain adaptation, labeled vulnerability representations from a source dataset can be transferred to an unlabeled target dataset. The paper proposes a Dual Generator-Discriminator Deep Code Domain Adaptation Network (Dual-GD-DDAN) architecture for handling transfer learning from a labeled source dataset to an unlabeled target dataset.
Source code: https://github.com/vannguyennd/dual-dan
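For intuition, below is a deliberately simplified, single generator-discriminator sketch of GAN-style code domain adaptation in PyTorch. The real Dual-GD-DDAN uses two generator-discriminator pairs and additional losses, so treat this as a conceptual toy under assumed dimensions, not a reproduction of the paper's architecture:

```python
import torch
import torch.nn as nn

feature_dim, hidden = 128, 64

generator = nn.Sequential(            # maps code features into a shared space
    nn.Linear(feature_dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
discriminator = nn.Sequential(        # guesses source (1) vs. target (0) domain
    nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))
classifier = nn.Linear(hidden, 1)     # vulnerable vs. not, trained on source labels only

bce = nn.BCEWithLogitsLoss()
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
opt_g = torch.optim.Adam(list(generator.parameters()) +
                         list(classifier.parameters()), lr=1e-4)

def train_step(src_x, src_y, tgt_x):
    # 1) Train the discriminator to tell source and target features apart.
    with torch.no_grad():
        src_z, tgt_z = generator(src_x), generator(tgt_x)
    d_loss = bce(discriminator(src_z), torch.ones(len(src_x), 1)) + \
             bce(discriminator(tgt_z), torch.zeros(len(tgt_x), 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) Train the generator to fool the discriminator (domain-invariant
    #    features) while keeping the source vulnerability classifier accurate.
    src_z, tgt_z = generator(src_x), generator(tgt_x)
    g_loss = bce(discriminator(tgt_z), torch.ones(len(tgt_x), 1)) + \
             bce(classifier(src_z).squeeze(1), src_y)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

src_x = torch.randn(32, feature_dim)                 # labeled source features
src_y = torch.randint(0, 2, (32,)).float()           # source vulnerability labels
tgt_x = torch.randn(32, feature_dim)                 # unlabeled target features
print(train_step(src_x, src_y, tgt_x))
```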
Joern will be utilized to analyze the source code and extract user-defined variables and functions.
Source: https://joern.readthedocs.io/en/latest/
126 types of vulnerabilities in source code were collected from the National Vulnerability Database (NVD) and the Software Assurance Reference Dataset (SARD). The NVD dataset contains vulnerabilities from 29 open-source software projects. The SARD dataset contains 13,906 vulnerable C/C++ programs out of a total of 14,000. The vulnerability representations aim to accommodate both syntactic and semantic information by introducing the notions of:
- Syntax-based Vulnerability Candidates (SyVCs)
- Semantics-based Vulnerability Candidates (SeVCs)
A program is divided into statements that correspond to “region proposals” and exhibit the syntax and semantics characteristics of vulnerabilities. SyVCs capture the syntax characteristics of vulnerabilities and are extended by SeVCs, which accommodate the semantic information induced by data dependency and control dependency.
Vulnerable data: https://github.com/SySeVR/SySeVR/tree/master/Program%20data
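To make the SyVC-to-SeVC extension concrete, here is a toy Python sketch that grows a slice around a SyVC by following data/control dependency edges in both directions. The hand-written statements and edges stand in for what a real program dependence graph (e.g. from Joern) would provide; none of this reflects SySeVR's actual implementation:

```python
from collections import deque

# Toy program: statement index -> source line, plus dependency edges
# (in practice these edges would come from a program dependence graph).
statements = {
    0: "char buf[16];",
    1: "int n = atoi(argv[1]);",
    2: "if (n > 0)",
    3: "    memcpy(buf, argv[2], n);",   # SyVC: API call matching a syntax rule
    4: 'printf("done");',
}
depends_on = {0: [], 1: [], 2: [1], 3: [0, 1, 2], 4: []}   # backward edges
used_by   = {0: [3], 1: [2, 3], 2: [3], 3: [], 4: []}      # forward edges

def sevc_from_syvc(syvc):
    """Collect every statement reachable from the SyVC along data/control
    dependencies -- a crude stand-in for program slicing."""
    keep, queue = {syvc}, deque([syvc])
    while queue:
        s = queue.popleft()
        for nxt in depends_on.get(s, []) + used_by.get(s, []):
            if nxt not in keep:
                keep.add(nxt)
                queue.append(nxt)
    return [statements[i] for i in sorted(keep)]

print(sevc_from_syvc(3))  # the memcpy plus the statements it depends on
```

Note how the unrelated `printf` statement is excluded: the SeVC keeps only code that is semantically connected to the candidate, which is what lets it carry more information than the SyVC alone.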
In order to compare the performance of the models against one another, we introduce five performance metrics: FNR, FPR, Recall, Precision, and F1-score. All five can be calculated from a model's confusion matrix; a short sketch of the computation follows the definitions below.
FNR (False Negative Rate, best = 0): FNR is the fraction of positive-class samples that are predicted as negative, i.e., an error rate: FNR = FN / (TP + FN). In this project, FNR is the rate at which vulnerable functions are identified as non-vulnerable, so it directly demonstrates the model's ability to find the vulnerable functions. FNR is therefore one of our key performance indicators.
FPR (False Positive Rate, best = 0): FPR is the fraction of negative-class samples that are predicted as positive, i.e., an error rate: FPR = FP / (FP + TN). In this project, FPR is the rate at which non-vulnerable functions are flagged as vulnerable, which measures how many false alarms the model produces.
Precision (best = 1): Precision is the fraction of all positive predictions that are truly positive, i.e., how many of the functions flagged as vulnerable actually are: Precision = TP / (TP + FP).
Recall (best = 1): Recall is the fraction of actual positives the model correctly predicts: Recall = TP / (TP + FN). The higher the recall, the better the model is at identifying the vulnerable (positive) class; note that Recall = 1 − FNR.
F1-Score (best = 1): The F1-score is the harmonic mean of Precision and Recall, giving equal weight to both. It is often used as a single value that summarizes the model's output quality, which is why the F1-score is one of our key performance indicators.
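As a quick sketch of how all five metrics fall out of the confusion matrix (treating vulnerable as the positive class); the counts below are made-up examples:

```python
def detection_metrics(tp, fp, fn, tn):
    """Compute the five metrics from confusion-matrix counts,
    treating 'vulnerable' as the positive class."""
    fnr = fn / (tp + fn)             # vulnerable functions the model missed
    fpr = fp / (fp + tn)             # clean functions flagged as vulnerable
    recall = tp / (tp + fn)          # = 1 - FNR
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {"FNR": fnr, "FPR": fpr, "Recall": recall,
            "Precision": precision, "F1": f1}

# Illustrative counts only: 80 true positives, 10 false positives, etc.
print(detection_metrics(tp=80, fp=10, fn=20, tn=890))
```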