AstroWLAN/Wasp

An efficient architecture for labeling network packets


Abstract 💭

Wasp provides an architecture for efficiently labeling network packets in scenarios where inspecting every packet is impractical due to performance constraints

This research work is based on Yisroel Mirsky’s Kitsune paper and its corresponding repository

The following is an overview of the project structure

```
Wasp/
├── ANN             # KitNET implementation
├── Attacks         # Benchmark datasets and results
├── Kitsune         # Slightly modified version of the original Kitsune
├── ResearchTools   # Architecture implementations and supporting scripts
└── Resources       # Miscellaneous resources
```

Datasets 💾

The datasets used in our research are available on Kaggle
Since the datasets have different structures, a pre-processing step is required to standardize the files for the simulation:

1️⃣ Run `sanitizer.py` to standardize the true-labels `.csv` file

In most datasets, the labels are located in column index 1
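The sanitizing step can be sketched as follows. This is not the actual `sanitizer.py`; the column index, header handling, and the binary 0/1 output format are assumptions based on the notes above:

```python
import csv

def sanitize_labels(in_path, out_path, label_col=1):
    """Extract the ground-truth label column from a raw dataset .csv
    and write one standardized label per row.

    label_col=1 follows the note above: in most datasets the labels
    sit in column index 1. The benign-vs-malicious mapping below is
    an illustrative assumption, not the project's actual rule."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        next(reader)  # skip the header row
        for row in reader:
            # Normalize to a binary label: 0 = benign, 1 = malicious
            label = row[label_col].strip().lower()
            writer.writerow([0 if label in ("benign", "0") else 1])
```

The output file then has a uniform shape regardless of which dataset it came from, which is what the simulation expects.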

To visualize the distribution of the ground-truth labels, run `plotter.py` and provide the `.csv` file generated in the previous step

Make sure to exclude the packets used for neural network training
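`plotter.py` renders the distribution graphically; the underlying computation can be sketched as below. The `n_train` cutoff is an assumed parameter standing in for however many packets were consumed for training:

```python
from collections import Counter

def label_distribution(labels, n_train):
    """Compute the relative frequency of each ground-truth label,
    excluding the first n_train packets used for neural-network
    training (as advised above)."""
    evaluated = labels[n_train:]
    counts = Counter(evaluated)
    total = len(evaluated)
    return {label: count / total for label, count in counts.items()}

# Example: 6 packets, the first 2 reserved for training
label_distribution([0, 0, 1, 0, 1, 1], n_train=2)  # → {1: 0.75, 0: 0.25}
```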

Simulation 🔬

Run `simulation.py` to launch the simulation interface: a menu with multiple options will appear
Follow the steps below in the exact order to replicate our results

Each step corresponds to a menu option that must be selected

1️⃣ KitNET
Executes the vanilla KitNET model to generate an array of predictions. You’ll be prompted to provide the path to a `.pcap` file for analysis and a `.csv` file containing the ground-truth labels

The `.pcap` file and the labels `.csv` must come from the same capture, in order to ensure that the number of generated predictions matches the number of pre-computed labels
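KitNET emits one anomaly score (an RMSE) per packet; turning those scores into binary predictions and checking that they line up with the labels can be sketched as follows. The threshold is an assumed free parameter, not a value from the paper:

```python
def rmse_to_predictions(rmse_scores, labels, threshold):
    """Convert per-packet RMSE anomaly scores into binary predictions,
    refusing to proceed if the score and label counts disagree (which
    indicates the .pcap and the labels .csv come from different captures)."""
    if len(rmse_scores) != len(labels):
        raise ValueError(
            f"{len(rmse_scores)} predictions vs {len(labels)} labels: "
            "the .pcap and the labels .csv must come from the same capture")
    # 1 = flagged as anomalous, 0 = considered benign
    return [1 if score > threshold else 0 for score in rmse_scores]
```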

2️⃣ Architecture Benchmarks
Runs benchmarks for the two architectures: Naive Sampling and Wasp Detection. You’ll need to provide the `.csv` file with ground-truth labels and the predictions file generated by KitNET in the previous step

Each system is evaluated across multiple sampling rates. For every rate, 300 experiments are conducted to account for the probabilistic nature of the process, gather sufficient data, reduce sample variance, and ensure statistical reliability.
The results are then averaged
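The averaging procedure can be sketched as below, shown here only for a naive random-sampling baseline (the Wasp Detection logic is not reproduced); the recall-style metric and function names are illustrative assumptions:

```python
import random
from statistics import mean

def naive_sampling_recall(labels, rate, rng):
    """One experiment: inspect each packet independently with probability
    `rate` and measure what fraction of malicious packets were caught."""
    malicious = [i for i, y in enumerate(labels) if y == 1]
    if not malicious:
        return 0.0
    caught = sum(1 for _ in malicious if rng.random() < rate)
    return caught / len(malicious)

def benchmark(labels, rates, n_experiments=300, seed=0):
    """Average each sampling rate over n_experiments runs, mirroring the
    300-run averaging described above."""
    rng = random.Random(seed)
    return {rate: mean(naive_sampling_recall(labels, rate, rng)
                       for _ in range(n_experiments))
            for rate in rates}
```

Because each experiment is probabilistic, a single run is noisy; averaging many runs per rate is what makes the per-rate curves in the generated graphs smooth.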

After completion, multiple graphs are generated to visualize the benchmark results