NeuFSD: Rethinking Flow Size Distribution via Neural Decoding

This repository contains the official implementation for the paper "NeuFSD: Rethinking Flow Size Distribution via Neural Decoding".

Overview

NeuFSD is a novel approach that leverages neural decoding to accurately estimate the Flow Size Distribution (FSD) in network traffic. This project provides the code to preprocess datasets, train the NeuFSD model, evaluate its performance, and visualize the results.
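To make the target quantity concrete, the sketch below computes an exact flow size distribution from a packet stream: for each flow size, it counts how many flows have that size. It is only an illustration of the definition, assuming flow size is measured in packets. The function name `flow_size_distribution` and the toy input are hypothetical and not part of this repository.

```python
from collections import Counter


def flow_size_distribution(packet_flow_ids):
    """Return {flow size: number of flows with that size}."""
    # First pass: count packets per flow (flow ID -> flow size).
    flow_sizes = Counter(packet_flow_ids)
    # Second pass: count how many flows share each size.
    size_counts = Counter(flow_sizes.values())
    return dict(sorted(size_counts.items()))


# Toy stream: flow "a" has 3 packets, "b" has 2, "c" and "d" have 1 each.
packets = ["a", "b", "a", "c", "a", "b", "d"]
print(flow_size_distribution(packets))  # {1: 2, 2: 1, 3: 1}
```

An estimator such as NeuFSD aims to recover this distribution without keeping an exact per-flow counter for every flow.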

Quick Start

For a quick demonstration, you can use our preprocessed dataset.

Clone this repository:

```
git clone https://github.com/shihisjsjjsjks/NeuFSD.git
cd NeuFSD
```

Download the preprocessed data: Get the dataset archive from our Releases page. The archive should contain two folders: caida_org and EL.

Place the dataset: Unzip the archive and move both folders into 64_64_counter_time/caida_org_exp/. The final structure should look like this:

```
NeuFSD/
└── 64_64_counter_time/
    └── caida_org_exp/
        ├── caida_org/  <-- Place the 'caida_org' folder here
        ├── EL/         <-- Place the 'EL' folder here
        ├── run.sh
        ├── plot.py
        └── ...
```
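Before running the experiment, it can help to confirm that the layout matches the tree above. The following is a minimal sketch, not a script shipped with this repository: the helper name `missing_entries` is hypothetical, and it checks only the four entries named in the tree.

```python
from pathlib import Path

# Entries the Quick Start expects inside the experiment directory.
EXPECTED = ["caida_org", "EL", "run.sh", "plot.py"]


def missing_entries(exp_dir):
    """Return the expected entries that are absent from exp_dir."""
    exp_dir = Path(exp_dir)
    return [name for name in EXPECTED if not (exp_dir / name).exists()]


if __name__ == "__main__":
    # Run from the repository root (NeuFSD/).
    missing = missing_entries("64_64_counter_time/caida_org_exp")
    print("missing:", missing or "none")
```

An empty result means the four entries are in place. The check does not validate the contents of the dataset folders.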

Run the experiment: Navigate to the experiment directory and execute the main script. This will preprocess the data, train the model, and run the evaluation.
```
cd 64_64_counter_time/caida_org_exp
bash run.sh
```
Visualize the results: After the script finishes, configure plot.py as described in the Visualizing Results section, then run it.
```
python plot.py
```

Data Preparation

If you want to use your own data or process the raw CAIDA traces, follow these steps.

1. Raw CAIDA Dataset (`caida_org`)

The project requires the raw CAIDA "equinix-nyc" traces.

Request Access: First, you must apply for access to the CAIDA dataset.
Download: You can download the datasets (e.g., caida2016, caida2018) from the CAIDA Data Portal.
Placement: After downloading, place the raw PCAP files into the 64_64_counter_time/caida_org_exp/caida_org/ directory.

2. Hot Elements Dataset (`EL`)

The EL dataset contains the "hot part" (heavy hitters) of the network traffic, extracted using the Elastic Sketch method.

Generation: You need to generate this dataset yourself from the same CAIDA traces used for caida_org.
Method: The generation is based on the following paper:

Tong Yang, Jie Jiang, Peng Liu, Qun Huang, Junzhi Gong, Yang Zhou, Rui Miao, Xiaoming Li, and Steve Uhlig. 2018. Elastic sketch: Adaptive and fast network wide measurements. In ACM SIGCOMM (2018).

Placement: Place the generated EL files into the 64_64_counter_time/caida_org_exp/EL/ directory.

The run.sh script includes the logic to process these raw datasets and generate the necessary files for training and evaluation.
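For orientation, the vote-based heavy part described in the Elastic Sketch paper can be sketched roughly as follows. This is a simplified illustration and not the code used to produce the EL files. The class name `HeavyPart` is hypothetical, the light part is replaced by an exact `Counter` (the paper uses a Count-Min sketch), and the on-disk EL format expected by run.sh is not covered here.

```python
from collections import Counter

LAMBDA = 8  # eviction threshold; 8 is the value suggested in the paper


class HeavyPart:
    """Simplified single-array heavy part with vote-based eviction."""

    def __init__(self, num_buckets):
        # Each bucket holds [key, vote_pos, flag, vote_neg] or None if empty.
        self.buckets = [None] * num_buckets
        # Stand-in for the light part (assumption: exact counts, no CM sketch).
        self.light = Counter()

    def insert(self, key):
        idx = hash(key) % len(self.buckets)
        bucket = self.buckets[idx]
        if bucket is None:
            # Empty bucket: the new flow takes it.
            self.buckets[idx] = [key, 1, False, 0]
        elif bucket[0] == key:
            # Same flow: one more positive vote.
            bucket[1] += 1
        else:
            # Different flow: one more negative vote against the resident.
            bucket[3] += 1
            if bucket[3] / bucket[1] < LAMBDA:
                # Resident stays; the packet is counted in the light part.
                self.light[key] += 1
            else:
                # Resident is evicted to the light part; new flow takes over.
                self.light[bucket[0]] += bucket[1]
                self.buckets[idx] = [key, 1, True, 1]

    def hot_flows(self):
        """Return {flow key: vote_pos} for all occupied buckets."""
        return {b[0]: b[1] for b in self.buckets if b is not None}
```

After feeding every packet's flow ID through `insert`, `hot_flows()` returns the flows currently held in the heavy part, which corresponds to the "hot part" mentioned above.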

Visualizing Results

The plot.py script is used to generate plots from the evaluation results. You must manually configure it to match the output files generated by run.sh.

After running bash run.sh, check the output directory or the script's log to find the names of the generated prediction files (e.g., preds_0.034962, preds_0.089646).

Open plot.py and modify the following section with your specific values:

```
# --- Modify this section to visualize your results ---
comb_name = 1
model_name = 'ViT'
train_num = 2
if comb_name == 1:
    # Update these values based on the prediction files generated by your experiment
    value1 = 'preds_0.034962'
    value2 = 'preds_0.089646'

    value1 = value1.replace('preds_', '')
    value2 = value2.replace('preds_', '')

    main(comb_name, model_name, train_num, value1, value2)
```

Run the script to generate the visualization:
```
python plot.py
```
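Looking up the prediction file names by hand can be replaced with a small helper. The sketch below is an assumption-laden example, not part of plot.py: the function name `prediction_values` is hypothetical, the output directory is whatever location run.sh writes to on your machine, and the files are assumed to be named exactly like `preds_0.034962`, with no extension.

```python
from pathlib import Path


def prediction_values(output_dir):
    """Return the numeric suffixes of all preds_* files, sorted by name."""
    values = []
    for path in sorted(Path(output_dir).glob("preds_*")):
        # Strip only the leading prefix, mirroring the replace() in plot.py.
        values.append(path.name.replace("preds_", "", 1))
    return values
```

The returned strings can be used for `value1` and `value2` in plot.py. If your files carry an extension, the suffix will include it and needs to be trimmed separately.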

Citation

If you use this code or our work in your research, please cite our paper:

@inproceedings{neufsd2024,
  title={NeuFSD: Rethinking Flow Size Distribution via Neural Decoding},
  author={Shi, Qilong and Ren, Pinze and Wang, Dayu and Yang, Tong and Wang, Yangyang and Xu, Mingwei},
  booktitle={Your Conference/Journal},
  year={2024}
}

Also, please cite the Elastic Sketch paper if you use the data preprocessing pipeline:

@inproceedings{yang2018elastic,
  title={Elastic sketch: Adaptive and fast network wide measurements},
  author={Yang, Tong and Jiang, Jie and Liu, Peng and Huang, Qun and Gong, Junzhi and Zhou, Yang and Miao, Rui and Li, Xiaoming and Uhlig, Steve},
  booktitle={Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication},
  pages={561--575},
  year={2018}
}
