% DP_Pavel_Kachalouski_2012.tex
% options:
% thesis=B bachelor's thesis
% thesis=M master's thesis
% czech thesis in Czech language
% english thesis in English language
\documentclass[thesis=M,english]{FITthesis}[2011/07/15]
\usepackage[utf8]{inputenc} % LaTeX source encoded as UTF-8
% \usepackage[latin2]{inputenc} % LaTeX source encoded as ISO-8859-2
% \usepackage[cp1250]{inputenc} % LaTeX source encoded as Windows-1250
\usepackage{graphicx} %graphics files inclusion
\usepackage{algorithmic} %algorithm typesetting
% \usepackage{amsmath} %advanced maths
% \usepackage{amssymb} %additional math symbols
% % list of acronyms
% \usepackage[acronym,nonumberlist,toc,numberedsection=autolabel]{glossaries}
% \iflanguage{czech}{\renewcommand*{\acronymname}{Seznam pou{\v z}it{\' y}ch zkratek}}{}
% \makeglossaries
\usepackage{listings}
\usepackage{color}
\definecolor{dkgreen}{rgb}{0,0.6,0}
\definecolor{gray}{rgb}{0.5,0.5,0.5}
\definecolor{mauve}{rgb}{0.58,0,0.82}
\definecolor{lightgray}{rgb}{0.93,0.93,0.93}
\lstset{frame=none,
language=C,
aboveskip=3mm,
belowskip=3mm,
showstringspaces=false,
columns=flexible,
basicstyle={\small\ttfamily},
numbers=left,
numberstyle=\tiny\color{gray},
keywordstyle=\color{blue},
commentstyle=\color{dkgreen},
stringstyle=\color{mauve},
backgroundcolor=\color{lightgray},
breaklines=true,
breakatwhitespace=true,
tabsize=3
}
% % % % % % % % % % % % % % % % % % % % % % % % % % % % % %
% EDIT THIS
% % % % % % % % % % % % % % % % % % % % % % % % % % % % % %
\department{Department of Computer Systems}
\title{Network intrusion detection system accelerated with GPU}
\author{Pavel Kachalouski} %author's name without academic degrees
\authorWithDegrees{Bc. Pavel Kachalouski} %author's name with academic degrees
\supervisor{Ing. Ivan Šimeček, Ph.D.}
\acknowledgements{}
\abstractEN{The goal of this thesis is to develop and test an application for network intrusion detection accelerated with a GPU. Deep packet inspection is a heavy process that consumes a lot of CPU resources, so the main objective of the application is to offload the heaviest operations of network traffic analysis to the GPU. The project is based on the Nvidia CUDA architecture and uses the standard features provided by CUDA devices with compute capability 1.1.}
\abstractCS{Cílem této práce je vývoj a testování aplikace pro rychlou a včasnou detekci průniků do sítě s pomocí GPU. Hloubkové prověřování paketů je složitý proces, který konzumuje velké množství výpočetních prostředků procesoru, a proto hlavním cílem aplikace je delegování nejtěžších operací analýzy síťového provozu na GPU. Projekt je založen na architektuře Nvidia CUDA a používá standardní funkce poskytované zařízeními CUDA s výpočetní schopností verze 1.1.}
\placeForDeclarationOfAuthenticity{Prague}
\keywordsCS{Síťová analýza, Detekce průniku, GPGPU, CUDA, Paralelní výpočty.}
\keywordsEN{Network analysis, Intrusion detection, GPGPU, CUDA, Parallel computing.}
\parskip 2mm
\setcounter{tocdepth}{5}
\begin{document}
% \newacronym{CVUT}{{\v C}VUT}{{\v C}esk{\' e} vysok{\' e} u{\v c}en{\' i} technick{\' e} v Praze}
% \newacronym{FIT}{FIT}{Fakulta informa{\v c}n{\' i}ch technologi{\' i}}
\chapter{Introduction}
\label{chap:introduction}
\section{Motivation}
The first major malicious internet-based attack took place in 1988, when one of the first Internet worms was released. Referred to as "The Morris Worm" \cite{morris_worm}, it was written by Robert Tappan Morris and caused major interruptions across large parts of the Internet. Since 1988 the Internet has greatly evolved; as a result, the number of internet-based attacks has grown and network security has become a very important issue. As an attempt to overcome this problem, Intrusion Detection systems were developed. Such systems make it possible to monitor, and possibly prevent, attempts to intrude into or compromise network resources.
In short, it works as follows: you have a computer system attached to a network, perhaps even to the Internet. You are willing to allow access from the network by authorized people, but you are not willing to allow unauthorized access to the system by anyone else. Typically, a firewall or an authentication system is used to prevent unauthorized access. However, authentication systems and firewalls can sometimes be broken. Intrusion Detection is a set of mechanisms that help to detect unauthorized access to the system. An Intrusion Detection system may also take steps to deny access to intruders. Such systems are placed between the firewall and the system being secured and provide an extra layer of protection for that system.
\begin{figure}[h]
\centering
\includegraphics[scale=0.5]{images/Typical_NIDS.png}
\caption{Typical NIDS scenario. Based on \cite{iatac_tools_report}}
\label{fig:typical_nids}
\end{figure}
Intrusion detection systems fall into two categories:
\begin{itemize}
\item Network based systems.
Such systems are placed in the network near the system to be monitored. They inspect network traffic and examine whether it stays within acceptable boundaries.
\item Host based systems.
Such detection systems run on the system being monitored and determine whether activity on the system is acceptable.
\end{itemize}
There is also a more recent type of intrusion detection system which resides in the kernel of the operating system and monitors low-level system activity.
Most modern network intrusion detection and prevention systems use a set of rules that are compared against incoming and outgoing packets \cite{symantec_nids_intro}. Usually, a rule consists of a filter based on packet header information, data that should be contained in the packet payload, and an associated action that is taken when the filter and payload matching conditions are met.
Matching payload signatures is a resource-intensive process, accounting for about 75\% of the total CPU processing time of modern Network Intrusion Detection systems. The reason it is so intensive is that, most of the time, every byte of every network packet has to be processed by a search algorithm that looks for matches across the large set of rules that apply to the particular packet. As an example, the most widely used open-source NIDS, Snort, has a rule set with about 10000 strings. Searching every packet for all of these strings requires a lot of resources.
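As an illustration of this rule structure, a header filter combined with a payload signature can be modeled in C roughly as follows (the field names and the \texttt{rule\_matches} helper are hypothetical sketches, not the actual rule format of Snort or any other NIDS):

\begin{lstlisting}
#include <stdint.h>
#include <string.h>
#include <stdio.h>

/* Hypothetical NIDS rule: a header filter plus a payload signature. */
typedef enum { ACTION_ALERT, ACTION_DROP, ACTION_LOG } action_t;

typedef struct {
    uint8_t  protocol;       /* IP protocol number: 6 = TCP, 17 = UDP */
    uint32_t src_ip, dst_ip; /* 0 acts as a wildcard in this sketch */
    uint16_t dst_port;       /* 0 = any port */
    const char *signature;   /* substring searched for in the payload */
    action_t action;
    const char *message;     /* text logged when the rule matches */
} rule_t;

/* A packet matches when every non-wildcard header field agrees
 * and the signature occurs in the payload. */
int rule_matches(const rule_t *r, uint8_t proto, uint32_t src, uint32_t dst,
                 uint16_t dport, const char *payload)
{
    if (r->protocol && r->protocol != proto) return 0;
    if (r->src_ip && r->src_ip != src)       return 0;
    if (r->dst_ip && r->dst_ip != dst)       return 0;
    if (r->dst_port && r->dst_port != dport) return 0;
    return strstr(payload, r->signature) != NULL;
}

int main(void)
{
    rule_t r = { 6, 0, 0, 80, "cmd.exe", ACTION_ALERT, "possible exploit" };
    printf("%d\n", rule_matches(&r, 6, 1, 2, 80, "GET /cmd.exe HTTP/1.0"));
    printf("%d\n", rule_matches(&r, 6, 1, 2, 80, "GET /index.html"));
    return 0;
}
\end{lstlisting}

A real signature engine replaces the naive \texttt{strstr} scan with multi-pattern algorithms, which is exactly the part worth offloading to the GPU.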
\begin{figure}[h]
\centering
\includegraphics[scale=0.3]{images/web-attacks.png}
\caption{Average Web-based attacks per day, by month, 2009-2010. Source: Symantec Corporation.}
\label{fig:symantec_report}
\end{figure}
With the increasing number of internet-based attacks, rule datasets grow every year, creating additional load on NIDS systems. Figure \ref{fig:symantec_report} shows the general trend in the number of web-based attacks on web servers, taken from a Symantec Corporation report.
The other source of high load on a NIDS is the constant growth of network speeds. In the past 6 years, computer networks have been characterized by a steady increase in transfer rate, complexity and number of networked users. In 2009, worldwide Internet traffic grew by 40--50\,\%, and according to a Cisco forecast \cite{cisco_firecast}, global IP traffic will keep increasing over the next 5 years, reaching by 2015 about 1 zettabyte of data transferred per year.
The problem of high load on a NIDS can be solved by using specialized network processing devices such as an FPGA, an ASIC, a Network Processing Unit (NPU), or a TCP Offloading Engine (TOE). Nonetheless, these solutions may not be cost effective and easy to manage, because they require special hardware and configuration.
\section{Problem definition}
Fast packet processing in a NIDS over the huge amount of data obtained from the network \textbf{is a must}. Network data processing algorithms should be fast and should not interfere with the overall network performance (or interfere as little as possible).
The current factors related to networks and traffic analysis systems are listed below:
\begin{enumerate}
\item The number of network nodes and users is increasing.
\item Network speed (bit rate) is gradually increasing.
\item The amount of network traffic is increasing rapidly.
\item Analysis algorithms are getting more complex.
\item Databases of known threats grow, resulting in a high number of rules to match.
\end{enumerate}
The approach of this thesis is to use heterogeneous computing, and more specifically to use \textbf{graphics processing units}, to perform network data analysis.
Heterogeneous computing is a way of computing tasks on systems made up of different types of computational units. Computational units can be general-purpose processors (GPP), also known as central processing units (CPUs), which are usually the main processors of most computing systems, and special-purpose processors (SPP). SPPs are, for example, digital signal processors (DSP) or graphics processing units (GPU).
A Graphics Processing Unit, or GPU, is essentially a dedicated hardware device responsible for translating data into a 2D image formed by pixels. GPU evolution was driven by the growing popularity and complexity of rendering programs such as CAD (Computer Aided Design) software and 3D video games. This software required high computing capabilities, and GPUs provided them. GPUs have not only a highly parallel processor structure, able to execute many threads of code concurrently, but also high-speed memory.
The interest in using GPUs to perform calculation tasks not related to graphics began to grow in 2005. Developers and scientists were trying to use the parallel processors and high memory bandwidth to improve the performance of complex algorithms. GPUs are used for computations in molecular dynamics, quantum chemistry, bioinformatics, computational fluid dynamics, weather and climate forecasting and many other fields.
GPGPU (general-purpose computing on graphics processing units) initially used GPUs as if they were computing graphics tasks. Input data were encoded into an image, some operations were performed on the image using graphics libraries, and the result data were decoded from the image back into their original form. Later, companies realised that GPGPU could be an advantage for their business and invested in developing tools to make their products easier to use. NVIDIA released version 1.0 of its CUDA (an acronym for Compute Unified Device Architecture) library in 2006, which made it possible to run specially written CUDA code on some of their GPUs, while AMD developed the ATI Stream technology, which uses the OpenCL language to write code executed on GPUs.
CUDA is the computing engine in Nvidia GPUs that is accessible to software developers through variants of industry-standard programming languages. Programmers use 'C for CUDA' (C with Nvidia extensions and certain restrictions) \cite{cuda_guide}, compiled through a PathScale Open64 C compiler, to code algorithms for execution on the GPU.
The main objective of this thesis is to develop an open-source NIDS application which inspects network traffic in a classic CPU mode and in a heterogeneous CPU+GPU mode, and to compare the performance of the two modes.
In addition, the software should fulfill the following requirements:
\begin{itemize}
\item Open source. The application should be developed under the terms of an open-source license.
\item The application should be developed in the C/C++ and C for CUDA languages.
\item Easy to use. The application should be easy to use, and the NIDS rules should have a simple format.
\item The application will analyze only TCP/IP traffic, due to simplifications at the packet decomposition level.
\item Logging and offline analysis. The application should be able to store traffic for future analysis in offline mode.
\end{itemize}
\section{Project overview}
The general application workflow is shown in Figure \ref{fig:app_workflow}.
\begin{figure}[h]
\centering
\includegraphics[scale=0.4]{images/workflow.png}
\caption{Application workflow.}
\label{fig:app_workflow}
\end{figure}
The main stages of the workflow:
\begin{itemize}
\item \textbf{Packet capture}. The application captures traffic on one of the network interfaces for later analysis. The raw packet has to be obtained from the network so that the route information can be fully analyzed in the next stage.
\item \textbf{Route information analysis}. Every packet is analyzed in terms of the protocols used on the network and transport layers, and the destination and source IP addresses. If the packet's route information matches any rule from the dataset, it is transported to the next stage of the workflow.
\item \textbf{Payload analysis}. The application looks for a string or regex in the packet payload.
\item \textbf{Writing results to a database}. In this stage, all packets matched in the previous stage are written to a database, together with the message obtained from the rule that the packet matched.
\end{itemize}
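A much simplified sketch of these four stages in C follows (the packets, the rule, and all names below are illustrative only; the real application captures raw traffic with a sniffing library and writes results to a database instead of printing them):

\begin{lstlisting}
#include <stdio.h>
#include <string.h>

/* Illustrative packet and rule types; a real NIDS decodes raw frames. */
typedef struct { int proto; const char *dst; const char *payload; } packet_t;
typedef struct { int proto; const char *dst; const char *sig;
                 const char *msg; } rule_t;

/* Stage 2: route information analysis (protocol + destination). */
int header_match(const rule_t *r, const packet_t *p) {
    return r->proto == p->proto && strcmp(r->dst, p->dst) == 0;
}

/* Stage 3: payload analysis (naive substring search). */
int payload_match(const rule_t *r, const packet_t *p) {
    return strstr(p->payload, r->sig) != NULL;
}

int main(void) {
    packet_t captured[] = {             /* Stage 1: captured traffic */
        { 6, "10.0.0.5", "GET /etc/passwd HTTP/1.0" },
        { 6, "10.0.0.5", "GET /index.html HTTP/1.0" },
        { 17, "10.0.0.9", "harmless datagram" },
    };
    rule_t rule = { 6, "10.0.0.5", "/etc/passwd", "path traversal attempt" };

    for (int i = 0; i < 3; i++) {
        if (header_match(&rule, &captured[i]) &&
            payload_match(&rule, &captured[i]))
            printf("ALERT packet %d: %s\n", i, rule.msg); /* Stage 4: log */
    }
    return 0;
}
\end{lstlisting}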
\chapter{Background}
\section{Traffic analysis: sniffers.}
A packet sniffer is a program or library which makes it possible to intercept data packets transmitted over a network to which the system is connected through a network card. The most popular sniffing programs are tcpdump and Wireshark. These programs obtain raw network packets and analyze them in order to present them in some visual form or dump them to a file. There are also pure sniffing libraries, such as Libpcap (for UNIX-like OS) and Winpcap (for Windows OS) \cite{cap_app}.
\subsection{How they work.}
Most network cards support what is known as promiscuous mode or monitor mode. In normal operation, the network card receives a packet from the network and compares its destination layer 2 (link layer) address (also known as the MAC address) with the one used by the network card. If the packet destination address matches the network card address, or the destination address is a broadcast address, the packet is transferred to the operating system. Unmatched packets are dropped.
When promiscuous or monitor mode is enabled, the behavior is slightly different. In this mode the network card transfers all packets received from the network to the operating system, even if their address does not match the network card's own. The operating system then uses its packet filtering engine to pass packets to applications. Enabling promiscuous mode requires root privileges under Linux operating systems.
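The accept/drop decision described above boils down to a few address comparisons, sketched below (real cards additionally accept frames for subscribed multicast addresses, which this sketch omits):

\begin{lstlisting}
#include <stdint.h>
#include <string.h>
#include <stdio.h>

/* Decide whether a NIC hands a frame to the OS, as described above. */
int nic_accepts(const uint8_t dst[6], const uint8_t own[6], int promiscuous)
{
    static const uint8_t broadcast[6] = {0xff,0xff,0xff,0xff,0xff,0xff};
    if (promiscuous) return 1;                    /* sniffer mode: take all */
    if (memcmp(dst, own, 6) == 0) return 1;       /* addressed to us */
    if (memcmp(dst, broadcast, 6) == 0) return 1; /* broadcast frame */
    return 0;                                     /* otherwise dropped */
}

int main(void)
{
    uint8_t own[6]   = {0x00,0x11,0x22,0x33,0x44,0x55};
    uint8_t other[6] = {0x66,0x77,0x88,0x99,0xaa,0xbb};
    uint8_t bcast[6] = {0xff,0xff,0xff,0xff,0xff,0xff};
    printf("%d %d %d %d\n",
           nic_accepts(own, own, 0),    /* our address */
           nic_accepts(other, own, 0),  /* foreign frame */
           nic_accepts(bcast, own, 0),  /* broadcast */
           nic_accepts(other, own, 1)); /* promiscuous mode */
    return 0;
}
\end{lstlisting}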
There are also differences in capturing network packets in different types of networks and different network topologies. The following subsections explain some details about sniffing in IEEE 802.3 and IEEE 802.11 networks.
\subsection{Packet capture in IEEE 802.3 networks.}
In a typical IEEE 802.3 LAN network \cite{ieee_802_3}, a star topology is used, so all the nodes in the network are connected (through their own cable) to either a hub or a switch.
\begin{figure}[h]
\centering
\includegraphics[scale=0.45]{images/IEEE802_3_LAN.png}
\caption{IEEE 802.3 networks usually use star topology.}
\label{fig:ieee802.3star_topology}
\end{figure}
Hubs are basically repeaters: a signal introduced at the input of any port appears at the output of every port except the original incoming one. Switches work in a different way: they identify the hosts connected to each port. When a packet is received on some port, the switch passes that packet only to the port where the destination host is connected. Switched networks have better performance, because they do not multiply packets when passing them to the output port. Switches may also filter traffic on different protocol levels (link, transport, network, etc.).
For a packet sniffer this means that only packets transferred to or from the particular host running the sniffer, plus broadcast packets, will be captured. To overcome this problem, several techniques may be used \cite{gpu_framework}:
\begin{itemize}
\item Using a hub: a simple but poor solution is to replace switches by hubs. This solution is poor because the performance of a network built with hubs is much lower than that of a network built with switches. Also, hub production has practically been discontinued.
\item Putting the sniffer in a gateway link, as a router: this technique has advantages and disadvantages at the same time. The advantage is that such a sniffer is able to capture packets from many subnetworks. The disadvantage is that the sniffer captures only traffic going through the link where it was placed. Internal traffic (between subnetworks or between nodes of the same subnetwork) is not captured, which is important in some cases, like data centers \cite{act_defense_model}.
\begin{figure}[h]
\centering
\includegraphics[scale=0.6]{images/gateway_sniffering.png}
\caption{Sniffing traffic in the gateway links.}
\label{fig:gateway_sniffering}
\end{figure}
If a sniffer in a gateway link does not satisfy the requirements, port mirroring or a distributed sniffer can be used. Figure \ref{fig:gateway_sniffering} illustrates this technique with an example.
\item Switch port mirroring: some switches have a port mirroring feature, also called a monitoring port. When port mirroring is enabled, all packets passed over the switch are also transmitted to the mirroring port. If the network contains several switches, obtaining packets on a single host can be more complex and require advanced switch features like Cisco RSPAN \cite{cisco_site}. A distributed sniffer could also be used in such a case.
\begin{figure}[h]
\centering
\includegraphics[scale=0.6]{images/mirror_sniffering.png}
\caption{Sniffing traffic using switch port mirroring.}
\label{fig:mirror_sniffering}
\end{figure}
\item Distributed sniffer: distributed sniffers gather traffic from several hosts using a specialized combination of hardware and software and send it to a single host. This type of sniffer is very flexible and scalable, but there are some drawbacks to this approach. The major drawback is lower performance compared to the previous types of sniffers, caused by the increased network traffic due to the software architecture. The distributed sniffer architecture is shown in Figure \ref{fig:distributed_sniffering}.
\begin{figure}[h]
\centering
\includegraphics[scale=0.6]{images/distributed_sniffering.png}
\caption{Distributed sniffer.}
\label{fig:distributed_sniffering}
\end{figure}
\end{itemize}
\subsection{Packet capture in IEEE 802.11 networks.}
The main advantage of packet sniffing in IEEE 802.11 networks compared to IEEE 802.3 is that all stations in an 802.11 network share the same carrier for transferring packets \cite{ieee_802_11}. So the only hardware required to capture packets is a network card in promiscuous mode. However, some considerations have to be kept in mind. An IEEE 802.11 network can be made up of a number of access points, which can be a problem for capturing packets. The coverage area becomes bigger, so a higher reception antenna is needed if a single sniffing point is used. Also, environmental conditions and the sniffer's physical position can have a big influence on the quality of sniffing. Because of shadowing, a packet sent by some host could be lost entirely.
Some approaches to solving these problems are \cite{gpu_framework}:
\begin{itemize}
\item Move the capture point to the wired network section. When possible, it is sometimes better to capture packets in the wired section of the network instead of the wireless section. This approach is similar to one listed before (placing the sniffer in the gateway link), so the disadvantage is the same. An additional drawback of this approach is that internal wireless traffic cannot be captured.
\item Distributed sniffer: the same method as listed above, with the same advantages and disadvantages.
\end{itemize}
\subsection{Libpcap.}
Libpcap is a sniffer library for packet capturing in Unix-like systems. For Windows systems, a port of Libpcap named Winpcap \cite{winpcap_site} exists. The library provides an API for using kernel resources for packet capturing in kernel architectures based on the Berkeley Packet Filter \cite{bsd_filter}. Libpcap was originally developed by the tcpdump developers in the Network Research Group at Lawrence Berkeley Laboratory. The following features are supported by Libpcap: packet capture, saving captured packets to a file, and reading files containing saved packets. A capture file saved in the Libpcap format can be read by applications such as tcpdump, Wireshark and CA NetMaster.
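A capture file in the Libpcap format begins with a fixed 24-byte global header. The following self-contained sketch mirrors its documented layout and inspects a header built in memory, such as tcpdump would write; it does not require the Libpcap library itself:

\begin{lstlisting}
#include <stdint.h>
#include <string.h>
#include <stdio.h>

/* Global header of a pcap capture file (24 bytes; files are written
 * in the byte order of the capturing machine). */
typedef struct {
    uint32_t magic_number;   /* 0xa1b2c3d4 = microsecond timestamps */
    uint16_t version_major;  /* currently 2 */
    uint16_t version_minor;  /* currently 4 */
    int32_t  thiszone;       /* GMT offset, usually 0 */
    uint32_t sigfigs;        /* timestamp accuracy, usually 0 */
    uint32_t snaplen;        /* max bytes captured per packet */
    uint32_t network;        /* link type: 1 = Ethernet */
} pcap_file_header_t;

int main(void)
{
    pcap_file_header_t hdr = { 0xa1b2c3d4, 2, 4, 0, 0, 65535, 1 };
    unsigned char buf[24];
    memcpy(buf, &hdr, sizeof hdr);   /* the bytes that land in the file */

    pcap_file_header_t back;         /* what a reader would recover */
    memcpy(&back, buf, sizeof back);
    printf("magic ok: %d, v%u.%u, snaplen %u, ethernet: %d\n",
           back.magic_number == 0xa1b2c3d4,
           (unsigned)back.version_major, (unsigned)back.version_minor,
           (unsigned)back.snaplen, back.network == 1);
    return 0;
}
\end{lstlisting}

Each packet record in the file then carries its own small header (timestamp, captured length, original length) followed by the raw packet bytes.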
\subsection{Network traffic analysis techniques.}
This section gives brief information about the network traffic monitoring and analysis process. In the network analysis process, network data are intercepted and examined in order to retrieve information from patterns in communication. The packets collected from the network are called a dataset and contain the raw network data, including header information and high-level user information. By analysing datasets, useful pieces of information are retrieved and manipulations are performed on them.
Collected datasets are often divided into smaller parts, data subsets, to be analysed separately later. This is done for performance reasons: the algorithms used to analyse such data may have non-linear compute time, and processing a large amount of data with such algorithms may increase the time exponentially. This technique of dividing a large dataset into smaller parts is called windowing. Depending on the requirements, two types of windowing are used:
\begin{itemize}
\item Size based windowed datasets. A large dataset is divided into smaller subsets of the same size.
\item Time windowed datasets. Data are constantly collected, and at certain time intervals the collected data are processed. In such a scheme, the size of a data subset is unknown and depends on the network speed.
\end{itemize}
Dividing captured data into subsets may affect the results of the analysis performed on them, so the windowing parameters should be taken into account when interpreting the results of the analysis.
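Size based windowing amounts to walking the dataset in fixed-size chunks; a minimal sketch follows, where the summing step merely stands in for an arbitrary analysis algorithm:

\begin{lstlisting}
#include <stdio.h>

/* Stand-in analysis for one window: here just a sum of the values. */
long analyse_window(const int *data, int len) {
    long sum = 0;
    for (int i = 0; i < len; i++) sum += data[i];
    return sum;
}

/* Walk the dataset in fixed-size windows (the last one may be shorter);
 * returns the number of windows processed. */
int windowed_analysis(const int *data, int n, int window_size) {
    int windows = 0;
    for (int off = 0; off < n; off += window_size) {
        int len = (n - off < window_size) ? (n - off) : window_size;
        printf("window %d: %d items, sum %ld\n",
               windows, len, analyse_window(data + off, len));
        windows++;
    }
    return windows;
}

int main(void) {
    int packet_sizes[7] = { 60, 1500, 1500, 40, 576, 1500, 60 };
    windowed_analysis(packet_sizes, 7, 3);  /* windows of 3, 3 and 1 items */
    return 0;
}
\end{lstlisting}

Time windowing has the same structure, except that the boundary between windows is driven by a clock rather than by an element count.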
\subsection{Network data inspection techniques.}
Network data inspection is a form of packet filtering based on inspecting each packet's header information, performing some calculations on it, and producing results.
\textbf{Packet decoding (packet analyzing)}
Packet decoding is the simplest task that can be performed on the gathered data. During this operation all header parameters are analysed and displayed in a human-readable way. Examples of this type of application are tcpdump and Wireshark.
The fields where packet decoding is useful include network security (intrusion detection systems, bandwidth abuse, etc.), network management and network failure detection. Network forensics analysis is also very interested in this type of inspection.
The following list presents the different tasks which can be performed on the collected packets:
\begin{itemize}
\item Graphical representation of raw data.
\item Statistical information and pattern extraction.
\item Rule based (signature based) analysis, anomaly detection and policies.
\item Flow based analysis.
\end{itemize}
Network monitoring, network management and security are all interested in data presented in graphical form. It can be 2D or 3D scatter plots, time-based graphs, histograms or different types of diagrams.
Pattern retrieval and the calculation of statistical parameters occupy a major position in network analysis. Different time distributions, statistical moments of the first and second order, mean values and probability distributions can be calculated over the gathered network data. Statistics can also be used for other purposes, like marketing. Gathering information about the average number of connections, average inbound and outbound speeds, and the time distributions of connections to servers can give a good picture of some product on the market.
Rule based or signature based analysis is a type of traffic inspection where every packet is matched against a certain rule or signature \cite{rule_based_analysis}. Rules or signatures define specific fields in the packet header, or combine several field values in a certain packet. Values may also be defined as intervals or thresholds. Threshold rules are used throughout security, for example, to identify DoS or bandwidth abuse attacks. In this case the rules may be treated as policies, which allow or disallow a certain amount of traffic in the network.
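A threshold rule of this kind can be sketched as a counter that resets with each time window (the limit and the structure below are illustrative only):

\begin{lstlisting}
#include <stdio.h>

/* Hypothetical threshold policy: flag a source that opens more than
 * `limit` connections within one time window. */
typedef struct { int limit; int count; int window_id; } threshold_rule_t;

/* Returns 1 when the event pushes the counter over the threshold. */
int record_connection(threshold_rule_t *r, int window_id)
{
    if (window_id != r->window_id) {   /* new time window: reset counter */
        r->window_id = window_id;
        r->count = 0;
    }
    return ++r->count > r->limit;
}

int main(void)
{
    threshold_rule_t rule = { .limit = 3, .count = 0, .window_id = -1 };
    /* Five connections in window 0, then one in window 1. */
    for (int i = 0; i < 5; i++)
        if (record_connection(&rule, 0))
            printf("window 0: threshold exceeded at connection %d\n", i + 1);
    printf("window 1 flagged: %d\n", record_connection(&rule, 1));
    return 0;
}
\end{lstlisting}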
A synonym for rule based analysis is signature pattern matching, and both terms are used in the literature on network analysis. However, signature pattern matching is also used for the statistical part of packet inspection. In this thesis, the term rule based analysis names packet analysis that matches packets against rules, as done in network intrusion detection systems and described later.
\section{Intrusion detection.}
Intrusion detection is the process of detecting undesired traffic on a network or device \cite{nids_evolution}. An Intrusion Detection System is a combination of a physical device and software which monitors network traffic in order to detect undesired activity and traffic that violates security policies. Various IDS tools are also able to store event information in a log for later review, or combine events with other information to make decisions about policies and losses. An IPS provides more functionality: it can prevent or stop undesired traffic.
\textbf{Network-based Intrusion Detection (NID)}
A Network Intrusion Detection System (NIDS) is a type of IDS that monitors network traffic and analyzes packets deeply, on all levels of the OSI model. Based on the result of the analysis, it makes decisions about the traffic type and suspicious activity. Historically, however, NIDS have been unable to operate in the following environments:
\begin{itemize}
\item Switched networks.
\item Encrypted networks.
\end{itemize}
\textbf{Host-based Intrusion Detection (HID)}
Host-based intrusion detection systems monitor and detect system and user activity on a given host. More complicated tools also offer policy management, data forensics, statistical analysis and some access control. The difference between network-based and host-based intrusion detection is that NID deals with traffic transmitted from one host to another, while HID controls what is happening on the hosts themselves.
Host-based intrusion detection is better suited to fighting internal threats, because it monitors and controls specific user actions and file access. Many computer threats come from inside the organization, for example from disgruntled employees or corporate spies.
\subsection*{Detection types}
\subsection*{Signature-Based Detection}
Intrusion Detection may use signature-based detection to analyze traffic. Using this type of detection means that the system relies on known traffic data to analyze running traffic. An attacker can modify an attack slightly to make it undetectable by this type of IDS. Regular expression-based rules, however, give more power against such slightly changed attacks while remaining very accurate.
\subsection*{Anomaly-Based Detection}
This type of IDS looks at network traffic and detects data that is incorrect, not valid, or generally abnormal. This method is useful for detecting undesired traffic that is not specifically known. For instance, an anomaly-based IDS will detect that an Internet Protocol (IP) packet is malformed. It does not detect that it is malformed in a specific way, but indicates that it is anomalous.
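The malformed-packet example can be made concrete: without knowing any specific attack, an anomaly detector can flag IPv4 headers that violate basic structural invariants. A minimal sketch:

\begin{lstlisting}
#include <stdint.h>
#include <stdio.h>

/* Minimal anomaly check on an IPv4 header: flags structural violations
 * without knowing what specific attack produced them. */
int ipv4_is_anomalous(const uint8_t *pkt, uint32_t captured_len)
{
    if (captured_len < 20) return 1;        /* shorter than a minimal header */
    uint8_t version = pkt[0] >> 4;
    uint8_t ihl     = pkt[0] & 0x0f;        /* header length in 32-bit words */
    uint16_t total  = (uint16_t)(pkt[2] << 8 | pkt[3]);
    if (version != 4)         return 1;     /* not IPv4 at all */
    if (ihl < 5)              return 1;     /* header cannot be < 20 bytes */
    if (total < ihl * 4u)     return 1;     /* total length inside header */
    if (total > captured_len) return 1;     /* claims more data than seen */
    return 0;                               /* structurally plausible */
}

int main(void)
{
    uint8_t good[20] = { 0x45, 0, 0, 20 };  /* v4, IHL 5, total length 20 */
    uint8_t bad[20]  = { 0x42, 0, 0, 20 };  /* v4 but IHL 2: malformed */
    printf("good anomalous: %d\n", ipv4_is_anomalous(good, 20));
    printf("bad anomalous: %d\n", ipv4_is_anomalous(bad, 20));
    return 0;
}
\end{lstlisting}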
\subsection*{Stateful Protocol Inspection}
This type of detection is similar to anomaly-based detection, but it additionally inspects traffic on the transport layer and the vendor-specific application layer, which anomaly-based detection does not do.
\section{Intrusion Detection Systems: A Brief History.}
Beginning in 1980, with James Anderson's paper, Computer Security Threat Monitoring and Surveillance, the notion of intrusion detection was born. Since then, several pivotal events in IDS technology have advanced intrusion detection to its current state \cite{nids_evolution}.
James Anderson's seminal paper, written for a government organization, introduced the notion that audit trails contained vital information that could be valuable in tracking misuse and understanding user behavior. With the release of this paper, the concept of "detecting" misuse and specific user events emerged. His insight into audit data and its importance led to tremendous improvements in the auditing subsystems of virtually every operating system. Anderson's conjecture also provided the foundation for future intrusion detection system design and development. His work was the start of host-based intrusion detection and IDS in general.
In 1983, SRI International, and specifically Dr. Dorothy Denning, began working on a government project that launched a new effort into intrusion detection development. Their goal was to analyze audit trails from government mainframe computers and create profiles of users based upon their activities. One year later, Dr. Denning helped to develop the first model for intrusion detection, the Intrusion Detection Expert System (IDES), which provided the foundation for the IDS technology development that was soon to follow.
In 1984, SRI also developed a means of tracking and analyzing audit data containing authentication information of users on ARPANET, the original Internet. Soon after, SRI completed a Navy SPAWAR contract with the realization of the first functional intrusion detection system, IDES. Using her research and development work at SRI, Dr. Denning published the decisive work, An Intrusion Detection Model, which revealed the necessary information for commercial intrusion detection system development. Her paper is the basis for most of the work in IDS that followed.
Meanwhile, there were other significant advances occurring at University of California Davis' Lawrence Livermore Laboratories. In 1988, the Haystack project at Lawrence Livermore Labs released another version of intrusion detection for the US Air Force. This project produced an IDS that analyzed audit data by comparing it with defined patterns. In a telephone interview with the author, Crosby Marks, a former Haystack Project team member and Haystack Labs employee said that, "searching through this large amount of data for one specific misuse was equivalent to looking for a needle in a haystack."
The subsequent iteration of this tool was called the Distributed Intrusion Detection System (DIDS). DIDS augmented the existing solution by tracking client machines as well as the servers it originally monitored. Finally in 1989, the developers from the Haystack project formed the commercial company, Haystack Labs, and released the last generation of the technology, Stalker. Crosby Marks says that "Stalker was a host-based, pattern matching system that included robust search capabilities to manually and automatically query the audit data." The Haystack advances, coupled with the work of SRI and Denning, greatly advanced the development of host-based intrusion detection technologies.
To kick off the '90s, UC Davis's Todd Heberlein introduced the idea of network intrusion detection. In 1990, Heberlein was the primary author and developer of Network Security Monitor (NSM), the first network intrusion detection system (see Heberlein, L. et al. "A Network Security Monitor." Proceedings of the IEEE Computer Society Symposium, Research in Security and Privacy, May 1990, pp. 296-303). NSM was deployed at major government installations where network traffic analysis provided massive amounts of information. This new awareness generated more interest in the field of intrusion detection, and investments in that market increased significantly. Heberlein's contributions also extended to the DIDS project where, along with the Haystack team, he introduced the first idea of hybrid intrusion detection. The work of the Haystack project and the introduction of the Network Security Monitor revolutionized the IDS field and brought it into the commercial world.
Commercial development of intrusion detection technologies began in the early 1990s. Haystack Labs was the first commercial vendor of IDS tools, with its Stalker line of host-based products. SAIC was also developing a form of host-based intrusion detection, called Computer Misuse Detection System (CMDS). Simultaneously, the Air Force's Cryptologic Support Center developed the Automated Security Measurement System (ASIM) to monitor network traffic on the US Air Force's network. ASIM made considerable progress in overcoming scalability and portability issues that previously plagued NID products. Additionally, ASIM was the first solution to incorporate both a hardware and software solution to network intrusion detection. ASIM is still currently in use and managed by the Air Force's Computer Emergency Response Team (AFCERT) at locations all over the world. As often happened, the development group on the ASIM project formed a commercial company in 1994, the Wheel Group. Their product, NetRanger, was the first commercially viable network intrusion detection device. Nonetheless, commercial intrusion detection systems developed slowly during these years and only truly blossomed towards the latter half of the decade.
The intrusion detection market began to gain in popularity and truly generate revenues around 1997. In that year, the security market leader, ISS, developed a network intrusion detection system called RealSecure. A year later, Cisco recognized the importance of network intrusion detection and purchased the Wheel Group, attaining a security solution they could provide to their customers. Similarly, the first visible host-based intrusion detection company, Centrax Corporation, emerged as a result of a merger of the development staff from Haystack Labs and the departure of the CMDS team from SAIC. From there, the commercial IDS world expanded its market-base and a roller coaster ride of start-up companies, mergers, and acquisitions ensued.
Currently, market statistics show that IDS is amidst the top selling security vendor technologies and should continue to rise. Furthermore, government initiatives, such as the Federal Intrusion Detection Network, (FIDNet) created under Presidential Decision Directive 63, are also adding impetus to the evolution of IDS. Advancements in IDS will ultimately push security technology into a whole new arena of automated security intelligence.
\section{GPUs}
A GPU (graphics processing unit), sometimes also called a visual processing unit or VPU, is a specialized type of processor that renders 3D graphics and offloads this procedure from the microprocessor or CPU. GPU history starts in the 1970s, when the CTIA and ANTIC chips were developed. These chips allowed mixed graphics and text modes on Atari computers to be hardware controlled. The ANTIC chip was a special-purpose processor responsible for mapping graphics and text data to the video output.
In 1984 one of the first 2D and 3D graphics accelerators for IBM PC compatible systems appeared: IBM's Professional Graphics Controller. The chip did not succeed because of its high cost and poor compatibility with existing software. The Commodore Amiga, launched in 1985, was the first mass-market computer to include a dedicated graphics processor. The Amiga's graphics processor was the first full graphics accelerator that offloaded practically all video operations from the CPU.
The first graphics system that implemented 2D primitives in hardware was IBM's 8514 PC video card. In 1991, the manufacturer S3 introduced the S3 86C911, which claimed to be the first single-chip graphics card to implement 2D acceleration functions in hardware. Other manufacturers followed the 86C911 model, and by 1995 all graphics card vendors had added 2D hardware acceleration support to their chips. In the first half of the 1990s real-time 3D graphics became very significant, because of the development of CAD (Computer Aided Design) systems and especially due to video games. Video games became more and more popular, and as a result of the growing demand for 3D hardware acceleration, graphics processor vendors started to develop combined 2D and 3D graphics accelerators. With the launch of Rendition's Verite V1000 chip in 1996, a new level of 3D acceleration was reached. This chip powered the first consumer 3D accelerator cards truly capable of delivering playable performance with significantly improved graphics quality.
In the second half of the 1990s only a few manufacturers remained to compete over the GPU market, and by the end of the decade the market leaders were NVIDIA, ATI and 3dfx. In 1999 NVIDIA launched the GeForce 256, the first card on the market to support hardware transformation and lighting capabilities.
In the early 2000s, the OpenGL API became widespread. It is a multiplatform and multilanguage API, created in 1992 by Silicon Graphics Inc. to help programmers draw 3D images. Thanks to this API and to new hardware architectures that allowed each pixel to be processed by a short program able to include additional textures as inputs, and each geometric vertex to be processed in a similar way, 3D software gained a huge capability improvement. NVIDIA's GeForce 3 was the first device to support programmable vertex shaders. In 2000 NVIDIA bought 3dfx, and from that point to the present the high-performance GPU market has been dominated by NVIDIA on one side, with an estimated market share of 59.4\% in June 2011 according to \cite{gpu_marketshare}, and ATI on the other, with an estimated 40.6\% market share.
The latest NVIDIA chips are the GF119 and GF110 chip families (GeForce 500 and 600 series). Recently NVIDIA has published a new architecture for CUDA-enabled chips, code-named Kepler, which has 1536 CUDA cores on the entire GPU. For its part, ATI has developed the Radeon 7000 family, with the Tahiti graphics chipsets.
\subsection{GPGPU: general-purpose computing on graphics processing units.}
GPGPU stands for General-Purpose Computing on Graphics Processing Units. Since 2003, several researchers, like Harris, Mark J., William V. among others \cite{gpgpu}, pointed out that the architecture of modern GPUs, offering high performance in terms of FLOPS (FLoating-point Operations Per Second) and programmable fragment and vertex shaders that enabled programmers to create more realistic and complex graphics, could be used for purposes other than graphics calculations.
The major motivation for the progress of GPGPU was improving the performance of algorithms that scale easily to parallel computation, and overcoming the limitations of traditional CPU-based computing: the instruction-level parallelism wall and the memory wall. On one side, GPUs have a small set of instructions to be performed on the data; however, they are able to execute many of these instructions simultaneously over a large amount of data. This became possible because of the programmable shaders that were added to the GPU processor's pipelines. GPUs can compute a lot of vertices or fragments of graphics in the same way. Such a computation is called a stream: a stream is an array of elements that require the same calculation performed on them, and kernels are the functions applied to all elements in the stream.
\begin{figure}[h]
\centering
\includegraphics[scale=0.45]{images/cpuvsgpu.png}
\caption{GPU vs. CPU processor FLOPS performance comparison \cite{cuda_guide}.}
\label{fig:cpuvsgpu}
\end{figure}
Also, the usage of graphics processing units has another important advantage over the traditional CPU-based computing model: memory bandwidth. In the last decade the gap between CPU and memory speed has kept growing, and thus memory latency has become a major bottleneck in CPU computing, especially in applications with intensive memory usage. The evolution of theoretical single precision floating point operations per second (FLOPS) for both Intel CPUs and NVIDIA GPUs is shown in figure \ref{fig:cpuvsgpu}.
The first attempts at using GPUs for purposes other than graphics required transforming or converting complex algorithms and data into graphics, solving them on the GPU through graphics libraries (like OpenGL), and later reverting the transformation.
NVIDIA, conscious that GPGPU could be an important boost for the GPU market and aware that the existing approaches to general purpose GPU programming required a high level of knowledge and were tedious, started developing an SDK with the purpose of simplifying the task of GPGPU programming. The result of this development was CUDA (Compute Unified Device Architecture), launched in November 2006.
CUDA is a parallel computing architecture that enables programmers to use both CPU and GPU processors to cooperate in a single program, using a computing paradigm known as heterogeneous computing. Software developers are able to program general purpose functions or routines to run on the GPU by simply using "C for CUDA" (C with NVIDIA extensions), while the rest of the program is still executed on the CPU.
CUDA has become widely used in many areas such as physics simulations, scientific and medical simulations, signal processing, cryptography, or audio and video processing, among others. ATI also launched its own GPGPU SDK, called Stream SDK, but to date Stream SDK has not been as successful as CUDA.
\subsection{CUDA architecture and programming model for GPGPU.}
CUDA is the parallel computing architecture developed by NVIDIA which allows developers to use the 'C for CUDA' language to code parts or functions of a general purpose program. The three main key abstractions available to the programmer as C extensions are: a hierarchy of thread groups, shared memories, and thread barrier synchronization. CUDA programmers divide the complex task that is going to be accelerated on the GPU into several smaller tasks that can be solved independently in parallel. Functions executed by the GPU are called kernels. The rest of the code remains the same and is executed on the CPU.
\begin{figure}[h]
\centering
\includegraphics[scale=0.4]{images/cuda_threads.png}
\caption{CUDA thread hierarchy\cite{cuda_guide}.}
\label{fig:cuda_threads}
\end{figure}
Kernels are functions declared with the \_\_global\_\_ attribute. When a kernel is called, it launches a total number of N threads. To achieve good performance, in general thousands of threads should be executed simultaneously.
Figure \ref{fig:cuda_threads} shows the kernel's thread organization model. All CUDA threads are organized into a 2D array of blocks called a grid. Each of these blocks contains a logical 3D array of threads (the grid itself, however, is limited to two dimensions on the hardware used). The dimensions of the grid and of the blocks must be chosen before running a kernel, because they cannot be changed during execution of the CUDA code. If the programmer does not want to use a multidimensional array of threads or blocks, the unused dimensions can be set to 1; this way threads are organized in a simple one-dimensional array.
During kernel execution the code can read the current thread ID and the current block ID using the built-in variables blockIdx.x and blockIdx.y for blocks and threadIdx.x, threadIdx.y and threadIdx.z for threads. The dimensions of the grid and of the blocks can also be retrieved through the built-in variables blockDim and gridDim. The combination of blockIdx and threadIdx uniquely identifies each thread; it is used to perform coalesced memory accesses and to execute code on different input data depending on the thread and block IDs. On the hardware used, each grid dimension is limited to 65535 blocks and each block is limited to 512 threads.
\lstset{language=C++, label=lst:cuda_kernel, captionpos=b, caption={Simplified CUDA kernel and associated main() function.}}
\begin{lstlisting}
__global__ void vecAdd (float *A, float *B, float *C)
{
    // global index of this thread across all blocks
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    C[i] = A[i] + B[i];
}
int main (int argc, char *argv[])
{
vecAdd<<<NUM_BLOCKS, NUM_THREADS>>>(A, B, C);
return 0;
}
\end{lstlisting}
The code in listing \ref{lst:cuda_kernel} shows a simplified example of a kernel call, executing the vecAdd kernel with a 1D grid and 1D thread blocks, launching NUM\_BLOCKS blocks with NUM\_THREADS threads per block.
\setsecnumdepth{all}
\chapter{Design}
\label{chap:design}
\section{Developing tools and methodology.}
The tools and programming languages selected for implementing the application are:
Languages:
\begin{itemize}
\item \textbf{C++}\cite{bjarn_str}: due to performance requirements and its support for object-oriented features (classes, inheritance), C++ is used for developing the whole application, except for the CUDA device code, where the C dialect for CUDA is used.
\item \textbf{CUDA}\cite{cuda_zone}: CUDA uses the C language with additional features to perform general purpose calculations on GPUs.
\end{itemize}
Libraries:
\begin{itemize}
\item \textbf{Libpcap}\cite{cap_app}: a packet-capture library for retrieving packets from the Packet Filter.
\item \textbf{Qt}\cite{qt_project}: a cross-platform application framework for developing applications with a rich graphical interface. It is used for building the GUI part of the application.
\item \textbf{Boost}\cite{boost}: a set of libraries that extend the functionality of the C++ programming language. Classes for process synchronization (semaphores and mutexes) are used, as well as a Boost.Spirit parser to parse the rules dataset.
\item \textbf{SQLite}\cite{sqlite}: a software library that implements an SQL database engine; it is used to store the results.
\end{itemize}
The methodology used has been the spiral model.
\begin{figure}[h]
\centering
\includegraphics[scale=0.7]{images/spiral-model.png}
\caption{Spiral methodology used in the developing process of the application.}
\end{figure}
In this methodology, the processes of determining requirements, analyzing possible risks, implementing, testing and planning (results evaluation) are performed during the whole project. The number of iterations through the spiral depends on the implementation issues found, the precision of the objective definitions in the early stages of development, and how well the current implementation fulfills the requirements.
\section{NIDS rules.}
The NIDS rules dataset has a simple text format. Each line of the text file contains a rule in the following format:
\begin{figure}[h]
\centering
\begin{verbatim}
action protocol src_ip src_port direction dst_ip dst_port {options}
\end{verbatim}
\caption{Rule format.}
\label{fig:rule_format}
\end{figure}
\textbf{Action} can be \textbf{alert} or \textbf{log}. 'Alert' means that if the rule is matched against some packet, the user should be alerted, for example by highlighting the packet in red in the GUI or showing a message box dialog. If a rule starts with the 'log' action, a packet matched against the rule is logged into the database without any specific actions to gain the user's attention.
\textbf{Protocol} is the packet's protocol type. For simplicity, only 'tcp' is supported by the developed application; moreover, 'tcp' takes into consideration only IP packets from the network layer of the OSI model.
\textbf{Src\_ip} is the source IP address of the packet. The IP address is written in the standard form '192.168.0.1', where each position can be a range, for example '192.168.0-20.1'. Also, the '*' symbol can be used instead of any position in the IP address to accept all values at that position, for example '192.168.*.*'.
\textbf{Src\_port} is the source TCP port. It is a value from 0 to 65535, a range such as 5-1234, or the '*' symbol, which means any port.
\textbf{Direction} is the string "\textendash\textgreater". At this moment only this direction string is supported.
\textbf{Dst\_ip} is the destination IP address, written in the same format as the source IP address.
\textbf{Dst\_port} is the destination TCP port, written in the same format as the source TCP port.
\textbf{Options} are parameters that specify the payload analysis. The options must include a \textbf{content} or \textbf{pcre} parameter. Content specifies a string that is searched for within the packet payload. A content string can also contain raw bytes, which must be written between '\textbar' symbols. An example of the content option with a string and raw bytes: \textbf{content:"stringtosearch\textbar00 01 02\textbar"}.\newline The pcre parameter allows specifying the searched string as a regular expression. An example of the pcre option: \textbf{pcre:"\textasciicircum(some)*regex"}. The reason for dividing searched strings into content and pcre is that all content strings are compiled into one DFA, while each pcre is compiled into its own DFA and can consume much more memory. Therefore, where possible, it is preferable to use content parameters rather than pcre parameters, because all content patterns are searched simultaneously with the fast Aho-Corasick algorithm, while every pcre expression is searched individually for each packet. Options may also contain the parameters \textbf{offset} and \textbf{length}, which define the position in the payload from which searching should start and the number of bytes from that position to inspect. Finally, the \textbf{msg} parameter may appear within the options. It is a simple string that is logged or shown to the user when a packet matches the rule.
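Two hypothetical rules in this format are shown below (the signatures are illustrative only, each rule is wrapped over two lines here for readability, and a semicolon is assumed as the separator between options, since the format above does not fix one):
\begin{verbatim}
alert tcp 192.168.0-20.* * -> 10.0.0.1 80
    {content:"/etc/passwd"; msg:"Path traversal attempt"}
log tcp * * -> 192.168.1.5 21-23
    {pcre:"^USER\s+root"; offset:0; length:64; msg:"Remote root login"}
\end{verbatim}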
\section{Application design overview.}
The main objective of this thesis is to design and implement a simple NIDS application capable of analyzing network traffic on GPUs using CUDA.
\subsection*{Pattern matching.}
Pattern matching is the heaviest operation affecting NIDS performance. Pattern matching algorithms can be classified into single-pattern and multi-pattern algorithms.
Single-pattern algorithms search for each pattern in a given text individually; therefore, for k patterns the algorithm must be repeated k times. Some of the widely used algorithms are Knuth-Morris-Pratt\cite{kmp} and Boyer-Moore\cite{boyermoore}. Knuth-Morris-Pratt is able to skip characters when a mismatch occurs in the comparison phase, using a partial match table for each pattern. Each table is built by preprocessing every pattern separately. Boyer-Moore is the most widely used single-pattern algorithm. Its execution time can be sublinear if the suffix of the string to be searched for appears infrequently in the input stream, due to the skipping heuristics that it uses.
Multi-pattern matching algorithms search for a set of patterns in a text simultaneously. This is done by preprocessing all the patterns into an automaton that is used during the searching phase. The automaton may be represented as a trie, a table, or a combination of the two. Such algorithms read each byte of the text only once and scale much better than single-pattern ones. Examples of multi-pattern algorithms are Aho-Corasick\cite{aho_corasick}, Wu-Manber and Commentz-Walter\cite{commwalter}.
The previous section introduced NIDS rules, which can contain \textbf{content} or \textbf{pcre} patterns to be searched for within the payload. For each regular expression a single DFA is built, and the packet payload is matched against each such DFA. To search for content strings it was decided to use the Aho-Corasick algorithm; this way all content strings are searched simultaneously within the packet payload.
The Aho-Corasick algorithm consists of two parts. In the first part we construct a finite state machine from the set of patterns; in the second part we apply the packet payload as input to the pattern matching machine. The machine signals whenever it has found a match for a keyword. The pattern matching machine consists of a set of states, each represented by a number. The machine processes the text by successively reading symbols from the input string, making state transitions and occasionally emitting output. The behaviour of the pattern matching machine is dictated by three functions: a goto function \emph{g}, a failure function \emph{f} and an output function \emph{output}. Figures \ref{fig:goto_function}, \ref{fig:failure_function} and \ref{fig:output_function} show the functions used by a pattern matching machine for the set of keywords (he, she, his, hers).
\begin{figure}[h]
\centering
\includegraphics[scale=0.5]{images/goto_function.png}
\caption{Aho-Corasick goto function.}
\label{fig:goto_function}
\end{figure}
\begin{figure}[h]
\centering
\begin{tabular}{l l l l l l l l l l}
\emph{i} & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\
\emph{f(i)} & 0 & 0 & 0 & 1 & 2 & 0 & 3 & 0 & 3 \\
\end{tabular}
\caption{Aho-Corasick failure function.}
\label{fig:failure_function}
\end{figure}
\begin{figure}[h]
\centering
\begin{tabular}{l l}
\emph{i} & \emph{output(i)} \\
2 & {he} \\
5 & {she, he} \\
7 & {his} \\
9 & {hers} \\
\end{tabular}
\caption{Aho-Corasick output function.}
\label{fig:output_function}
\end{figure}
One state (usually 0) is designated as the start state. In figure \ref{fig:goto_function} the states are 0, 1, ..., 9. The goto function \emph{g} maps a pair consisting of a state and an input symbol into a state or the message \emph{fail}. The directed graph in figure \ref{fig:goto_function} represents the goto function. The absence of an arrow indicates \emph{fail}. Thus, g(1, \emph{s}) = \emph{fail} for all input symbols \emph{s} that are not 'e' or 'i'.
The failure function \emph{f} maps a state into a state. The failure function is consulted whenever the goto function reports \emph{fail}. Certain states are designated as output states which indicate that a set of keywords has been found.
\begin{figure}[h]
\textbf{Input.} A text string $x = a_1 a_2 ... a_n$ where each $a_i$ is an input symbol and a pattern matching machine M with goto function \emph{g}, failure function \emph{f}, and output function \emph{output}.\\
\textbf{Output.} Locations at which keywords occur in x.\\
\textbf{Method.}
\begin{algorithmic}
\FOR{$i=1$ to $n$}
\WHILE{$g(state, a_i) = fail$}
\STATE $state \leftarrow f(state)$
\ENDWHILE
\STATE $state \leftarrow g(state, a_i)$
\IF{$output(state) \neq empty$}
\PRINT $i$
\PRINT $output(state)$
\ENDIF
\ENDFOR
\end{algorithmic}
\caption{Aho-Corasick pattern matching machine algorithm.}
\label{fig:ac_search_pseudocode}
\end{figure}
\begin{figure}[h]
\textbf{Input.} Set of keywords K={$y_1$, $y_2$,..., $y_k$} \\
\textbf{Output.} Goto function \emph{g} and a partially computed output function \emph{output}.\\
\textbf{Method.} It is assumed that \emph{output(s)} is empty when state s is first created, and \emph{g(s, a) = fail} if \emph{a} is undefined or if \emph{g(s, a)} has not yet been defined. The procedure \emph{enter(y)} inserts into the goto graph a path that spells out \emph{y}.
\begin{algorithmic}
\STATE $newstate \leftarrow$0
\FOR{$i=1$ to $k$}
\STATE $state \leftarrow$0
\STATE $j \leftarrow$1
\WHILE{$g(state, a_j) \neq fail$}
\STATE $state \leftarrow g(state, a_j)$
\STATE $j \leftarrow j+1$
\ENDWHILE
\FOR{$p \leftarrow j$ to $m$}
\STATE $newstate \leftarrow newstate+1$
\STATE $g(state, a_p) \leftarrow newstate$
\STATE $state \leftarrow newstate$
\ENDFOR
\STATE $output(state) \leftarrow {a_1a_2 ... a_m}$
\ENDFOR
\FORALL{$a$ such that $g(0, a)=fail$}
\STATE $g(0, a) \leftarrow$0
\ENDFOR
\end{algorithmic}
\caption{Aho-Corasick goto function construction algorithm.}
\label{fig:ac_goto_pseudocode}
\end{figure}
\begin{figure}[h]
\textbf{Input.} Goto function \emph{g} and failure function \emph{f}. \\
\textbf{Output.} Next move function \emph{d}.\\
\begin{algorithmic}
\STATE $queue \leftarrow empty$
\FOR{each symbol $a$}
\STATE $d(0, a) \leftarrow g(0, a)$
\IF{$g(0, a) \neq 0$}
\STATE $queue \leftarrow queue \cup g(0, a)$
\ENDIF
\ENDFOR
\WHILE{$queue \neq empty$}
\STATE let $r$ be the next state in $queue$
\STATE $queue \leftarrow queue - {r}$
\FOR{each symbol $a$}
\IF{$g(r, a) = s \neq fail$}
\STATE $queue \leftarrow queue \cup {s}$
\STATE $d(r, a) \leftarrow s$
\ELSE
\STATE $d(r, a) \leftarrow d(f(r), a)$
\ENDIF
\ENDFOR
\ENDWHILE
\end{algorithmic}
\caption{Aho-Corasick DFA construction algorithm.}
\label{fig:ac_dfa_pseudocode}
\end{figure}
An operating cycle of a pattern matching machine is defined as follows. Let s be the current state of the machine and a the current symbol of the input string x.
\begin{enumerate}
\item If \emph{g(s, a) = s'}, the machine makes a goto transition. It enters state s' and the next symbol of x becomes the current input symbol.
In addition, if \emph{output(s')} is not empty, then the machine emits the set \emph{output(s')} along with the position of the current input symbol. The operating cycle is now complete.
\item If \emph{g(s, a) = fail}, the machine consults the failure function \emph{f} and is said to make a \emph{failure} transition. If \emph{f(s)=s'} the machine repeats the cycle with s' as the current state and a as the current input symbol.
\end{enumerate}
Initially, the current state of the machine is the start state and the first symbol of the text string is the current input symbol.
Figure \ref{fig:ac_search_pseudocode} shows the pseudocode of the Aho-Corasick matching machine algorithm. Pseudocode for the \textbf{goto} function construction is shown in figure \ref{fig:ac_goto_pseudocode}.
To be able to use the Aho-Corasick algorithm on a GPU, a DFA must be constructed from the previously built \textbf{goto}, \textbf{fail} and \textbf{output} functions. The algorithm for constructing the DFA is shown in figure \ref{fig:ac_dfa_pseudocode}. In practice, this algorithm should be evaluated in conjunction with the algorithm for constructing the failure function. The resulting DFA is a two-dimensional array whose dimensions are the number of states and the size of the alphabet (256 in our case) plus one position to store a flag indicating whether the state is an \emph{output} state.
The algorithm for searching through the DFA is similar to the regular expression one and is described in the next section.
\subsection*{Regular expression pattern matching.}
Regular expressions are compiled into a DFA using an external library. As a result a two-dimensional array is obtained, with a structure similar to the Aho-Corasick DFA: its dimensions are the number of states and the size of the alphabet (256 in our case) plus one position storing the index of the rule if the state signals that the regular expression is matched. The final algorithm for searching a pattern within a packet payload is shown in figure \ref{fig:dfa_search}.
\begin{figure}[h]
\textbf{Input.} A text string $x = a_1 a_2 ... a_n$ where each $a_i$ is an input symbol and a next move function \emph{d} \\
\textbf{Output.} Locations at which keywords occur in x.\\
\begin{algorithmic}
\STATE $state \leftarrow$0
\FOR{$i=1$ to $n$}
\STATE $state \leftarrow d(state, a_i)$
\IF{$d(state, 256) \neq $0}
\PRINT $i$
\PRINT $d(state, 256)$
\ENDIF
\ENDFOR
\end{algorithmic}
\caption{Pattern matching DFA search.}
\label{fig:dfa_search}
\end{figure}
The DFA search algorithm simply traverses the DFA starting from state zero, and for every input symbol it calculates the new state. The current state corresponds to a row in the two-dimensional DFA array, and the current input symbol corresponds to an offset in that row. This way the new state is read from the DFA. For every state it is also checked whether it has output values; an output value means that a match of some string has occurred.
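The traversal described above can be sketched in C++ as follows (a minimal CPU-side sketch of the same table layout, not the actual implementation; the hand-built example DFA recognizes the single pattern "ab" and is ours, for illustration):
\lstset{language=C++, captionpos=b, caption={Sketch of the DFA table traversal, with an illustrative DFA for the pattern "ab".}}
\begin{lstlisting}
#include <cstdint>
#include <string>
#include <vector>

const int W = 257; // 256 input symbols + one column for the output flag

// Column b (0-255) of a row holds the next state for input byte b,
// column 256 is non-zero for output states. Returns end positions of matches.
std::vector<size_t> dfa_search(const std::vector<int> &dfa,
                               const std::string &payload)
{
    std::vector<size_t> matches;
    int state = 0;
    for (size_t i = 0; i < payload.size(); ++i) {
        state = dfa[state * W + (uint8_t)payload[i]];
        if (dfa[state * W + 256] != 0) // output flag set: a match ends here
            matches.push_back(i);
    }
    return matches;
}

// Hand-built DFA for the single pattern "ab": 3 states, state 2 is output.
std::vector<int> example_dfa_ab()
{
    std::vector<int> dfa(3 * W, 0); // unspecified transitions lead to state 0
    dfa[0 * W + 'a'] = 1;
    dfa[1 * W + 'a'] = 1; // after "aa" the last 'a' may still start a match
    dfa[1 * W + 'b'] = 2;
    dfa[2 * W + 'a'] = 1;
    dfa[2 * W + 256] = 1; // output flag
    return dfa;
}
\end{lstlisting}
For instance, dfa_search(example_dfa_ab(), "xabyab") returns positions 2 and 5, the end positions of the two occurrences of "ab".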
\subsection*{Packet filtering based on route information.}
Information obtained from the packet header is used to select a packet for payload analysis or to drop it without any analysis. Information from the loaded rules must match the packet's header information for processing of the received packet to continue. It was decided to store the route information from the loaded rules as sequences of bytes in a trie structure. Figure \ref{fig:trie_example} shows an example of a trie structure with the following IP addresses stored: 192.168.0.1, 192.168.1.38, 192.168.5-100.1 and 232.222.1.79.
\begin{figure}[h]
\centering
\includegraphics[scale=0.5]{images/trie.png}
\caption{Example of trie structure.}
\label{fig:trie_example}
\end{figure}
The full sequence stored in the trie is shown in figure \ref{fig:route_sequence}. To compare a received packet against the rules, the route information from the packet's header is represented as the sequence of bytes shown in figure \ref{fig:route_sequence}, and a search for that sequence is performed on the trie. This approach is similar to the Aho-Corasick way of searching over data saved in a trie. The only difference is that the trie may contain an IP range (a trie node may hold a range of values rather than a single value). If the trie search finds a rule for the packet, the packet is put in a queue for payload analysis.
\begin{figure}[h]
\centering
\includegraphics[scale=0.6]{images/route_sequence.png}
\caption{Full route sequence extracted from the packet header.}
\label{fig:route_sequence}
\end{figure}
\subsection*{Transferring packets to the GPU.}
The simplest approach is to transfer each packet to the GPU separately. However, every transfer to the GPU carries an overhead, and many small transfers decrease performance. Therefore, the following approach is used to achieve better transfer performance: packets received by the CPU are collected in a buffer for some time, and when the CPU decides it is time to process them, it transfers the whole batch of packets to the GPU.
The CPU program's decision to transfer packets to the GPU can be based on one of two policies:
\begin{itemize}
\item Decision based on time. The packet buffer is transferred to the GPU on a timer event.
\item Decision based on buffer size. The packet buffer is transferred to the GPU when its size exceeds some threshold.
\end{itemize}
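A minimal sketch of combining the two policies is shown below. The BatchPolicy name and its fields are illustrative, not the application's actual code; in the real system a positive decision would hand the buffer to the GPU.

```cpp
#include <chrono>
#include <cstddef>

// Flush the packet buffer to the GPU either when it grows past
// max_packets_ or when interval_ has elapsed since the last flush,
// whichever happens first.
struct BatchPolicy {
    BatchPolicy(std::size_t max_packets, std::chrono::milliseconds interval)
        : max_packets_(max_packets), interval_(interval),
          last_flush_(std::chrono::steady_clock::now()) {}

    bool should_flush(std::size_t buffered,
                      std::chrono::steady_clock::time_point now)
    {
        if (buffered >= max_packets_ || now - last_flush_ >= interval_) {
            last_flush_ = now;  // the batch would be handed to the GPU here
            return true;
        }
        return false;
    }

    std::size_t max_packets_;
    std::chrono::milliseconds interval_;
    std::chrono::steady_clock::time_point last_flush_;
};
```

The capture loop would call should\_flush after every enqueued packet and also from a periodic timer callback.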
\subsection*{Database structure.}
\begin{figure}[h]
\centering
\includegraphics[scale=0.55]{images/database.png}
\caption{Database structure.}
\label{fig:database_structure}
\end{figure}
The results database is very simple and contains only one table, which holds all information about a matched packet. Figure \ref{fig:database_structure} shows the table fields that are filled for a matched packet.
\begin{itemize}
\item \textbf{Time} field contains the timestamp at which the NIDS detected the suspicious packet.
\item \textbf{Action} field contains the action from the matched rule.
\item \textbf{Message} field contains the message from the matched rule.
\item \textbf{Rule} field contains the matched rule itself.
\end{itemize}
In the current application version it was decided, for performance reasons, not to save the payload of matched packets.
\subsection*{Application components.}
The application design is divided into several components. The diagram in figure \ref{fig:app_architecture} shows the main application components that should be implemented and used.
\begin{figure}[h]
\centering
\includegraphics[scale=0.5]{images/app_architecture.png}
\caption{The main application components.}
\label{fig:app_architecture}
\end{figure}
\begin{itemize}
\item \textbf{Rules dataset parser}. This component parses the input file with Nids rules written in text format. The parser should be easily extensible so that new options can be added to a rule.
\item \textbf{Database}. This component provides an easy way to add packet analysis results to a database.
\item \textbf{GUI}. This component implements user interaction with the program. The application has a simple window-based interface which allows the user to perform all major manipulations with the application, such as starting and stopping traffic capture, switching the analysis mode, saving traffic to a file, etc.
\item \textbf{Regular expressions implementation}. This module compiles regular expressions obtained from rules into DFAs for later loading onto the CUDA device.
\item \textbf{Aho-Corasick implementation}. This module creates the DFA for all plain content strings read from the rules. The application uses the Aho-Corasick algorithm for fast string searching within network packets.
\item \textbf{Network packets capture}. The application itself does not capture packets from the kernel, because that is a non-trivial task. Instead, the program uses an external library for packet capturing.
\item \textbf{CUDA module}. Code that runs on the GPU and performs analysis of packets against the rules.
\item \textbf{NIDS}. The main component of the system. It uses all other modules to read rules and to start packet capture and analysis. It stores analysis results in a database and updates counters used within the NIDS for performance calculations.
\end{itemize}
\chapter{Implementation and evaluation.}
\section{General considerations.}
The application has been implemented using the following versions of the programming tools and libraries:
\begin{itemize}
\item GCC 4.7.0
\item CUDA compilation tools, version 4.2
\item Libpcap 1.2.1
\item Boost 1.49.0
\item QT 4.8.0
\item SQLite 3.7.11
\item RE2 (no release version; source snapshot revision '2d252384c5e8' was used)
\end{itemize}
\lstset{language=Python, caption={Part of QT project file with extra compiler configuration.}, label=lst:extra_compiler_config, captionpos=b}
\begin{lstlisting}
# CUDA settings (for Arch Linux)
CUDA_SOURCES += my_nids_cuda.cu
# Path to cuda toolkit install
CUDA_DIR = /opt/cuda-toolkit
# GPU architecture
CUDA_ARCH = sm_11
# nvcc flags (ptxas option verbose is always useful)
NVCCFLAGS = --compiler-options -fno-strict-aliasing -use_fast_math --ptxas-options=-v
# include path
INCLUDEPATH += $$CUDA_DIR/include
# lib dir
QMAKE_LIBDIR += $$CUDA_DIR/lib64
# libs
LIBS += -lcudart
# join the includes in a line
CUDA_INC = $$join(INCLUDEPATH,' -I', '-I', ' ')
# Prepare the extra compiler configuration (taken from the nvidia forum)
cuda.input = CUDA_SOURCES
cuda.output = ${OBJECTS_DIR}${QMAKE_FILE_BASE}.o
cuda.commands = $$CUDA_DIR/bin/nvcc -m64 -g -G -arch=$$CUDA_ARCH -c $$NVCCFLAGS $$CUDA_INC $$LIBS ${QMAKE_FILE_NAME} -o ${QMAKE_FILE_OUT}
cuda.dependency_type = TYPE_C
cuda.depend_command = $$CUDA_DIR/bin/nvcc -g -G -M $$CUDA_INC $$NVCCFLAGS ${QMAKE_FILE_NAME}
# Tell Qt that we want add more stuff to the Makefile
QMAKE_EXTRA_COMPILERS += cuda
\end{lstlisting}
The NIDS application has been developed based on the design presented in chapter \ref{chap:design} and uses the workflow presented in \ref{chap:introduction}. Project sources are organized in a QT project file. To compile the project, the \textbf{qmake} utility from QT is first used to generate a Makefile, which the \textbf{make} utility then requires to compile and link the sources into the final executable. The whole project is placed in one folder, but logically the source codes are divided into "Headers", "Sources", "Forms" and "Other files" when an IDE such as QT Creator is used to manage the project. All source codes are compiled with the \textbf{gcc} compiler except the CUDA code: files with CUDA source code have the \textbf{*.cu} extension and are compiled with the \textbf{nvcc} compiler. To make it possible to compile \textbf{*.cu} files within the QT project, an extra compiler configuration is created in the QT project file (shown in listing \ref{lst:extra_compiler_config}).
\section{Application threading model.}
The NIDS application has been developed using the Boost.Thread library for the Nids core and QThread for the GUI part. Boost.Thread is used in the main Nids class for processing packets on the CPU or GPU. QThread is used for executing the main window message processing and for running the Nids packet capture task from a GUI event callback. The reason two different thread classes are used is that development of the core Nids application started with only the Boost library, and the QT part was added a little later.
When the NIDS application runs in CPU mode, N threads are created for packet processing and one thread for processing analysis results. The packet capture thread writes the packet payload into a buffer and puts a pointer to the buffer into a queue. The other threads sleep until a packet arrives in the queue; when one does, a single thread wakes up and pulls the packet from the queue for processing. After finishing work on the packet, it goes back to sleep until a new packet arrives in the queue.
For the NIDS working in CPU+GPU mode, packet processing is slightly different. There is one thread which puts \textbf{tasks} (packet payload, payload length and the number of the DFA to search) into the queue, and one thread which processes the packets when the number of packets in the queue exceeds some limit or the processing timer elapses. A double-buffered mechanism was implemented to speed up packet collection: before packets are processed on the GPU, the buffers are switched, so the CPU can collect arriving packets into the second buffer.
To keep the multithreaded code consistent, synchronization mechanisms from the Boost library were used: a semaphore to notify threads that a new packet has arrived in the queue, and Boost.Mutex to synchronize access to the queue and other shared objects.
\begin{figure}[h]
\centering
\includegraphics[scale=0.3]{images/cpu_threads.png}
\caption{Threading model for CPU processing.}
\label{fig:cpu_threads}
\end{figure}
\begin{figure}[h]
\centering
\includegraphics[scale=0.3]{images/gpu_threads.png}
\caption{Threading model for CPU+GPU processing.}
\label{fig:gpu_threads}
\end{figure}
Figures \ref{fig:cpu_threads} and \ref{fig:gpu_threads} show the threading models for the CPU and CPU+GPU modes graphically.
\section{Main application classes.}
This section describes the main application classes, their functionality and purpose.
\subsection{Packet class.}
The Packet class stores all the information about a network packet needed to analyze its header and payload. It is used by the Nids class to search the trie structure and to analyze the payload for regex or AC patterns. Listing \ref{lst:packet_class} shows part of the class header.
\lstset{language=C++, caption={Extract of Packet.h}, label=lst:packet_class, captionpos=b}
\begin{lstlisting}
class Packet
{
public:
Packet(in_addr ip_from, in_addr ip_to, int port_from, int port_to);
// create packet from raw data
static Packet* create_packet(const u_char *raw_packet);
// for test purposes
static Packet* create_fake_packet();
// getters and setters for packet components
in_addr get_ip_from() const;
// ...
sequence32 &get_route();
const sequence32 &get_route() const;
int get_payload_offset() const;
int get_payload_size() const;
const u_char *get_raw_packet() const;
// payload from raw_packet is saved into internal vector
// it means that packet will be analyzed by CPU
vector<u_char> &save_payload();
const vector<u_char> &get_payload() const;
private:
//...
int payload_size;
int payload_offset;
// info about route for analyzing in the trie
sequence32 route;
const u_char *raw_packet;
vector<u_char> payload;
};
\end{lstlisting}
When a new packet arrives, the static create\_packet method is used to analyze it. This method acts as the packet analyzer: it is called for every received packet to create a Packet object with properly filled fields. Method create\_packet analyzes the raw packet data and performs protocol identification. If the received packet is a TCP/IP packet, it returns a newly created Packet object; otherwise it returns a null pointer, meaning the received packet does not need to be analyzed.
Right after create\_packet returns, the raw\_packet field contains a pointer to the raw packet payload. For efficiency, create\_packet does not copy the payload data anywhere else; this point is discussed later in the Nids class description.
The route field saves the packet route information in the format described in figure \ref{fig:route_sequence}. The route information is stored in an array of type sequence32, defined as a std::vector of 32-bit values. There is also a sequence8 type in the application, a std::vector of 8-bit values.
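As a small illustration of how header fields become trie-ready bytes, the hypothetical helper below appends the four octets of an IPv4 address to a sequence32. It shows only one step of building the full sequence; the complete field order is the one in figure \ref{fig:route_sequence}.

```cpp
#include <cstdint>
#include <vector>

using sequence32 = std::vector<uint32_t>;  // mirrors the application's typedef

// Append the four octets of a host-order IPv4 address to the route,
// one sequence element per byte, most significant octet first.
void append_ipv4(sequence32& route, uint32_t ip_host_order)
{
    for (int shift = 24; shift >= 0; shift -= 8)
        route.push_back((ip_host_order >> shift) & 0xFFu);
}
```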
\subsection{AC class.}
The AC class is the implementation of the Aho-Corasick algorithm. The main methods are \textbf{insert} and \textbf{search}. Method \textbf{insert} adds a string to the internal trie, while \textbf{search} scans a given input for the previously inserted strings. When all strings have been inserted, the \textbf{finish} method should be called to translate them into a DFA, which can later be extracted for loading onto the GPU.
\lstset{language=C++, caption={Extract of AC.h}, label=lst:ac_class, captionpos=b}
\begin{lstlisting}
#define FAIL -1
#define STATE_SIZE (UCHAR_MAX + 1)
class AC
{
public:
AC();
void insert(const sequence8 &sequence, int value);
void search(const sequence8 &sequence, vector<int> &result);
void search(const u_char *sequence, int length, vector<int> &result);
int next(int state, sequence_val8 value);
void finish();
int count_states();
int get_transition_value(int state, int c);
const vector<int> &get_transition_row(int state);
const vector<int> &get_out(int state);
private:
void enter(const sequence8 &sequence, int value);
void compute_fail_transitions();
vector< vector<int> > transition_table;
vector< vector<int> > out;
};
\end{lstlisting}
Internally the DFA is stored in the \textbf{transition\_table} field, with an unbounded number of states and UCHAR\_MAX + 1 (256) transitions per state. The \textbf{out} field is the output function as described in the Aho-Corasick algorithm description; for every state it contains all matched strings.
\begin{figure}[h]
\centering
\includegraphics[scale=0.5]{images/ac_grid.png}
\caption{Aho-Corasick DFA traversing.}
\label{fig:ac_grid}
\end{figure}
The search algorithm is a simple traversal of the internal DFA for every next input symbol, as shown in figure \ref{fig:ac_grid}.
\subsection{Regex class.}
The Regex class implements compilation of regex patterns into DFAs and searching input strings for the previously compiled patterns. The main methods are \textbf{compile} and \textbf{search}.
\lstset{language=C++, label=lst:regex_class, captionpos=b, caption={Extract of Regex.h}}
\begin{lstlisting}
#define STATE_SIZE (UCHAR_MAX + 1)
#define STATE_DEAD -1
#define STATE_WHOLE_STR -2
class Regex
{
public:
Regex();
int compile(const sequence8 &regex);
int count_states();
int get_transition_value(int dfa_offset, int state, int c);
const vector<int> get_transition_state(int dfa_offset, int state);
const vector<int> get_transition_state(int row);
bool is_match(int dfa_offset, int state);
bool is_match(int row);
const vector<int> get_dfa_offsets() const;
bool search(const sequence8 &data, int dfa_offset);
bool search(const u_char *data, int length, int dfa_offset);
private:
vector< vector<int> > transition_table;
vector<int> dfa_offsets;
int states_total;
int states_cur_dfa;
map<void *, int> states_map;
const re2::uint8 *bytemap;
int analyze_state(void *state);
};
\end{lstlisting}
Method \textbf{compile} uses the RE2 regular expression library to compile the input pattern into a DFA. The default memory budget for compiling one DFA is 8388608 bytes (8 MB). If compiling a pattern exceeds the memory budget, the regex is declined and will not be searched during packet analysis.
\begin{figure}[h]
\centering
\includegraphics[scale=0.5]{images/regex_grid.png}
\caption{Regex DFA internal representation.}
\label{fig:regex_grid}
\end{figure}
The internal representation of the DFA is similar to the AC one. One difference is that AC has a single DFA into which all input strings are inserted, while a separate DFA is created for every regular expression; the Regex class therefore stores all DFAs in one vector and keeps their start offsets in another vector so that a specific DFA can be located. Another difference is that a regex DFA contains the special states \textbf{STATE\_DEAD} and \textbf{STATE\_WHOLE\_STR}, both of which terminate the whole search procedure early: \textbf{STATE\_DEAD} means that no matter what the next input symbols are, the search can no longer succeed (e.g. the pattern must match from the start of the string and the start did not match), while \textbf{STATE\_WHOLE\_STR} means that the pattern already matches regardless of the remaining input.
The regex DFA representation stored in the Regex class is shown in figure \ref{fig:regex_grid}.
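The early-exit behaviour of the two special states can be sketched as follows. This is illustrative only: the 257-entry row with a match flag in slot 256 is an assumption of the sketch, not necessarily the real layout, and the function name is hypothetical.

```cpp
#include <vector>

constexpr int STATE_DEAD = -1;       // search can never succeed any more
constexpr int STATE_WHOLE_STR = -2;  // pattern matches whatever follows

// Scan one regex DFA slice inside the shared transition table,
// starting at the row given by dfa_offset.
bool regex_search(const std::vector<std::vector<int>>& table,
                  int dfa_offset,
                  const std::vector<unsigned char>& data)
{
    int state = 0;
    for (unsigned char c : data) {
        int next = table[dfa_offset + state][c];
        if (next == STATE_DEAD)
            return false;            // early exit: no match possible
        if (next == STATE_WHOLE_STR)
            return true;             // early exit: already matched
        state = next;
    }
    return table[dfa_offset + state][256] != 0;  // match flag of final state
}
```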
\subsection{Trie class.}
The Trie class stores sequences of symbols and provides quick search over an input string. The internal structure of the trie was described in chapter \ref{chap:design}. The Trie class itself contains only three methods: \textbf{insert}, \textbf{search} and \textbf{clear}; their names speak for themselves.
\lstset{language=C++, label=lst:trie_node_class, captionpos=b, caption={Extract of Trie_node.h}}
\begin{lstlisting}
typedef vector<trie_range> trie_range_t;
typedef vector<Trie_node *> trie_range_value_t;
typedef map<sequence_val32, Trie_node *> trie_transition_t;
class Trie_node
{
public:
Trie_node();
vector<int>& get_values();
void set_values(vector<int> &values);
void append_values(vector<int> &values);
void append_value(int value);
Trie_node *next(sequence_val32 val);
void set_next(sequence_val32 val, Trie_node *next);
void set_next_any(Trie_node *next);
Trie_node *get_next_any();
trie_transition_t &get_transitions();
trie_range_t &get_ranges();
trie_range_value_t &get_range_values();
private:
trie_transition_t next_;
trie_range_t range_next_;
trie_range_value_t range_next_value_;
Trie_node *any_next_;
vector<int> values;
};
\end{lstlisting}
More complex is the Trie\_node class, which stores the information for one trie node (listing \ref{lst:trie_node_class}). The trie\_transition\_t type maps an input symbol value to the next trie node, so the search algorithm is similar to searching over a DFA. In addition, a trie node contains the range\_next\_ field, used to store a range of IP or port values: if, for example, 192.0-4.*.* is used, range\_next\_ contains the 0-4 part of the IP address in the corresponding trie node. The any\_next\_ field is the pointer to the next trie node for the * input symbol. Searching over such a trie therefore means taking the next trie node from the next\_ field for each symbol of the input string, additionally checking whether a range or the any symbol exists in the current node.
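A stripped-down sketch of that lookup order (exact transition first, then ranges, then the any edge) is shown below. The Node/step names are illustrative; a real search would also have to consider that several branches can match at once, whereas this sketch returns the first hit.

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

struct Node {
    std::map<uint32_t, Node*> next;  // exact transitions
    // each range entry: ((low, high), target node)
    std::vector<std::pair<std::pair<uint32_t, uint32_t>, Node*>> ranges;
    Node* any = nullptr;             // '*' transition
    std::vector<int> rules;          // rule ids that end in this node
};

// Advance one step for input value v, or return nullptr if no edge fits.
Node* step(Node* n, uint32_t v)
{
    auto it = n->next.find(v);
    if (it != n->next.end())
        return it->second;           // exact edge wins
    for (auto& r : n->ranges)
        if (v >= r.first.first && v <= r.first.second)
            return r.second;         // value falls inside a stored range
    return n->any;                   // may be nullptr: no match at all
}
```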
\subsection{Rule class.}
For each rule in the input \textbf{rules.txt} file, one object of type Rule is created. The Rule class stores the text representation of the rule and also all rule parameters, which are parsed using a Boost.Spirit parser.
\lstset{language=C++, label=lst:rule_class, captionpos=b, caption={Extract of Rule.h}}
\begin{lstlisting}
class Rule
{
public:
static Rule *create_rule(const string &rule);
Action get_action();
const sequence32 &get_route();
// ...
private:
Rule();
// ...
// rule as string
string rule_str;
// action that should be taken if rule is matched
Action action;
// message that should be inserted into database if this rule matched
string message;
// route info includes all information about the protocol, IP addresses and ports.
// it will be inserted into the trie and matched against the route info of every received packet.
sequence32 route;
vector<sequence8> content;
sequence8 pcre;
int offset;
int length;
};
\end{lstlisting}
The Rule object is created using the static \textbf{create\_rule} method. This method parses the passed string and returns a pointer to a newly created Rule object with filled parameters. Parsing is done using the Boost.Spirit library, which allows writing parsing grammars directly in C++ code. For parsing rules, a \textbf{RuleRouteGrammar} was created and used in Boost's \textbf{phrase\_parse} function. In case of a malformed input rule, \textbf{create\_rule} returns a null pointer. Listing \ref{lst:ip_grammar} contains an example of a simple grammar for parsing an IP address. This grammar returns a std::vector with the four IP parts represented as unsigned integers. The whole grammar for parsing rules is more complex: parsing is done into a \textbf{RuleSpirit} structure, and the data from this structure is later copied into the Rule class.
\lstset{language=C++, label=lst:ip_grammar, captionpos=b, caption={Boost.Spirit grammar for parsing IP address.}}
\begin{lstlisting}
template<typename Iterator>
struct IPGrammar : qi::grammar<Iterator, std::vector<unsigned>(), space_type>
{
IPGrammar() : IPGrammar::base_type(ip_address)
{
ip_address = uint_parser<unsigned, 10, 1, 3>() >> '.' >>
uint_parser<unsigned, 10, 1, 3>() >> '.' >>
uint_parser<unsigned, 10, 1, 3>() >> '.' >>
uint_parser<unsigned, 10, 1, 3>();
}
qi::rule<Iterator, std::vector<unsigned>(), space_type> ip_address;
};
\end{lstlisting}
The grammar could easily be extended with new parameters by adding a qi::rule field and including new rule definitions in IPGrammar. In the same way, the RuleRouteGrammar could be extended if new options need to be added to the Nids rule. Boost.Spirit allows parsing strings directly into C++ structures, so it is possible to create a structure layout for parsing very complex expressions.
\subsection{Nids class.}
The Nids class is the main class of the application. It manages all activities in the application and starts and stops network monitoring; all previously described classes are used by the Nids class as helpers.
\lstset{language=C++, label=lst:nids_class, captionpos=b, caption={Extract of Nids.h}}
\begin{lstlisting}
class Nids
{
public:
Nids();
~Nids();
int read_rules(const char *filename);
bool start_monitor(const char *interface, int threads_num, const char *db_filename = NULL);
bool start_monitor_gpu(const char *interface,
int device_num,
WINDOW_TYPE window_type = BUF_SIZE,
int buf_threshold_size = DEF_PACKET_BUFFER_SIZE,
int buf_flush_time = DEF_TIME_FLUSH_BUFFER,
const char *db_filename = NULL);
bool start_monitor_offline(const char *interface,
const char *cap_filename,
int threads_num,
const char *db_filename = NULL);
bool start_monitor_offline_gpu(const char *interface,
const char *cap_filename,