This project is a Python application designed to analyze flow logs and classify them based on destination ports and protocols. It utilizes a lookup table to assign tags to specific port and protocol combinations, providing insights into network traffic patterns.
-
Protocol Mapping:
Supports multiple protocols (TCP, UDP, ICMP, etc.) with a predefined mapping. -
Batch Processing:
Efficiently processes large log files in batches to optimize memory usage. -
Error Logging:
Logs errors and warnings to a dedicated file for easier troubleshooting.
The application generates detailed output files, including:
- Tag counts
- Port/protocol counts
- List of untagged flows
- Python: Version 3.0 or higher
- Required Libraries:
csv
(import csv)logging
(import logging)time
(import time)collections
(from collections import defaultdict)
-
lookup_table.csv:
A CSV file containing destination ports, associated protocols, and corresponding tags.
Format:dstport,protocol,tag 25,tcp,sv_P1 68,udp,sv_P2
-
flow_logs.txt:
A text file containing flow log entries to be processed.
Format:<type> <account_id> <eni_id> <source_ip> <destination_ip> <source_port> <destination_port> <protocol> <action_count> <bytes> <start_time> <end_time> <status> <response>
(Fields are separated by spaces.)
After processing the logs, the following output files are generated:
-
output.csv:
A summary of the flow log processing, including tag counts, port/protocol counts, and untagged flows. -
tag_counts.csv:
A CSV file detailing the counts of each tag found in the flow logs. -
port_protocol_counts.csv:
A CSV file showing the counts of each unique port and protocol combination. -
untagged_flows.csv:
A CSV file listing the destination ports and protocols for flows that could not be tagged.
- Ensure that your
lookup_table.csv
andflow_logs.txt
files are in the same directory as the script. - Run the script from the command line:
python project.py
- After the script finishes execution, check the output files for results.
Errors encountered during processing are logged in the error.log
file, which contains details about any malformed lines or unrecognized protocol numbers.
The tool measures the processing time and prints it to the console, allowing users to gauge the efficiency of the log analysis.