Dataset accompanying:
Five Blind Men and the Internet: Towards an Understanding of Internet Traffic
Ege Cem Kirci, Ayush Mishra, Laurent Vanbever — NINeS 2026
- data/profiles/: traffic profiles in Parquet format, named <index>_traffic_profile.parquet
- data/metadata.csv: mapping table for each profile
Each profile file contains the following columns:
- src_id (int): source/profile ID. This matches the <index> in the filename <index>_traffic_profile.parquet.
- human_time (string, format %Y-%m-%d %H:%M:%S): timestamp at 5-minute resolution (UTC-based canonical timeline).
- data_in (int): inbound traffic rate in bits per second (bps), using decimal/SI scaling.
  - Gbps = data_in / 1e9
  - Tbps = data_in / 1e12
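
The column conventions above can be sketched in a few lines of standard-library Python. This is only an illustration: the helper names are hypothetical, the sample row is made up rather than taken from the dataset, and attaching a UTC timezone assumes that human_time values are UTC wall-clock times.

```python
from datetime import datetime, timezone

# Format documented for the human_time column.
HUMAN_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_human_time(value: str) -> datetime:
    """Parse a human_time string and attach UTC (assumption: values are UTC)."""
    return datetime.strptime(value, HUMAN_TIME_FORMAT).replace(tzinfo=timezone.utc)


def bps_to_gbps(data_in: int) -> float:
    """Convert bits per second to Gbps using decimal/SI scaling (1 Gbps = 1e9 bps)."""
    return data_in / 1e9


def bps_to_tbps(data_in: int) -> float:
    """Convert bits per second to Tbps using decimal/SI scaling (1 Tbps = 1e12 bps)."""
    return data_in / 1e12


# Made-up sample row for illustration only; not taken from the dataset.
row = {"src_id": 170, "human_time": "2025-01-01 00:05:00", "data_in": 2_500_000_000}
timestamp = parse_human_time(row["human_time"])
gbps = bps_to_gbps(row["data_in"])
```

The same conversions apply column-wise if the profile is loaded into a DataFrame.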
Each Parquet profile file is keyed by <index> in its filename.
Example:
data/profiles/170_traffic_profile.parquet corresponds to index = 170 in data/metadata.csv
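
A minimal sketch of this filename convention, assuming the repository root is the working directory. The helper names profile_path, profile_index, and load_profile are hypothetical, and load_profile additionally assumes that pandas and a Parquet engine (for example pyarrow) are installed.

```python
import re
from pathlib import PurePosixPath

PROFILE_DIR = PurePosixPath("data/profiles")
PROFILE_NAME = re.compile(r"^(\d+)_traffic_profile\.parquet$")


def profile_path(index: int) -> PurePosixPath:
    """Build the relative path of the profile file for a metadata index."""
    return PROFILE_DIR / f"{index}_traffic_profile.parquet"


def profile_index(path: str) -> int:
    """Recover the metadata index from a profile filename."""
    match = PROFILE_NAME.match(PurePosixPath(path).name)
    if match is None:
        raise ValueError(f"not a profile filename: {path}")
    return int(match.group(1))


def load_profile(index: int):
    """Load one profile as a DataFrame (assumes pandas and a Parquet engine)."""
    import pandas as pd  # imported lazily so the path helpers work without pandas

    return pd.read_parquet(str(profile_path(index)))
```

For the example above, profile_path(170) yields data/profiles/170_traffic_profile.parquet, and profile_index maps that path back to 170.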
metadata.csv columns:
- index: profile identifier used in file names
- ixp_id: IXP identifier from PeeringDB
- siblings: sibling IXP IDs (serialized list)
- If siblings is empty ([]), the profile maps one-to-one to a single IXP: ixp_id.
- If siblings is non-empty, the profile is organization-level aggregated traffic across multiple IXPs.
- In that aggregated case, the full IXP set is:
  union({ixp_id}, siblings)
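
The sibling rules above can be sketched as follows. The exact serialization of siblings is an assumption here: the code expects a list literal such as [] or [12, 34], and the function names are hypothetical.

```python
import ast


def parse_siblings(siblings: str) -> list[int]:
    """Parse the serialized siblings value (assumption: a list literal like "[12, 34]")."""
    parsed = ast.literal_eval(siblings)
    if not isinstance(parsed, list):
        raise ValueError(f"expected a list literal, got: {siblings!r}")
    return [int(item) for item in parsed]


def is_aggregated(siblings: str) -> bool:
    """True when the profile aggregates traffic across multiple IXPs."""
    return len(parse_siblings(siblings)) > 0


def ixp_set(ixp_id: int, siblings: str) -> set[int]:
    """Full IXP set for a profile: union({ixp_id}, siblings)."""
    return {int(ixp_id)} | set(parse_siblings(siblings))
```

With an empty siblings value the result is the single-element set containing ixp_id, which matches the one-to-one case.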
This repository intentionally keeps metadata minimal.
To obtain IXP names, locations, and other attributes, join ixp_id (and any IDs in siblings) against your PeeringDB data source.
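
One way to perform this join, sketched with plain dictionaries. The peeringdb_ix mapping stands in for whatever PeeringDB export or API client you use; its records are placeholders, and the field names (name, city, country) are assumed to mirror PeeringDB's IX records.

```python
def attach_ixp_attributes(ixp_ids, peeringdb_ix):
    """Look up each IXP ID in a PeeringDB-derived mapping; unknown IDs map to None."""
    return {ixp_id: peeringdb_ix.get(ixp_id) for ixp_id in sorted(ixp_ids)}


# Placeholder records for illustration only; IDs, names, and locations are made up.
peeringdb_ix = {
    1: {"name": "Example IX A", "city": "Example City", "country": "XX"},
    2: {"name": "Example IX B", "city": "Example Town", "country": "XX"},
}

# IDs would normally come from ixp_id plus any IDs in siblings.
resolved = attach_ixp_attributes({1, 2, 3}, peeringdb_ix)
```

Mapping missing IDs to None rather than dropping them makes it easy to spot IXPs that are absent from your PeeringDB snapshot.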
Questions: ekirci@ethz.ch