Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(docs): Add doc for Hubble control plane and metrics #1194

Merged
merged 9 commits into from
Jan 10, 2025
12 changes: 12 additions & 0 deletions docs/03-Metrics/01-metrics-intro.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
# Metrics

Prometheus metrics available depend on the Retina control plane deployed.

## Control Planes

There are two control planes used in the Retina project: Hubble and the legacy control plane. Both control planes create metrics and traces which are generated by the eBPF data plane, which has a single implementation. Only one control plane should be deployed at a given time. Helm charts for the deployment are found under `deploy/hubble/manifests/controller/helm/retina` and `deploy/legacy/manifests/controller/helm/retina`.

1. [Hubble metrics](./hubble_metrics.md)
2. [Legacy metrics](./modes/modes.md)

> Note: Hubble offers additional features and metrics that the legacy control plane does not support. The plan is to deprecate the legacy control plane in favor of Hubble. For further documentation on Hubble, check [Cilium/Hubble repository](https://github.com/cilium/hubble/?tab=readme-ov-file#features) and official [Hubble metrics documentation](https://docs.cilium.io/en/stable/observability/metrics/#hubble-metrics)
53 changes: 53 additions & 0 deletions docs/03-Metrics/02-hubble_metrics.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
# Hubble Metrics

When Retina is deployed with Hubble control plane, the metrics include Node-level and Pod-level. Metrics are stored in Prometheus format, and can be viewed in Grafana.

## Metrics

* Node-Level Metrics: These metrics provide insights into traffic volume, dropped packets, number of connections, etc. by node.
* Hubble Metrics (DNS and Pod-Level Metrics): These metrics include source and destination pod information allowing to pinpoint network-related issues at a granular level. Metrics cover traffic volume, dropped packets, TCP resets, L4/L7 packet flows, etc. DNS metrics include DNS errors and DNS requests missing responses.

### Node-Level Metrics

The following metrics are aggregated per node. All metrics include labels:

* `cluster`
* `instance` (Node name)

Retina provides metrics for both Linux and Windows operating systems.
The table below outlines the different metrics generated.

| Metric Name | Description | Extra Labels | Linux | Windows |
|------------------------------------------------|-------------|--------------|-------|---------|
| **networkobservability_forward_count** | Total forwarded packet count | `direction` | ✅ | ✅ |
| **networkobservability_forward_bytes** | Total forwarded byte count | `direction` | ✅ | ✅ |
| **networkobservability_drop_count** | Total dropped packet count | `direction`, `reason` | ✅ | ✅ |
| **networkobservability_drop_bytes** | Total dropped byte count | `direction`, `reason` | ✅ | ✅ |
| **networkobservability_tcp_state** | TCP currently active socket count by TCP state. | `state` | ✅ | ✅ |
| **networkobservability_tcp_connection_remote** | TCP currently active socket count by remote IP/port. | `address` (IP), `port` | ✅ | ❌ |
| **networkobservability_tcp_connection_stats** | TCP connection statistics. (ex: Delayed ACKs, TCPKeepAlive, TCPSackFailures) | `statistic` | ✅ | ✅ |
| **networkobservability_tcp_flag_counters** | TCP packets count by flag. | `flag` | ❌ | ✅ |
| **networkobservability_ip_connection_stats** | IP connection statistics. | `statistic` | ✅ | ❌ |
| **networkobservability_udp_connection_stats** | UDP connection statistics. | `statistic` | ✅ | ❌ |
| **networkobservability_udp_active_sockets** | UDP currently active socket count | | ✅ | ❌ |
| **networkobservability_interface_stats** | Interface statistics. | InterfaceName, `statistic` | ✅ | ✅ |

### Pod-Level Metrics (Hubble Metrics)

The following metrics are aggregated per pod (node information is preserved). All metrics include labels:

* `cluster`
* `instance` (Node name)
* `source`
* `destination`

For *outgoing traffic*, there will be a `source` label with source pod namespace/name.
For *incoming traffic*, there will be a `destination` label with destination pod namespace/name.

| Metric Name | Description | Extra Labels | Linux | Windows |
|----------------------------------|------------------------------|-----------------------|-------|---------|
| **hubble_dns_queries_total** | Total DNS requests by query | `source` or `destination`, `query`, `qtypes` (query type) | ✅ | ❌ |
| **hubble_dns_responses_total** | Total DNS responses by query/response | `source` or `destination`, `query`, `qtypes` (query type), `rcode` (return code), `ips_returned` (number of IPs) | ✅ | ❌ |
| **hubble_drop_total** | Total dropped packet count | `source` or `destination`, `protocol`, `reason` | ✅ | ❌ |
| **hubble_tcp_flags_total** | Total TCP packets count by flag. | `source` or `destination`, `flag` | ✅ | ❌ |
| **hubble_flows_processed_total** | Total network flows processed (L4/L7 traffic) | `source` or `destination`, `protocol`, `verdict`, `type`, `subtype` | ✅ | ❌ |
4 changes: 2 additions & 2 deletions docs/03-Metrics/modes/modes.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
---
sidebar_position: 1
sidebar_position: 2
---
# Metric Modes
# Legacy Metric Modes

Retina provides **three modes** with their own metrics and scale capabilities.
Each mode is **fully customizable** (only create the metrics/labels you need).
Expand Down
Loading