[Dashboard] Envoy #6020

@grandwizard28

Dashboard Name

Envoy Monitoring Dashboard

Expected Dashboard Sections and Panels

(Panels and sections can be added or removed depending on the metrics available.)

General Overview

This section provides a high-level overview of Envoy's health and performance metrics, allowing for a quick assessment of the system's current state.

Panels

  • Active Connections

    • Description: Displays the number of active connections currently managed by Envoy.
  • Total Requests

    • Description: Shows the total number of requests handled by Envoy since the last restart.
  • Request Rate

    • Description: Illustrates the rate of incoming requests per second, helping to monitor traffic patterns.
  • Uptime

    • Description: Displays the total uptime of the Envoy instance since the last restart.
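
As a rough illustration of how the Request Rate panel relates to the Total Requests counter, the sketch below derives a per-second rate from successive counter samples. It assumes the underlying metric is a monotonically increasing counter (for example something like `envoy_http_downstream_rq_total`; the exact metric name depends on how Envoy stats are collected and is an assumption here). The function name `per_second_rate` is hypothetical, and in practice the dashboard query engine would do this aggregation.

```python
from typing import List, Tuple


def per_second_rate(samples: List[Tuple[float, float]]) -> float:
    """Average per-second increase of a monotonically increasing counter.

    samples: (unix_timestamp_seconds, counter_value) pairs, oldest first.
    A drop in value is treated as a counter reset (e.g. an Envoy restart),
    so the post-reset value is counted as the increase for that step.
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


# Hypothetical samples: the counter resets between t=10 and t=20.
samples = [(0.0, 100.0), (10.0, 150.0), (20.0, 10.0), (30.0, 40.0)]
rate = per_second_rate(samples)
```

Handling the reset matters because Total Requests is defined as "since the last restart", so a raw difference between the first and last sample would go negative across a restart.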

Request Metrics

This section focuses on metrics related to the handling of requests by Envoy, providing insights into traffic volume and request processing.

Panels

  • HTTP Request Count

    • Description: Shows the total number of HTTP requests processed by Envoy, categorized by method (GET, POST, etc.).
  • Request Size

    • Description: Displays the average and total size of incoming and outgoing requests, helping to monitor data throughput.
  • Response Size

    • Description: Illustrates the average and total size of responses sent by Envoy, aiding in bandwidth management.

Response Metrics

This section provides detailed metrics on the responses generated by Envoy, including status codes and response times.

Panels

  • HTTP Response Codes

    • Description: Displays the distribution of HTTP response codes (e.g., 200, 404, 500) returned by Envoy.
  • Response Time

    • Description: Shows the average and percentile response times for requests handled by Envoy, highlighting latency issues.
  • Response Time Histogram

    • Description: Illustrates the distribution of response times across latency buckets to identify performance bottlenecks.

Latency Metrics

This section monitors the latency of requests processed by Envoy, helping to ensure optimal performance and user experience.

Panels

  • Average Latency

    • Description: Displays the average time taken to process requests, measured in milliseconds.
  • Latency Percentiles

    • Description: Shows latency percentiles (50th, 95th, 99th) to identify outliers and performance issues.
  • Slowest Endpoints

    • Description: Lists the endpoints with the highest latency, aiding in targeted performance optimization.
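
For the Latency Percentiles panel, the percentiles would most likely be estimated from histogram buckets rather than from raw request timings. The following is a minimal sketch of that estimation, assuming the latency metric is exposed as cumulative buckets with upper bounds in milliseconds (as a Prometheus-style histogram would be). The function name and the sample bucket values are hypothetical; linear interpolation inside a bucket is an approximation, not necessarily what the query backend does.

```python
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: (upper_bound_ms, cumulative_count) pairs sorted by upper bound.
    Values are assumed to be spread evenly inside each bucket.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    if not buckets or buckets[-1][1] == 0:
        return float("nan")
    rank = q * buckets[-1][1]  # target position among all observations
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, cum_count in buckets:
        if cum_count >= rank:
            in_bucket = cum_count - lower_count
            if in_bucket == 0:
                return upper_bound
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, cum_count
    return buckets[-1][0]


# Hypothetical buckets: 50 requests <= 10 ms, 90 <= 50 ms, 100 <= 100 ms.
buckets = [(10.0, 50.0), (50.0, 90.0), (100.0, 100.0)]
p50, p95, p99 = (histogram_quantile(q, buckets) for q in (0.50, 0.95, 0.99))
```

Because the result depends on bucket boundaries, the precision of the 95th and 99th percentile panels is limited by how finely the upper latency buckets are spaced.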

Error Metrics

This section monitors errors and failures within Envoy operations, aiding in the troubleshooting and resolution of issues.

Panels

  • Total Errors

    • Description: Displays the total number of errors encountered by Envoy, including client and server errors.
  • Error Rate

    • Description: Shows the rate of errors per second, helping to identify spikes or trends in failures.
  • Top Error Types

    • Description: Lists the most common types of errors (e.g., timeout errors, connection failures) to prioritize troubleshooting efforts.
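
To make the client/server split in the Total Errors panel concrete, here is a small sketch that turns request counts grouped by response-code class into error shares. It assumes counts are available per class ("2xx", "4xx", "5xx", ...); how that grouping is labelled in the actual metrics is an assumption, and `error_shares` is a hypothetical helper name.

```python
from typing import Dict


def error_shares(requests_by_class: Dict[str, float]) -> Dict[str, float]:
    """Share of requests that ended in a client (4xx) or server (5xx) error.

    requests_by_class: request counts keyed by response-code class.
    Returns zeros when there were no requests at all.
    """
    total = sum(requests_by_class.values())
    if total == 0:
        return {"client": 0.0, "server": 0.0, "all": 0.0}
    client = requests_by_class.get("4xx", 0.0) / total
    server = requests_by_class.get("5xx", 0.0) / total
    return {"client": client, "server": server, "all": client + server}


# Hypothetical counts for one time window.
shares = error_shares({"2xx": 900.0, "4xx": 60.0, "5xx": 40.0})
```

Keeping the two classes separate helps distinguish caller mistakes (4xx) from failures in Envoy or its upstreams (5xx) when reading the Error Rate panel.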

Resource Usage

This section provides insights into the resource consumption of Envoy, helping ensure it operates efficiently within the infrastructure.

Panels

  • CPU Usage

    • Description: Displays the CPU usage by the Envoy process, indicating the processing load.
  • Memory Usage

    • Description: Shows the memory consumption of Envoy, helping identify potential memory leaks or inefficiencies.
  • Disk I/O

    • Description: Monitors disk input/output operations performed by Envoy, relevant for environments with persistent storage.

Network I/O

This section provides insights into the network performance of Envoy, monitoring data throughput and connection metrics.

Panels

  • Network Throughput

    • Description: Displays incoming and outgoing network traffic to and from Envoy, measured in bytes per second.
  • Connection Rate

    • Description: Shows the rate of new connections being established with Envoy, indicating traffic patterns.
  • Rejected Connections

    • Description: Monitors the number of connection attempts rejected by Envoy, highlighting potential overloads.

Upstream and Downstream Metrics

This section tracks metrics related to upstream and downstream services, providing visibility into service interactions and dependencies.

Panels

  • Upstream Request Count

    • Description: Shows the number of requests forwarded to upstream services by Envoy.
  • Upstream Error Rate

    • Description: Displays the rate of errors returned by upstream services, indicating potential issues with dependencies.
  • Downstream Request Distribution

    • Description: Illustrates the distribution of requests from different downstream clients or services.

Expected Dashboard Variables

  • namespace – Filter metrics based on the Kubernetes namespace where Envoy is deployed.
  • deployment.environment – Environment of the application (configured at the OTel agent level), e.g. prod, staging.
  • service.name – Select specific services within the mesh to filter metrics.
  • cluster – For multi-cluster setups, filter metrics based on the Kubernetes cluster.

References or Screenshots

📋 Notes

Please review the CONTRIBUTING.md for guidelines on dashboard structure, naming conventions, and how to submit a pull request.
