Add AWS MSK Cluster Monitoring Dashboard (OTLP v1) #258

Open
blessuselessk wants to merge 1 commit into SigNoz:main from blessuselessk:feat/aws-msk-dashboard

Summary

  • Adds a comprehensive AWS MSK (Managed Streaming for Apache Kafka) cluster monitoring dashboard using CloudWatch metrics collected via OpenTelemetry
  • Uses the SigNoz Query Builder (not PromQL) with aws_kafka_* metric names matching the CloudWatch AWS/Kafka namespace convention
  • 24 panels organized into 5 collapsible sections: Broker Metrics, Topic Metrics, Partition Metrics, Consumer Metrics, and AWS CloudWatch Metrics
  • Dashboard variables for cluster_name, broker_id (multi-select), and deployment.environment

Panels

Broker Metrics (8 panels): CPU User/System, Memory Used/Free, Disk Usage (Data Logs), Disk Throughput (Read/Write), Network RX/TX Packets

Topic Metrics (4 panels): Bytes In/Out Per Second, Messages In Per Second, Produce Total Time Mean

Partition Metrics (5 panels): Under-Replicated Partitions, Offline Partitions Count, Active Controller Count, Partition Count, Leader Count

Consumer Metrics (2 panels): Consumer Lag (Sum Offset Lag), Estimated Max Time Lag

AWS Metrics (5 panels): Burst Balance, CPU Credit Usage, Traffic Shaping, Connection Count, Heap Memory After GC

Key Differentiators from Other PRs

  • Uses Query Builder (preferred per CONTRIBUTING.md) instead of PromQL
  • Uses correct aws_kafka_* CloudWatch metric naming (matching ElastiCache/RDS pattern in this repo), not JMX/Prometheus metrics
  • Follows the exact schema of recently-merged dashboards (panelMap sections, row panels, v4 format)
  • All metric names verified against official AWS MSK CloudWatch documentation
  • Includes complete OTel collector configuration in README

Fixes SigNoz/signoz#6036

Test plan

  • Import dashboard JSON into SigNoz instance
  • Verify JSON parses correctly (validated with python3 -m json.tool)
  • Confirm all 29 layout entries match 29 widget IDs
  • Confirm all 24 graph panels are grouped under their respective panelMap sections
  • Verify dashboard variables populate when aws_kafka_* metrics are present
  • Verify panel queries render correctly with CloudWatch MSK metrics
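
The structural checks in the test plan (layout entries matching widget IDs, and every graph panel grouped under a panelMap section) could be scripted roughly as below. This is a minimal sketch, not the validation actually used for this PR. It assumes the dashboard JSON has top-level `widgets`, `layout`, and `panelMap` keys, that widgets carry an `id`, that layout and panelMap entries carry an `i`, and that section rows are marked with `panelTypes == "row"`. Those field names are assumptions about the dashboard schema and should be checked against the actual file. `check_dashboard` and `load_and_check` are hypothetical helper names.

```python
import json
from collections import Counter


def check_dashboard(dashboard: dict) -> list[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    widgets = dashboard.get("widgets", [])
    layout = dashboard.get("layout", [])
    panel_map = dashboard.get("panelMap", {})

    # Assumed schema: widgets use "id", layout entries use "i".
    widget_ids = [w["id"] for w in widgets]
    layout_ids = [item["i"] for item in layout]

    # No ID may appear twice in either list.
    for name, ids in (("widget", widget_ids), ("layout", layout_ids)):
        dupes = sorted(i for i, n in Counter(ids).items() if n > 1)
        if dupes:
            problems.append(f"duplicate {name} ids: {dupes}")

    # Every widget needs a layout entry and vice versa.
    only_widgets = sorted(set(widget_ids) - set(layout_ids))
    only_layout = sorted(set(layout_ids) - set(widget_ids))
    if only_widgets:
        problems.append(f"widgets without layout entry: {only_widgets}")
    if only_layout:
        problems.append(f"layout entries without widget: {only_layout}")

    # Assumed schema: section rows are widgets with panelTypes == "row".
    row_ids = {w["id"] for w in widgets if w.get("panelTypes") == "row"}
    graph_ids = set(widget_ids) - row_ids

    # panelMap keys should be row widgets.
    unknown_rows = sorted(set(panel_map) - row_ids)
    if unknown_rows:
        problems.append(f"panelMap keys that are not row widgets: {unknown_rows}")

    # Each graph panel should sit under exactly one section.
    grouped = Counter(
        entry["i"]
        for section in panel_map.values()
        for entry in section.get("widgets", [])
    )
    ungrouped = sorted(graph_ids - set(grouped))
    repeated = sorted(i for i, n in grouped.items() if n > 1)
    if ungrouped:
        problems.append(f"graph panels not in any section: {ungrouped}")
    if repeated:
        problems.append(f"graph panels in more than one section: {repeated}")

    return problems


def load_and_check(path: str) -> list[str]:
    """Parse the dashboard file (raises on invalid JSON) and run the checks."""
    with open(path, encoding="utf-8") as fh:
        return check_dashboard(json.load(fh))
```

Calling `load_and_check` with the path to the dashboard JSON returns an empty list when the structure is consistent. Under the assumptions above, the counts stated in this PR would correspond to 29 widgets in total: 24 graph panels plus 5 row panels, one per section.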

CloudWatch metrics via OpenTelemetry Query Builder with 5 sections:
- Broker Metrics (CPU, memory, disk, network)
- Topic Metrics (bytes in/out, messages, produce latency)
- Partition Metrics (under-replicated, offline, controller, counts)
- Consumer Metrics (offset lag, estimated time lag)
- AWS Metrics (burst balance, CPU credits, traffic shaping, connections, heap)


Development

Successfully merging this pull request may close these issues.

[Dashboard Request] AWS MSK Cluster
