Commit da8a747

feat: add HTTP API Monitoring, ArgoCD, and AWS ElastiCache Redis dashboard documentation (#1826)

* feat: add HTTP API Monitoring dashboard documentation
  - Add comprehensive documentation for HTTP API Monitoring dashboard
  - Include detailed descriptions of all metrics and use cases
  - Follow existing documentation structure and formatting
  - Built on OpenTelemetry HTTP attributes for standardized monitoring
* refactor: remove use cases, best practices sections and image placeholder
  - Remove use cases section as requested
  - Remove best practices section as requested
  - Remove dashboard image placeholder since image is not available
  - Keep core metrics documentation and related dashboards section
* fix: update APM dashboard links to documentation pages
  - Update HTTP API Monitoring href to point to new documentation page
  - Fix Database Calls Monitoring href to point to documentation page for consistency
  - All APM dashboards now link to their respective documentation pages
* feat: add ArgoCD dashboard documentation
  - Add comprehensive documentation for ArgoCD monitoring dashboard
  - Include detailed descriptions of all monitoring sections and metrics
  - Cover application health, sync status, controller stats, cluster stats, and repo server metrics
  - Follow existing documentation structure and formatting
  - Provide filtering variables and related dashboards
* fix: update ArgoCD dashboard link to documentation page
  - Update ArgoCD href from GitHub repository to documentation page
  - Point to /docs/dashboards/dashboard-templates/argocd-dashboard
  - Maintain consistency with other documented dashboard templates
* feat: add AWS ElastiCache Redis dashboard documentation
  - Add comprehensive documentation for AWS ElastiCache Redis monitoring dashboard
  - Include detailed descriptions of all CloudWatch metrics and widgets
  - Cover CPU, memory, network, cache performance, and replication metrics
  - Provide monitoring best practices and alerting recommendations
  - Follow existing documentation structure and formatting
* fix: update AWS ElastiCache Redis link to documentation page
  - Update AWS ElastiCache Redis href from GitHub repository to documentation page
  - Point to /docs/dashboards/dashboard-templates/aws-elasticache-redis
  - Maintain consistency with other documented dashboard templates
* feat: add AWS SQS Prometheus dashboard documentation
  - Add comprehensive documentation for AWS SQS Prometheus monitoring dashboard
  - Include detailed descriptions of queue message state metrics and monitoring
  - Cover visible, delayed, and in-flight message tracking with third-party exporter
  - Provide setup guidance for SQS Prometheus exporter integration
  - Include monitoring best practices and alerting recommendations
  - Follow existing documentation structure and formatting
* feat: add ClickHouse monitoring dashboard documentation
* feat: update ClickHouse dashboard link to documentation page
* feat: simplify ClickHouse dashboard documentation structure
* feat: remove best practices section from AWS SQS Prometheus dashboard
* feat: add Flask monitoring dashboard documentation
* feat: update Flask Monitoring dashboard link to documentation page
1 parent 36d9269 commit da8a747

8 files changed: +704 -5 lines changed

components/Dashboards/APMDashboardsListicle.tsx

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ const APMDashboardsData: IconCardData[] = [
   },
   {
     name: 'HTTP API Monitoring',
-    href: 'https://github.com/SigNoz/dashboards/blob/main/apm/http-api-monitoring.json',
+    href: '/docs/dashboards/dashboard-templates/http-api-monitoring',
     icon: <Globe className="h-7 w-7 text-green-600" />,
     clickName: 'HTTP API Monitoring Dashboard Template',
   },

components/Dashboards/DashboardTemplatesListicle.tsx

Lines changed: 4 additions & 4 deletions
@@ -79,13 +79,13 @@ const DashboardTemplatesData: IconCardData[] = [
   },
   {
     name: 'ArgoCD',
-    href: 'https://github.com/SigNoz/dashboards/tree/main/argocd',
+    href: '/docs/dashboards/dashboard-templates/argocd-dashboard',
     icon: <GitBranch className="h-7 w-7 text-orange-500" />,
     clickName: 'ArgoCD Dashboard Template',
   },
   {
     name: 'AWS ElastiCache Redis',
-    href: 'https://github.com/SigNoz/dashboards/tree/main/aws-elasticache/redis',
+    href: '/docs/dashboards/dashboard-templates/aws-elasticache-redis',
     icon: <Cloud className="h-7 w-7 text-orange-600" />,
     clickName: 'AWS ElastiCache Redis Dashboard Template',
   },
@@ -121,7 +121,7 @@ const DashboardTemplatesData: IconCardData[] = [
   },
   {
     name: 'ClickHouse',
-    href: 'https://github.com/SigNoz/dashboards/tree/main/clickhouse',
+    href: '/docs/dashboards/dashboard-templates/clickhouse-monitoring',
     icon: <SiClickhouse className="h-7 w-7 text-yellow-500" />,
     clickName: 'ClickHouse Dashboard Template',
   },
@@ -157,7 +157,7 @@ const DashboardTemplatesData: IconCardData[] = [
   },
   {
     name: 'Flask Monitoring',
-    href: 'https://github.com/SigNoz/dashboards/tree/main/flask-monitoring',
+    href: '/docs/dashboards/dashboard-templates/flask-monitoring',
     icon: <Globe className="h-7 w-7 text-black" />,
     clickName: 'Flask Monitoring Dashboard Template',
   },
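
The href changes in these hunks all move template cards from GitHub links to in-site documentation routes. A minimal sketch of a check that could guard that convention for documented templates is shown here. It assumes a simplified card shape (`TemplateCard` is a hypothetical stand-in for `IconCardData`, with the JSX `icon` field omitted), and `findExternalTemplateLinks` is an illustrative helper, not part of the repository.

```typescript
// Hypothetical, simplified stand-in for IconCardData (the JSX `icon` field is omitted).
interface TemplateCard {
  name: string
  href: string
  clickName: string
}

// Docs route prefix taken from the hrefs introduced in this commit.
const DOCS_PREFIX = '/docs/dashboards/dashboard-templates/'

// Returns the names of cards whose href does not start with the docs prefix.
function findExternalTemplateLinks(cards: TemplateCard[]): string[] {
  const offenders: string[] = []
  for (const card of cards) {
    if (card.href.indexOf(DOCS_PREFIX) !== 0) {
      offenders.push(card.name)
    }
  }
  return offenders
}

// Sample data: one card already migrated, one still pointing at GitHub.
const sampleCards: TemplateCard[] = [
  {
    name: 'ArgoCD',
    href: '/docs/dashboards/dashboard-templates/argocd-dashboard',
    clickName: 'ArgoCD Dashboard Template',
  },
  {
    name: 'Flask Monitoring',
    href: 'https://github.com/SigNoz/dashboards/tree/main/flask-monitoring',
    clickName: 'Flask Monitoring Dashboard Template',
  },
]
```

Templates that have no documentation page yet would still be reported by this helper, so it is only useful for the subset of cards expected to link to docs.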
Lines changed: 121 additions & 0 deletions
@@ -0,0 +1,121 @@
+---
+date: 2025-01-03
+id: argocd-dashboard
+title: ArgoCD Dashboard
+description: Monitor ArgoCD applications and operations with comprehensive metrics including application health, sync status, controller performance, cluster stats, and repository server activity.
+---
+
+This dashboard provides comprehensive monitoring of ArgoCD applications and operations, offering detailed visibility into application health, sync status, controller performance, cluster statistics, and repository server activity. It enables teams to effectively monitor their GitOps workflows and troubleshoot deployment issues.
+
+<div className="flex justify-center">
+  <DashboardActions
+    dashboardJsonUrl="https://raw.githubusercontent.com/SigNoz/dashboards/refs/heads/main/argocd/argocd.json"
+    dashboardName="ArgoCD Dashboard"
+  />
+</div>
+
+## What This Dashboard Monitors
+
+This dashboard tracks essential ArgoCD metrics to help you:
+
+- **Application Health Monitoring**: Track the health status of ArgoCD applications across different states
+- **Sync Status Tracking**: Monitor application synchronization status and identify out-of-sync applications
+- **Controller Performance**: Analyze ArgoCD controller performance and reconciliation activities
+- **Cluster Statistics**: Monitor Kubernetes cluster resources and API activity
+- **Repository Server Metrics**: Track Git repository interactions and fetch operations
+- **Workqueue Analysis**: Monitor ArgoCD workqueue depth and processing efficiency
+
+## Metrics Included
+
+### Overview Section
+
+#### Number Of Applications
+- **Applications Graph**: Time-series visualization showing total number of applications by health status:
+  - Grouped by health status for trend analysis
+  - Filtered by namespace and repository for focused monitoring
+
+#### Repository Servers
+- **Repository Count**: Shows the total number of repository servers configured in ArgoCD
+
+#### Applications By Repository
+- **Repository Distribution**: Displays application count grouped by repository:
+  - Helps identify repository usage patterns
+  - Useful for capacity planning and load distribution
+
+### Application Health Section
+
+#### Health Status Metrics
+- **Healthy**: Number of applications in healthy state (green indicator)
+- **Progressing**: Number of applications currently in progress (blue indicator)
+- **Suspended**: Number of suspended applications (red background indicator)
+- **Degraded**: Number of degraded applications (red text indicator)
+
+### Application Sync Status Section
+
+#### Synchronization Tracking
+- **Synced**: Graph showing applications that are in sync with their Git repositories
+  - Grouped by repository for detailed analysis
+  - Essential for ensuring deployments are up-to-date
+
+- **OutOfSync**: Graph displaying applications that are out of sync
+  - Critical for identifying deployment drift
+  - Grouped by repository for targeted remediation
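
The grouping described for the OutOfSync panel can be illustrated with a small sketch. It assumes each application is available as a plain record with a repository and a sync status (similar in spirit to the labels ArgoCD attaches to its application info metric); the `ArgoAppRecord` shape and `countOutOfSyncByRepo` helper are hypothetical names introduced only for this example.

```typescript
// Hypothetical record shape; field names are assumptions for illustration.
interface ArgoAppRecord {
  name: string
  repo: string
  syncStatus: string // e.g. 'Synced' or 'OutOfSync'
}

// Counts out-of-sync applications per repository.
function countOutOfSyncByRepo(apps: ArgoAppRecord[]): Record<string, number> {
  const counts: Record<string, number> = {}
  for (const app of apps) {
    if (app.syncStatus === 'OutOfSync') {
      counts[app.repo] = (counts[app.repo] || 0) + 1
    }
  }
  return counts
}

// Sample data (made up for the example).
const sampleApps: ArgoAppRecord[] = [
  { name: 'checkout', repo: 'git.example.com/shop', syncStatus: 'OutOfSync' },
  { name: 'cart', repo: 'git.example.com/shop', syncStatus: 'Synced' },
  { name: 'billing', repo: 'git.example.com/shop', syncStatus: 'OutOfSync' },
  { name: 'ingress', repo: 'git.example.com/infra', syncStatus: 'OutOfSync' },
]
```

Repositories with no drifting applications simply do not appear in the result, which mirrors how a grouped panel shows no series for them.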
+
+### Controller Stats Section
+
+#### Sync and Reconciliation Activity
+- **Sync Activity**: Shows different phases of sync operations:
+  - **Succeeded**: Successful sync operations
+  - **Failed**: Failed sync attempts
+  - **Error**: Sync operations that encountered errors
+
+- **Reconciliation Activity**: Displays application reconciliation count over time
+  - Tracks controller workload and processing efficiency
+
+#### Performance Metrics
+- **Count of Application Reconciliation by Duration Bounds**: Histogram showing reconciliation performance distribution
+- **Reconciliation Performance**: Detailed performance analysis by duration buckets
+- **K8s API Activity**: Monitors Kubernetes API requests during application reconciliation:
+  - Grouped by verb (GET, POST, PUT, DELETE) and resource kind
+  - Essential for understanding API load patterns
+
+- **Workqueue Depth**: Current depth of ArgoCD workqueue
+  - Indicates controller processing capacity and potential bottlenecks
85+
### Cluster Stats Section
86+
87+
#### Cluster Resource Monitoring
88+
- **Age of Cluster Cache**: Tracks cluster cache freshness in seconds
89+
- Critical for ensuring up-to-date cluster state information
90+
91+
- **Count of Cluster Resource Objects**: Number of Kubernetes resource objects in the cache
92+
- Grouped by server for multi-cluster environments
93+
94+
- **Count of Cluster Events**: Tracks processed Kubernetes events
95+
- Important for monitoring cluster activity levels
96+
97+
- **Count of API Resources**: Number of monitored Kubernetes API resources
98+
- Helps understand the scope of monitoring coverage
99+
100+
### Repo Server Stats Section
101+
102+
#### Git Repository Activity
103+
- **Count of Git Ls-Remote Requests**: Tracks Git ls-remote operations by repository
104+
- Important for monitoring repository connectivity and health
105+
106+
- **Count of Git Fetch Requests**: Shows Git fetch operations by repository
107+
- Critical for understanding repository synchronization activity
108+
109+
## Dashboard Variables
110+
111+
This dashboard includes pre-configured variables for filtering:
112+
113+
- **health.status**: Filter by application health status (Healthy, Progressing, Suspended, Degraded, etc.)
114+
- **namespace**: Filter by Kubernetes namespace for environment-specific monitoring
115+
- **repo**: Filter by Git repository name for repository-specific analysis
116+
117+
## Related Dashboards
118+
119+
- [Key Operations](/docs/dashboards/dashboard-templates/key-operations)
120+
- [Kubernetes Pod Metrics - Detailed](/docs/dashboards/dashboard-templates/kubernetes-pod-metrics-detailed)
121+
- [Kubernetes Node Metrics - Detailed](/docs/dashboards/dashboard-templates/kubernetes-node-metrics-detailed)
Lines changed: 141 additions & 0 deletions
@@ -0,0 +1,141 @@
+---
+date: 2025-01-03
+id: aws-elasticache-redis
+title: AWS ElastiCache Redis Dashboard
+description: Monitor AWS ElastiCache Redis instances with comprehensive CloudWatch metrics including CPU utilization, memory usage, network traffic, cache performance, and replication metrics.
+---
+
+This dashboard provides comprehensive monitoring of AWS ElastiCache Redis instances using CloudWatch metrics. It offers detailed visibility into both host-level metrics (CPU, memory, network) and Redis engine-specific metrics (cache hits, evictions, replication lag) to help optimize cache performance and troubleshoot issues.
+
+<div className="flex justify-center">
+  <DashboardActions
+    dashboardJsonUrl="https://raw.githubusercontent.com/SigNoz/dashboards/refs/heads/main/aws-elasticache/redis/overview.json"
+    dashboardName="AWS ElastiCache Redis Dashboard"
+  />
+</div>
+
+## What This Dashboard Monitors
+
+This dashboard tracks essential AWS ElastiCache Redis metrics to help you:
+
+- **Performance Monitoring**: Track CPU utilization, memory usage, and cache hit rates
+- **Capacity Planning**: Monitor memory usage percentages and capacity utilization
+- **Network Analysis**: Analyze network traffic patterns and bandwidth utilization
+- **Cache Efficiency**: Track cache hit rates, evictions, and memory fragmentation
+- **Replication Health**: Monitor replication lag and connection metrics
+- **Resource Optimization**: Identify performance bottlenecks and optimize configurations
+
+## Metrics Included
+
+### CPU and Engine Performance
+
+#### CPU Utilization
+- **Description**: Host-level CPU utilization percentage for the ElastiCache node
+- **Use Case**: Monitor overall system performance and identify CPU bottlenecks
+- **Grouping**: By cache cluster ID for multi-cluster monitoring
+
+#### Engine CPU Utilization
+- **Description**: Redis engine-specific CPU utilization percentage
+- **Use Case**: Track Redis process CPU consumption separate from system overhead
+- **Grouping**: By cache cluster ID to compare engine performance across clusters
+
+### Memory Management
+
+#### Database Memory Usage Percentage
+- **Description**: Percentage of allocated memory currently used by Redis
+- **Use Case**: Monitor memory consumption and plan for capacity needs
+- **Critical Thresholds**: High values (>80%) may indicate need for scaling
+
+#### Database Capacity Usage Percentage
+- **Description**: Percentage of total available memory capacity in use
+- **Use Case**: Track capacity utilization for scaling decisions
+- **Planning**: Essential for understanding growth patterns
+
+#### Memory Fragmentation Ratio
+- **Description**: Ratio of memory allocated by Redis vs. memory used by the OS
+- **Use Case**: Identify memory fragmentation issues that can impact performance
+- **Optimization**: Values significantly above 1.0 may indicate fragmentation
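
For reference, Redis itself reports the fragmentation ratio as resident set size divided by the memory its allocator has handed out (`used_memory_rss / used_memory` in `INFO memory`), which is why values well above 1.0 point to fragmentation. A minimal sketch of that arithmetic follows; the `1.5` cutoff is an assumed illustrative threshold, not a value taken from the dashboard, and both function names are hypothetical.

```typescript
// Ratio of OS-resident memory to memory Redis has allocated for data.
// Returns null when the denominator is not positive, since the ratio is undefined.
function fragmentationRatio(usedMemoryRssBytes: number, usedMemoryBytes: number): number | null {
  if (usedMemoryBytes <= 0) {
    return null
  }
  return usedMemoryRssBytes / usedMemoryBytes
}

// Flags a ratio as fragmented above an assumed threshold (default 1.5 is illustrative).
function looksFragmented(ratio: number | null, threshold: number = 1.5): boolean {
  return ratio !== null && ratio > threshold
}

// Example: 1.6 GB resident for 1.0 GB of allocated data.
const exampleRatio = fragmentationRatio(1_600_000_000, 1_000_000_000)
```

A ratio below 1.0 is also worth noticing, since it usually means part of the dataset has been swapped out.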
+
+#### Freeable Memory
+- **Description**: Amount of memory available for allocation (in bytes)
+- **Use Case**: Monitor available memory before hitting limits
+- **Unit**: Displayed in decimal bytes for precise capacity tracking
+
+### Cache Performance
+
+#### Cache Hit Rate
+- **Description**: Number of successful cache lookups (cache hits)
+- **Use Case**: Measure cache effectiveness and application performance
+- **Optimization**: Higher hit rates indicate better cache utilization
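
The description above speaks of hit counts, while a rate is normally derived from hits and misses together. A minimal sketch of that derivation, assuming hit and miss counts for the same interval are available (for example from the CloudWatch `CacheHits` and `CacheMisses` metrics); `cacheHitRatePercent` is a hypothetical helper name.

```typescript
// Hit rate as a percentage of all lookups in the interval.
// Returns null when there were no lookups, because the rate is undefined.
function cacheHitRatePercent(hits: number, misses: number): number | null {
  const lookups = hits + misses
  if (lookups <= 0) {
    return null
  }
  return (hits / lookups) * 100
}

// Example interval: 900 hits and 100 misses.
const exampleHitRate = cacheHitRatePercent(900, 100)
```

Returning `null` for an idle interval avoids reporting a misleading 0% when the cache simply received no traffic.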
+
+#### Evictions
+- **Description**: Number of items evicted from cache due to memory pressure
+- **Use Case**: Monitor memory pressure and optimize cache policies
+- **Troubleshooting**: High eviction rates may indicate insufficient memory
+
+### Network Activity
+
+#### Network Bytes In
+- **Description**: Number of bytes received by the ElastiCache node
+- **Use Case**: Monitor inbound network traffic and bandwidth utilization
+- **Unit**: Displayed in decimal bytes for traffic analysis
+
+#### Network Bytes Out
+- **Description**: Number of bytes sent from the ElastiCache node
+- **Use Case**: Track outbound network traffic and response data volume
+- **Capacity Planning**: Essential for understanding network requirements
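
"Decimal bytes" means units scale by 1000 rather than 1024. A minimal sketch of such formatting is shown here; `formatDecimalBytes` is a hypothetical helper written for illustration and is not how the dashboard itself renders values.

```typescript
// Formats a byte count using decimal (SI) units: 1 kB = 1000 B.
function formatDecimalBytes(bytes: number): string {
  const units = ['B', 'kB', 'MB', 'GB', 'TB']
  let value = bytes
  let unit = 'B'
  for (const candidate of units) {
    unit = candidate
    // Stop once the value fits the current unit, or the largest unit is reached.
    if (Math.abs(value) < 1000 || candidate === 'TB') {
      break
    }
    value = value / 1000
  }
  return value.toFixed(1) + ' ' + unit
}
```

With binary units the same 2,500,000 bytes would read as roughly 2.4 MiB, so it helps to know which convention a panel uses before comparing it with other tools.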
+
+### Connection and System Metrics
+
+#### Current Connections
+- **Description**: Number of active client connections to the Redis instance
+- **Use Case**: Monitor connection usage and identify connection leaks
+- **Capacity Planning**: Track connection patterns for scaling decisions
+
+#### Swap Usage
+- **Description**: Amount of swap space used by the ElastiCache node
+- **Use Case**: Identify memory pressure that forces swapping
+- **Performance Impact**: High swap usage can significantly degrade performance
+
+#### Replication Lag
+- **Description**: Maximum lag time between master and replica nodes (in seconds)
+- **Use Case**: Monitor replication health in Redis clusters
+- **Reliability**: Essential for ensuring data consistency across replicas
+
+## Dashboard Variables
+
+This dashboard includes filtering capabilities:
+
+- **cache_cluster_id**: Filter metrics by specific ElastiCache cluster ID
+  - **Multi-select**: Monitor multiple clusters simultaneously
+  - **All option**: View aggregate metrics across all clusters
+  - **Dynamic**: Automatically populates from available cluster IDs
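
The three behaviours listed for `cache_cluster_id` (multi-select, an "All" option, and dynamic population) can be sketched as follows. This is only an illustration of the filtering logic under assumed names: `ClusterSample`, the `ALL_CLUSTERS` sentinel, and both helpers are hypothetical and do not reflect how the dashboard variable is implemented.

```typescript
// Hypothetical sample shape: one metric value tagged with its cluster ID.
interface ClusterSample {
  cacheClusterId: string
  value: number
}

// Hypothetical sentinel standing in for the variable's "All" option.
const ALL_CLUSTERS = '__all__'

// Dynamic population: the distinct cluster IDs present in the data, sorted.
function availableClusterIds(samples: ClusterSample[]): string[] {
  const ids: string[] = []
  for (const sample of samples) {
    if (ids.indexOf(sample.cacheClusterId) === -1) {
      ids.push(sample.cacheClusterId)
    }
  }
  return ids.sort()
}

// Multi-select filtering: keep samples for the selected clusters, or all of them.
function filterByClusters(samples: ClusterSample[], selected: string[]): ClusterSample[] {
  if (selected.indexOf(ALL_CLUSTERS) !== -1) {
    return samples.slice()
  }
  return samples.filter((sample) => selected.indexOf(sample.cacheClusterId) !== -1)
}

// Sample data (made up for the example).
const clusterSamples: ClusterSample[] = [
  { cacheClusterId: 'redis-b', value: 40 },
  { cacheClusterId: 'redis-a', value: 55 },
  { cacheClusterId: 'redis-c', value: 10 },
  { cacheClusterId: 'redis-a', value: 60 },
]
```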
+
+## Monitoring Best Practices
+
+### Performance Optimization
+- Monitor **Cache Hit Rate** to ensure efficient cache utilization
+- Track **CPU Utilization** metrics to identify processing bottlenecks
+- Watch **Memory Fragmentation Ratio** for memory efficiency
+
+### Capacity Planning
+- Use **Database Memory Usage Percentage** for scaling decisions
+- Monitor **Freeable Memory** to prevent out-of-memory conditions
+- Track **Network Bytes** for bandwidth planning
+
+### Troubleshooting
+- High **Evictions** may indicate insufficient memory allocation
+- Elevated **Swap Usage** suggests memory pressure
+- Increased **Replication Lag** indicates replication issues
+
+### Alerting Recommendations
+- **CPU Utilization** > 80% - Consider scaling or optimization
+- **Memory Usage** > 85% - Plan for memory scaling
+- **Cache Hit Rate** < 90% - Review cache strategy
+- **Replication Lag** > 5 seconds - Investigate replication health
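
The four alerting recommendations above can be expressed as a simple threshold check. The sketch below uses exactly the thresholds listed in the doc; the `RedisSnapshot` shape and `evaluateAlerts` helper are hypothetical names for illustration, and real alerts would normally be configured in the monitoring platform rather than in application code.

```typescript
// Hypothetical point-in-time reading of the four metrics used in the recommendations.
interface RedisSnapshot {
  cpuUtilizationPct: number
  memoryUsagePct: number
  cacheHitRatePct: number
  replicationLagSec: number
}

// Returns the names of the recommendations whose thresholds are breached.
function evaluateAlerts(snapshot: RedisSnapshot): string[] {
  const breached: string[] = []
  if (snapshot.cpuUtilizationPct > 80) {
    breached.push('CPU Utilization')
  }
  if (snapshot.memoryUsagePct > 85) {
    breached.push('Memory Usage')
  }
  if (snapshot.cacheHitRatePct < 90) {
    breached.push('Cache Hit Rate')
  }
  if (snapshot.replicationLagSec > 5) {
    breached.push('Replication Lag')
  }
  return breached
}

// Example reading: CPU and replication lag are over their thresholds.
const exampleSnapshot: RedisSnapshot = {
  cpuUtilizationPct: 85,
  memoryUsagePct: 70,
  cacheHitRatePct: 95,
  replicationLagSec: 6,
}
```

The comparisons are strict, so a reading that sits exactly on a threshold (for example CPU at 80%) does not trigger.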
+
+## Related Dashboards
+
+- [AWS RDS](/docs/dashboards/dashboard-templates/aws-rds)
+- [Redis](/docs/dashboards/dashboard-templates/redis)
+- [Key Operations](/docs/dashboards/dashboard-templates/key-operations)