4 changes: 2 additions & 2 deletions components/Dashboards/APMDashboardsListicle.tsx
@@ -20,13 +20,13 @@ const APMDashboardsData: IconCardData[] = [
},
{
name: 'Database Calls Monitoring',
href: 'https://github.com/SigNoz/dashboards/blob/main/apm/db-calls-monitoring.json',
href: '/docs/dashboards/dashboard-templates/db-calls-monitoring',
icon: <Database className="h-7 w-7 text-purple-600" />,
clickName: 'Database Calls Monitoring Dashboard Template',
},
{
name: 'HTTP API Monitoring',
href: 'https://github.com/SigNoz/dashboards/blob/main/apm/http-api-monitoring.json',
href: '/docs/dashboards/dashboard-templates/http-api-monitoring',
icon: <Globe className="h-7 w-7 text-green-600" />,
clickName: 'HTTP API Monitoring Dashboard Template',
},
8 changes: 4 additions & 4 deletions components/Dashboards/DashboardTemplatesListicle.tsx
@@ -62,13 +62,13 @@ const DashboardTemplatesData: IconCardData[] = [
},
{
name: 'ArgoCD',
href: 'https://github.com/SigNoz/dashboards/tree/main/argocd',
href: '/docs/dashboards/dashboard-templates/argocd-dashboard',
icon: <GitBranch className="h-7 w-7 text-orange-500" />,
clickName: 'ArgoCD Dashboard Template',
},
{
name: 'AWS ElastiCache Redis',
href: 'https://github.com/SigNoz/dashboards/tree/main/aws-elasticache/redis',
href: '/docs/dashboards/dashboard-templates/aws-elasticache-redis',
icon: <Cloud className="h-7 w-7 text-orange-600" />,
clickName: 'AWS ElastiCache Redis Dashboard Template',
},
@@ -92,7 +92,7 @@ const DashboardTemplatesData: IconCardData[] = [
},
{
name: 'ClickHouse',
href: 'https://github.com/SigNoz/dashboards/tree/main/clickhouse',
href: '/docs/dashboards/dashboard-templates/clickhouse-monitoring',
icon: <SiClickhouse className="h-7 w-7 text-yellow-500" />,
clickName: 'ClickHouse Dashboard Template',
},
@@ -116,7 +116,7 @@ const DashboardTemplatesData: IconCardData[] = [
},
{
name: 'Flask Monitoring',
href: 'https://github.com/SigNoz/dashboards/tree/main/flask-monitoring',
href: '/docs/dashboards/dashboard-templates/flask-monitoring',
icon: <Globe className="h-7 w-7 text-black" />,
clickName: 'Flask Monitoring Dashboard Template',
},
121 changes: 121 additions & 0 deletions data/docs/dashboards/dashboard-templates/argocd-dashboard.mdx
@@ -0,0 +1,121 @@
---
date: 2025-01-03
id: argocd-dashboard
title: ArgoCD Dashboard
description: Monitor ArgoCD applications and operations with comprehensive metrics including application health, sync status, controller performance, cluster stats, and repository server activity.
---

This dashboard provides comprehensive monitoring of ArgoCD applications and operations, with detailed visibility into application health, sync status, controller performance, cluster statistics, and repository server activity. It helps teams monitor their GitOps workflows and troubleshoot deployment issues.

<div className="flex justify-center">
<DashboardActions
dashboardJsonUrl="https://raw.githubusercontent.com/SigNoz/dashboards/refs/heads/main/argocd/argocd.json"
dashboardName="ArgoCD Dashboard"
/>
</div>

## What This Dashboard Monitors

This dashboard tracks essential ArgoCD metrics to help you:

- **Application Health Monitoring**: Track the health status of ArgoCD applications across different states
- **Sync Status Tracking**: Monitor application synchronization status and identify out-of-sync applications
- **Controller Performance**: Analyze ArgoCD controller performance and reconciliation activities
- **Cluster Statistics**: Monitor Kubernetes cluster resources and API activity
- **Repository Server Metrics**: Track Git repository interactions and fetch operations
- **Workqueue Analysis**: Monitor ArgoCD workqueue depth and processing efficiency

## Metrics Included

### Overview Section

#### Number Of Applications
- **Applications Graph**: Time-series visualization showing the total number of applications by health status:
  - Grouped by health status for trend analysis
  - Filtered by namespace and repository for focused monitoring

#### Repository Servers
- **Repository Count**: Shows the total number of repository servers configured in ArgoCD

#### Applications By Repository
- **Repository Distribution**: Displays application count grouped by repository:
  - Helps identify repository usage patterns
  - Useful for capacity planning and load distribution

### Application Health Section

#### Health Status Metrics
- **Healthy**: Number of applications in healthy state (green indicator)
- **Progressing**: Number of applications currently in progress (blue indicator)
- **Suspended**: Number of suspended applications (red background indicator)
- **Degraded**: Number of degraded applications (red text indicator)

### Application Sync Status Section

#### Synchronization Tracking
- **Synced**: Graph showing applications that are in sync with their Git repositories
  - Grouped by repository for detailed analysis
  - Essential for ensuring deployments are up-to-date

- **OutOfSync**: Graph displaying applications that are out of sync
  - Critical for identifying deployment drift
  - Grouped by repository for targeted remediation
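The per-repository grouping behind the Synced and OutOfSync panels can be sketched as a simple count. This is a minimal sketch, assuming each application is represented by a record whose fields mirror the `repo` and `sync_status` labels of ArgoCD's `argocd_app_info` metric; the `AppInfo` type, `countBySyncStatus` helper, and sample data are hypothetical and not part of the dashboard JSON.

```typescript
// Hypothetical shape of one application series; the fields mirror the
// repo and sync_status labels of argocd_app_info but are an assumption here.
interface AppInfo {
  name: string;
  repo: string;
  syncStatus: "Synced" | "OutOfSync" | "Unknown";
}

// Count applications per repository that currently have the given sync status.
function countBySyncStatus(
  apps: AppInfo[],
  status: AppInfo["syncStatus"],
): Map<string, number> {
  const counts = new Map<string, number>();
  for (const app of apps) {
    if (app.syncStatus !== status) continue;
    counts.set(app.repo, (counts.get(app.repo) ?? 0) + 1);
  }
  return counts;
}

// Illustrative sample data only.
const apps: AppInfo[] = [
  { name: "api", repo: "platform", syncStatus: "Synced" },
  { name: "web", repo: "platform", syncStatus: "OutOfSync" },
  { name: "jobs", repo: "batch", syncStatus: "OutOfSync" },
  { name: "cron", repo: "batch", syncStatus: "OutOfSync" },
];

const drift = countBySyncStatus(apps, "OutOfSync");
console.log(Object.fromEntries(drift)); // { platform: 1, batch: 2 }
```

Repositories with the highest OutOfSync counts are the natural first targets for remediation.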

### Controller Stats Section

#### Sync and Reconciliation Activity
- **Sync Activity**: Shows sync operations by phase:
  - **Succeeded**: Successful sync operations
  - **Failed**: Failed sync attempts
  - **Error**: Sync operations that encountered errors

- **Reconciliation Activity**: Displays application reconciliation count over time
  - Tracks controller workload and processing efficiency

#### Performance Metrics
- **Count of Application Reconciliation by Duration Bounds**: Histogram showing reconciliation performance distribution
- **Reconciliation Performance**: Detailed performance analysis by duration buckets
- **K8s API Activity**: Monitors Kubernetes API requests made during application reconciliation:
  - Grouped by verb (GET, POST, PUT, DELETE) and resource kind
  - Essential for understanding API load patterns

- **Workqueue Depth**: Current depth of the ArgoCD workqueue
  - Indicates controller processing capacity and potential bottlenecks
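ArgoCD exposes reconciliation duration as a Prometheus histogram (`argocd_app_reconcile`), and Prometheus histogram buckets are cumulative: each `le` bucket counts every observation at or below its bound. A minimal sketch of turning cumulative buckets into per-range "duration bounds" counts follows, assuming the bucket boundaries and counts shown are illustrative and that the panel reads the standard bucket series; `toRangeCounts` is a hypothetical helper.

```typescript
// One cumulative histogram bucket: `count` observations took <= `le` seconds.
interface CumulativeBucket {
  le: number; // upper bound in seconds; use Infinity for the +Inf bucket
  count: number;
}

interface RangeCount {
  range: string;
  count: number;
}

// Convert cumulative buckets into per-range counts by subtracting the
// previous bucket's cumulative count from each bucket.
function toRangeCounts(buckets: CumulativeBucket[]): RangeCount[] {
  const sorted = [...buckets].sort((a, b) => a.le - b.le);
  let prevLe = 0;
  let prevCount = 0;
  return sorted.map((bucket) => {
    const upper = bucket.le === Infinity ? "+Inf" : `${bucket.le}s`;
    const result = { range: `(${prevLe}s, ${upper}]`, count: bucket.count - prevCount };
    prevLe = bucket.le;
    prevCount = bucket.count;
    return result;
  });
}

// Illustrative bucket values only.
const buckets: CumulativeBucket[] = [
  { le: 0.25, count: 40 },
  { le: 1, count: 70 },
  { le: 5, count: 78 },
  { le: Infinity, count: 80 },
];

for (const { range, count } of toRangeCounts(buckets)) {
  console.log(range, count);
}
```

A growing share of reconciliations in the upper ranges is usually a better early signal of controller strain than the total count alone.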

### Cluster Stats Section

#### Cluster Resource Monitoring
- **Age of Cluster Cache**: Tracks cluster cache freshness in seconds
  - Critical for ensuring up-to-date cluster state information

- **Count of Cluster Resource Objects**: Number of Kubernetes resource objects in the cache
  - Grouped by server for multi-cluster environments

- **Count of Cluster Events**: Tracks processed Kubernetes events
  - Important for monitoring cluster activity levels

- **Count of API Resources**: Number of monitored Kubernetes API resources
  - Helps understand the scope of monitoring coverage

### Repo Server Stats Section

#### Git Repository Activity
- **Count of Git Ls-Remote Requests**: Tracks Git ls-remote operations by repository
  - Important for monitoring repository connectivity and health

- **Count of Git Fetch Requests**: Shows Git fetch operations by repository
  - Critical for understanding repository synchronization activity
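The Git request panels are backed by monotonically increasing counters; ArgoCD's repo server exposes `argocd_git_request_total` with a `request_type` label that separates ls-remote from fetch requests (verify the label values against your ArgoCD version). Whether these panels plot raw counts or rates is an assumption here. The sketch below shows how a per-second rate is derived from two counter samples, treating a drop in value as a counter reset, as Prometheus does; `CounterSample` and `counterRate` are hypothetical names.

```typescript
// A single sample of a monotonically increasing counter.
interface CounterSample {
  timestampSec: number;
  value: number;
}

// Per-second rate between two samples. If the counter decreased, assume the
// process restarted and the counter began again from zero.
function counterRate(prev: CounterSample, curr: CounterSample): number {
  const elapsed = curr.timestampSec - prev.timestampSec;
  if (elapsed <= 0) throw new Error("samples must be in increasing time order");
  const delta = curr.value >= prev.value ? curr.value - prev.value : curr.value;
  return delta / elapsed;
}

// 60 requests over 60 seconds.
console.log(counterRate({ timestampSec: 0, value: 120 }, { timestampSec: 60, value: 180 })); // 1
// Counter reset: only the 30 requests since the restart are counted.
console.log(counterRate({ timestampSec: 0, value: 500 }, { timestampSec: 60, value: 30 })); // 0.5
```

A sudden drop to zero in the fetch rate for a repository that normally syncs often can point to credential or connectivity problems.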

## Dashboard Variables

This dashboard includes pre-configured variables for filtering:

- **health.status**: Filter by application health status (Healthy, Progressing, Suspended, Degraded, etc.)
- **namespace**: Filter by Kubernetes namespace for environment-specific monitoring
- **repo**: Filter by Git repository name for repository-specific analysis

## Related Dashboards

- [Key Operations](/docs/dashboards/dashboard-templates/key-operations)
- [Kubernetes Pod Metrics - Detailed](/docs/dashboards/dashboard-templates/kubernetes-pod-metrics-detailed)
- [Kubernetes Node Metrics - Detailed](/docs/dashboards/dashboard-templates/kubernetes-node-metrics-detailed)
141 changes: 141 additions & 0 deletions data/docs/dashboards/dashboard-templates/aws-elasticache-redis.mdx
@@ -0,0 +1,141 @@
---
date: 2025-01-03
id: aws-elasticache-redis
title: AWS ElastiCache Redis Dashboard
description: Monitor AWS ElastiCache Redis instances with comprehensive CloudWatch metrics including CPU utilization, memory usage, network traffic, cache performance, and replication metrics.
---

This dashboard provides comprehensive monitoring of AWS ElastiCache Redis instances using CloudWatch metrics. It offers detailed visibility into both host-level metrics (CPU, memory, network) and Redis engine-specific metrics (cache hits, evictions, replication lag) to help optimize cache performance and troubleshoot issues.

<div className="flex justify-center">
<DashboardActions
dashboardJsonUrl="https://raw.githubusercontent.com/SigNoz/dashboards/refs/heads/main/aws-elasticache/redis/overview.json"
dashboardName="AWS ElastiCache Redis Dashboard"
/>
</div>

## What This Dashboard Monitors

This dashboard tracks essential AWS ElastiCache Redis metrics to help you:

- **Performance Monitoring**: Track CPU utilization, memory usage, and cache hit rates
- **Capacity Planning**: Monitor memory usage percentages and capacity utilization
- **Network Analysis**: Analyze network traffic patterns and bandwidth utilization
- **Cache Efficiency**: Track cache hit rates, evictions, and memory fragmentation
- **Replication Health**: Monitor replication lag and connection metrics
- **Resource Optimization**: Identify performance bottlenecks and optimize configurations

## Metrics Included

### CPU and Engine Performance

#### CPU Utilization
- **Description**: Host-level CPU utilization percentage for the ElastiCache node
- **Use Case**: Monitor overall system performance and identify CPU bottlenecks
- **Grouping**: By cache cluster ID for multi-cluster monitoring

#### Engine CPU Utilization
- **Description**: Redis engine-specific CPU utilization percentage
- **Use Case**: Track Redis process CPU consumption separate from system overhead
- **Grouping**: By cache cluster ID to compare engine performance across clusters

### Memory Management

#### Database Memory Usage Percentage
- **Description**: Percentage of allocated memory currently used by Redis
- **Use Case**: Monitor memory consumption and plan for capacity needs
- **Critical Thresholds**: High values (>80%) may indicate need for scaling

#### Database Capacity Usage Percentage
- **Description**: Percentage of total available memory capacity in use
- **Use Case**: Track capacity utilization for scaling decisions
- **Planning**: Essential for understanding growth patterns

#### Memory Fragmentation Ratio
- **Description**: Ratio of the memory the operating system has allocated to the Redis process (RSS) to the memory Redis itself reports using
- **Use Case**: Identify memory fragmentation issues that can impact performance
- **Optimization**: Values significantly above 1.0 may indicate fragmentation
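Redis reports `mem_fragmentation_ratio` as `used_memory_rss` divided by `used_memory`, so the ratio can be reproduced from those two `INFO memory` fields. A minimal sketch follows; the 1.5 cutoff for "high" is an illustrative assumption rather than an AWS or Redis limit, and `fragmentationRatio` and `describeRatio` are hypothetical helpers.

```typescript
// Redis defines mem_fragmentation_ratio as used_memory_rss / used_memory.
function fragmentationRatio(usedMemoryRssBytes: number, usedMemoryBytes: number): number {
  if (usedMemoryBytes <= 0) throw new Error("usedMemoryBytes must be positive");
  return usedMemoryRssBytes / usedMemoryBytes;
}

// Rough interpretation. The 1.5 threshold is an assumption for illustration.
function describeRatio(ratio: number): string {
  if (ratio < 1) return "below 1.0: part of Redis memory may be swapped out";
  if (ratio > 1.5) return "high: likely fragmentation";
  return "normal";
}

console.log(describeRatio(fragmentationRatio(1200, 1000))); // normal
console.log(describeRatio(fragmentationRatio(2000, 1000))); // high: likely fragmentation
```

Values below 1.0 deserve attention too, since they suggest the operating system has swapped out part of the Redis dataset; cross-check them against the Swap Usage panel.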

#### Freeable Memory
- **Description**: Amount of free memory available on the host (in bytes)
- **Use Case**: Monitor available memory before hitting limits
- **Unit**: Displayed in decimal bytes for precise capacity tracking
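"Decimal bytes" means SI prefixes, where 1 kB is 1000 bytes, rather than binary prefixes, where 1 KiB is 1024 bytes. The helper below is a minimal, hypothetical sketch of that conversion; SigNoz's own unit formatter may round or label values differently.

```typescript
// Format a byte count with decimal (SI) prefixes: each step divides by 1000.
function formatDecimalBytes(bytes: number, fractionDigits = 2): string {
  const units = ["B", "kB", "MB", "GB", "TB", "PB"];
  let value = bytes;
  let i = 0;
  while (Math.abs(value) >= 1000 && i < units.length - 1) {
    value /= 1000;
    i++;
  }
  // Plain byte counts are shown without a fractional part.
  return i === 0 ? `${value} ${units[0]}` : `${value.toFixed(fractionDigits)} ${units[i]}`;
}

console.log(formatDecimalBytes(3500000000)); // 3.50 GB
console.log(formatDecimalBytes(512)); // 512 B
```

With binary prefixes the same 3,500,000,000 bytes would read as roughly 3.26 GiB, so keep the unit consistent when comparing panels against node-type memory specifications.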

### Cache Performance

#### Cache Hit Rate
- **Description**: Number of successful cache lookups (cache hits)
- **Use Case**: Measure cache effectiveness and application performance
- **Optimization**: Higher hit rates indicate better cache utilization
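A hit rate is usually expressed as the fraction of key lookups served from the cache, that is `hits / (hits + misses)`. CloudWatch publishes `CacheHits` and `CacheMisses` for ElastiCache Redis. Whether this panel plots that ratio or the raw hit count is an assumption, so treat the sketch below, including the hypothetical `cacheHitRate` helper, as illustrative only.

```typescript
// Fraction of lookups served from the cache. Returns null when there were
// no lookups, so an idle cluster is not mistaken for a 0% hit rate.
function cacheHitRate(hits: number, misses: number): number | null {
  const lookups = hits + misses;
  return lookups === 0 ? null : hits / lookups;
}

console.log(cacheHitRate(9000, 1000)); // 0.9
console.log(cacheHitRate(0, 0)); // null
```

Returning `null` rather than `0` for idle periods avoids firing a "hit rate too low" alert when there was simply no traffic.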

#### Evictions
- **Description**: Number of items evicted from cache due to memory pressure
- **Use Case**: Monitor memory pressure and optimize cache policies
- **Troubleshooting**: High eviction rates may indicate insufficient memory

### Network Activity

#### Network Bytes In
- **Description**: Number of bytes received by the ElastiCache node
- **Use Case**: Monitor inbound network traffic and bandwidth utilization
- **Unit**: Displayed in decimal bytes for traffic analysis

#### Network Bytes Out
- **Description**: Number of bytes sent from the ElastiCache node
- **Use Case**: Track outbound network traffic and response data volume
- **Capacity Planning**: Essential for understanding network requirements

### Connection and System Metrics

#### Current Connections
- **Description**: Number of active client connections to the Redis instance
- **Use Case**: Monitor connection usage and identify connection leaks
- **Capacity Planning**: Track connection patterns for scaling decisions

#### Swap Usage
- **Description**: Amount of swap space used by the ElastiCache node
- **Use Case**: Identify memory pressure that forces swapping
- **Performance Impact**: High swap usage can significantly degrade performance

#### Replication Lag
- **Description**: Maximum lag time between master and replica nodes (in seconds)
- **Use Case**: Monitor replication health in Redis clusters
- **Reliability**: Essential for ensuring data consistency across replicas

## Dashboard Variables

This dashboard includes filtering capabilities:

- **cache_cluster_id**: Filter metrics by specific ElastiCache cluster ID
  - **Multi-select**: Monitor multiple clusters simultaneously
  - **All option**: View aggregate metrics across all clusters
  - **Dynamic**: Automatically populates from available cluster IDs

## Monitoring Best Practices

### Performance Optimization
- Monitor **Cache Hit Rate** to ensure efficient cache utilization
- Track **CPU Utilization** metrics to identify processing bottlenecks
- Watch **Memory Fragmentation Ratio** for memory efficiency

### Capacity Planning
- Use **Database Memory Usage Percentage** for scaling decisions
- Monitor **Freeable Memory** to prevent out-of-memory conditions
- Track **Network Bytes** for bandwidth planning

### Troubleshooting
- High **Evictions** may indicate insufficient memory allocation
- Elevated **Swap Usage** suggests memory pressure
- Increased **Replication Lag** indicates replication issues

### Alerting Recommendations
- **CPU Utilization** > 80% - Consider scaling or optimization
- **Memory Usage** > 85% - Plan for memory scaling
- **Cache Hit Rate** < 90% - Review cache strategy
- **Replication Lag** > 5 seconds - Investigate replication health
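The recommendations above can be expressed as a small rule table evaluated against a metrics snapshot. This is a minimal sketch, assuming a hypothetical `RedisSnapshot` record with all values already in percent or seconds; it is not SigNoz alert-rule syntax, and the thresholds are the starting points listed above, to be tuned per workload.

```typescript
// Hypothetical snapshot of the metrics referenced by the recommendations.
interface RedisSnapshot {
  cpuUtilizationPct: number;
  memoryUsagePct: number;
  cacheHitRatePct: number;
  replicationLagSec: number;
}

interface Rule {
  name: string;
  breached: (s: RedisSnapshot) => boolean;
  action: string;
}

// Thresholds taken from the alerting recommendations; adjust for your workload.
const rules: Rule[] = [
  { name: "CPU Utilization > 80%", breached: (s) => s.cpuUtilizationPct > 80, action: "Consider scaling or optimization" },
  { name: "Memory Usage > 85%", breached: (s) => s.memoryUsagePct > 85, action: "Plan for memory scaling" },
  { name: "Cache Hit Rate < 90%", breached: (s) => s.cacheHitRatePct < 90, action: "Review cache strategy" },
  { name: "Replication Lag > 5 seconds", breached: (s) => s.replicationLagSec > 5, action: "Investigate replication health" },
];

// Return a message for every rule the snapshot breaches, in rule order.
function evaluate(s: RedisSnapshot): string[] {
  return rules.filter((r) => r.breached(s)).map((r) => `${r.name}: ${r.action}`);
}

const snapshot: RedisSnapshot = {
  cpuUtilizationPct: 85,
  memoryUsagePct: 70,
  cacheHitRatePct: 95,
  replicationLagSec: 7,
};

for (const message of evaluate(snapshot)) {
  console.log(message);
}
```

Keeping thresholds in one table makes it easy to review them alongside the dashboard and to adjust a single value without touching the evaluation logic.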

## Related Dashboards

- [AWS RDS](/docs/dashboards/dashboard-templates/aws-rds)
- [Redis](/docs/dashboards/dashboard-templates/redis)
- [Key Operations](/docs/dashboards/dashboard-templates/key-operations)