A powerful Infrastructure-as-Code (IaC) tool for deploying and managing DataDog health checks, synthetic tests, and SLOs
pip install datadog-healthcheck-deployer
- Set up your DataDog credentials:
export DD_API_KEY="your-api-key"
export DD_APP_KEY="your-app-key"
- Create a health check configuration file, healthcheck.yaml:
version: "1.0"
healthchecks:
  - name: "Basic HTTP Check"
    type: "http"
    url: "https://api.example.com/health"
    monitors:
      availability:
        enabled: true
        threshold: 99.9
- Deploy your health check:
dd-healthcheck deploy --file healthcheck.yaml
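Before running the deploy step, it can help to confirm that both credentials are actually present in the environment. The following is a minimal Python sketch and not part of the tool: `missing_credentials` is a hypothetical helper, and only the variable names `DD_API_KEY` and `DD_APP_KEY` come from the setup step above.

```python
import os

# Variable names taken from the setup step; the helper itself is hypothetical.
REQUIRED_VARS = ("DD_API_KEY", "DD_APP_KEY")


def missing_credentials(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_credentials()` with no arguments inspects the current process environment; an empty list means both keys are set.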
# Deploy health checks
dd-healthcheck deploy --file <config-file>
# Validate configuration
dd-healthcheck validate --file <config-file>
# List existing health checks
dd-healthcheck list
# Delete health checks
dd-healthcheck delete --name <check-name>
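The validate and deploy commands above can be chained from a script so that a configuration is only deployed once it validates. This is a rough Python sketch under one assumption that the source does not confirm: that `dd-healthcheck` signals failure with a non-zero exit status. The helper names `build_command` and `validate_then_deploy` are hypothetical.

```python
import subprocess


def build_command(action, config_file):
    """Build the argument list for `dd-healthcheck <action> --file <config_file>`."""
    if action not in ("validate", "deploy"):
        raise ValueError(f"unsupported action: {action}")
    return ["dd-healthcheck", action, "--file", config_file]


def validate_then_deploy(config_file, run=subprocess.run):
    """Run validate, and deploy only if validation exits with status 0."""
    # Assumption: the CLI signals failure with a non-zero exit status.
    if run(build_command("validate", config_file)).returncode != 0:
        return False
    return run(build_command("deploy", config_file)).returncode == 0
```

The `run` parameter defaults to `subprocess.run` and exists only so the flow can be exercised without the CLI installed.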
See our Configuration Guide for detailed configuration options.
We welcome contributions! Please see our Contributing Guide for details.
This project is licensed under the MIT License - see the LICENSE file for details.
The tool integrates with our existing monitor and dashboard deployers and focuses primarily on health check endpoint monitoring.
- Health check endpoint monitoring
- Synthetic API tests
- Uptime monitoring
- SSL certificate monitoring
- DNS monitoring
- Global availability checks
datadog-healthcheck-deployer/
├── examples
│   └── dashboards
│       ├── aws
│       ├── common
│       └── services
├── scripts
├── src
│   ├── datadog_healthcheck_deployer
│   │   ├── checks
│   │   ├── dashboards
│   │   ├── monitors
│   │   ├── utils
│   │   └── validators
└── tests
    └── unit
        ├── checks
        ├── dashboards
        ├── monitors
        ├── utils
        └── validators
healthchecks:
  - name: "API Health Check"
    type: "http"
    url: "https://api.example.com/health"
    locations:
      - "aws:us-east-1"
      - "aws:eu-west-1"
      - "gcp:asia-east1"
    frequency: 60  # seconds
    timeout: 10
    success_criteria:
      - status_code: 200
      - response_time: 1000  # ms
    headers:
      X-API-Key: "{{API_KEY}}"
    monitors:
      availability:
        enabled: true
        threshold: 99.9
      latency:
        enabled: true
        threshold: 500
    slo:
      target: 99.95
      window: "30d"
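To make the `success_criteria` entries in the configuration above concrete, here is a small Python sketch of how a single response might be judged against them. The semantics are an assumption, not the tool's documented behavior: `status_code` is treated as an exact match and `response_time` as an upper bound in milliseconds. The function name `meets_success_criteria` is hypothetical.

```python
def meets_success_criteria(criteria, status_code, response_time_ms):
    """Return True if an observed response satisfies every criterion.

    Assumed semantics: `status_code` must match exactly and `response_time`
    is an upper bound in milliseconds.
    """
    for criterion in criteria:
        if "status_code" in criterion and status_code != criterion["status_code"]:
            return False
        if "response_time" in criterion and response_time_ms > criterion["response_time"]:
            return False
    return True


# The same shape as `success_criteria` in the configuration above.
criteria = [{"status_code": 200}, {"response_time": 1000}]
```

Under these assumptions, a 200 response that takes 350 ms passes, while a 200 response that takes 1500 ms or a 503 response fails.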
from datadog_monitor_deployer.deployer import MonitorDeployer
from datadog_healthcheck_deployer.checks import HttpCheck
from datadog_healthcheck_deployer.slos import AvailabilitySLO


class HealthCheckDeployer:
    """Deploys a health check together with its monitors and optional SLO."""

    def __init__(self, api_key: str, app_key: str):
        # Reuse the existing monitor deployer for monitor creation
        self.monitor_deployer = MonitorDeployer(api_key, app_key)

    def deploy_health_check(self, config: dict):
        # Create the health check
        check = HttpCheck.from_config(config)
        check_id = check.create()

        # Create associated monitors (_create_monitors is not shown here)
        monitors = self._create_monitors(check, config)

        # Create SLO if configured
        if 'slo' in config:
            slo = AvailabilitySLO(
                name=f"{config['name']} Availability",
                target=config['slo']['target'],
                monitors=monitors
            )
            slo.create()
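The `slo` settings used above (`target: 99.95`, `window: "30d"`) translate into an error budget, which is often easier to reason about than a percentage. The Python sketch below works through that arithmetic; it is illustrative only, both helper names are hypothetical, and it assumes the window is always expressed in days.

```python
def parse_window_days(window):
    """Parse a window such as "30d" into a number of days (days only)."""
    if not window.endswith("d") or not window[:-1].isdigit():
        raise ValueError(f"unsupported window: {window!r}")
    return int(window[:-1])


def error_budget_minutes(target_percent, window):
    """Minutes of allowed unavailability for an availability target."""
    total_minutes = parse_window_days(window) * 24 * 60
    return total_minutes * (100 - target_percent) / 100
```

A 30-day window contains 43,200 minutes, so a 99.95 target leaves roughly 21.6 minutes of allowed downtime, and a 99.9 target leaves roughly 43.2 minutes.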
synthetic_tests:
  - name: "Global API Availability"
    type: "api"
    request:
      method: "GET"
      url: "https://api.example.com/health"
    assertions:
      - type: "statusCode"
        operator: "is"
        target: 200
      - type: "responseTime"
        operator: "lessThan"
        target: 1000
    locations:
      - "aws:us-east-1"
      - "aws:eu-west-1"
      - "aws:ap-southeast-1"
      - "gcp:us-central1"
      - "azure:westeurope"
    frequency: 300
    retry:
      count: 2
      interval: 30
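The `assertions` in the synthetic test above pair a measured value with an operator and a target. As an illustration of how such a list could be evaluated, here is a minimal Python sketch. It is not the tool's or DataDog's implementation: `failed_assertions` is a hypothetical name, and only the two operators that appear in the configuration (`is` and `lessThan`) are mapped.

```python
import operator

# Only the two operators used in the configuration above are mapped here.
OPERATORS = {"is": operator.eq, "lessThan": operator.lt}


def failed_assertions(assertions, observed):
    """Return the assertions that the observed values do not satisfy.

    `observed` maps an assertion type to the measured value, for example
    {"statusCode": 200, "responseTime": 420}.
    """
    failures = []
    for assertion in assertions:
        compare = OPERATORS[assertion["operator"]]
        if not compare(observed[assertion["type"]], assertion["target"]):
            failures.append(assertion)
    return failures


assertions = [
    {"type": "statusCode", "operator": "is", "target": 200},
    {"type": "responseTime", "operator": "lessThan", "target": 1000},
]
```

An empty result means the test run passed. Because `lessThan` is strict, a response time of exactly 1000 ms counts as a failure in this sketch.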
- Multi-step API checks
- Custom assertion logic
- Response body validation
- SSL certificate expiration monitoring
- DNS propagation checks
- Global latency mapping
- Automatic baseline creation
- Anomaly detection
- Integration with incident management systems
The goal is to provide capabilities similar to Catchpoint's while keeping everything within the DataDog ecosystem, which:
- Reduces costs
- Simplifies management
- Provides better integration with existing monitoring
- Enables unified alerting and reporting