Skip to content

A powerful Infrastructure-as-Code (IaC) tool for deploying and managing DataDog health checks, synthetic tests, and SLOs

License

Notifications You must be signed in to change notification settings

fleXRPL/datadog-healthcheck-deployer

Repository files navigation

datadog-healthcheck-deployer

versions

PyPI version Python Versions CI/CD Pipeline Python

health

Quality Gate Status Overall Coverage Security Rating Maintainability Rating Vulnerabilities Dependabot Status Code style: black

stats

Downloads License: MIT

A powerful Infrastructure-as-Code (IaC) tool for deploying and managing DataDog health checks, synthetic tests, and SLOs

Installation

pip install datadog-healthcheck-deployer

Quick Start

  1. Set up your DataDog credentials:
export DD_API_KEY="your-api-key"
export DD_APP_KEY="your-app-key"
  1. Create a health check configuration file healthcheck.yaml:
version: "1.0"
healthchecks:
  - name: "Basic HTTP Check"
    type: "http"
    url: "https://api.example.com/health"
    monitors:
      availability:
        enabled: true
        threshold: 99.9
  1. Deploy your health check:
dd-healthcheck deploy --file healthcheck.yaml

Usage

CLI Commands

# Deploy health checks
dd-healthcheck deploy --file <config-file>

# Validate configuration
dd-healthcheck validate --file <config-file>

# List existing health checks
dd-healthcheck list

# Delete health checks
dd-healthcheck delete --name <check-name>

Configuration Reference

See our Configuration Guide for detailed configuration options.

Contributing

We welcome contributions! Please see our Contributing Guide for details.

License

This project is licensed under the MIT License - see the LICENSE file for details.


Objective

Integrate with our existing monitor and dashboard deployers primarily for health check endpoint monitoring.

  • Health check endpoint monitoring
  • Synthetic API tests
  • Uptime monitoring
  • SSL certificate monitoring
  • DNS monitoring
  • Global availability checks

Project Structure

datadog-healthcheck-deployer/
├── examples
│   └── dashboards
│       ├── aws
│       ├── common
│       └── services
├── scripts
├── src
│   ├── datadog_healthcheck_deployer
│   │   ├── checks
│   │   ├── dashboards
│   │   ├── monitors
│   │   ├── utils
│   │   └── validators
└── tests
    └── unit
        ├── checks
        ├── dashboards
        ├── monitors
        ├── utils
        └── validators

Configuration Examples

Health Check Configuration

healthchecks:
  - name: "API Health Check"
    type: "http"
    url: "https://api.example.com/health"
    locations:
      - "aws:us-east-1"
      - "aws:eu-west-1"
      - "gcp:asia-east1"
    frequency: 60 # seconds
    timeout: 10
    success_criteria:
      - status_code: 200
      - response_time: 1000 # ms
    headers:
      X-API-Key: "{{API_KEY}}"
    monitors:
      availability:
        enabled: true
        threshold: 99.9
      latency:
        enabled: true
        threshold: 500
    slo:
      target: 99.95
      window: "30d"

Integration Example

from datadog_monitor_deployer.deployer import MonitorDeployer
from datadog_healthcheck_deployer.checks import HttpCheck
from datadog_healthcheck_deployer.slos import AvailabilitySLO

class HealthCheckDeployer:
    def __init__(self, api_key: str, app_key: str):
        self.monitor_deployer = MonitorDeployer(api_key, app_key)

    def deploy_health_check(self, config: dict):
        # Create the health check
        check = HttpCheck.from_config(config)
        check_id = check.create()

        # Create associated monitors
        monitors = self._create_monitors(check, config)

        # Create SLO if configured
        if 'slo' in config:
            slo = AvailabilitySLO(
                name=f"{config['name']} Availability",
                target=config['slo']['target'],
                monitors=monitors
            )
            slo.create()

Synthetic Test Configuration

synthetic_tests:
  - name: "Global API Availability"
    type: "api"
    request:
      method: "GET"
      url: "https://api.example.com/health"
      assertions:
        - type: "statusCode"
          operator: "is"
          target: 200
        - type: "responseTime"
          operator: "lessThan"
          target: 1000
    locations:
      - "aws:us-east-1"
      - "aws:eu-west-1"
      - "aws:ap-southeast-1"
      - "gcp:us-central1"
      - "azure:westeurope"
    frequency: 300
    retry:
      count: 2
      interval: 30

Future Features

  • Multi-step API checks
  • Custom assertion logic
  • Response body validation
  • SSL certificate expiration monitoring
  • DNS propagation checks
  • Global latency mapping
  • Automatic baseline creation
  • Anomaly detection
  • Integration with incident management systems

Benefits

The goal is to provide similar capabilities to Catchpoint while keeping everything within the DataDog ecosystem, which:

  • Reduces costs
  • Simplifies management
  • Provides better integration with existing monitoring
  • Enables unified alerting and reporting

About

A powerful Infrastructure-as-Code (IaC) tool for deploying and managing DataDog health checks, synthetic tests, and SLOs

Resources

License

Stars

Watchers

Forks

Packages

No packages published