┌─────────────────────────────────────────────────────────────────┐
│                   Internal Network (Private)                    │
│                                                                 │
│  ┌───────────────┐     ┌───────────────┐     ┌───────────────┐  │
│  │   Service 1   │     │   Service 2   │     │   Service N   │  │
│  │               │     │               │     │               │  │
│  │  • API        │     │  • Database   │     │  • Worker     │  │
│  │  • Web App    │     │  • Queue      │     │  • Job        │  │
│  └───────────────┘     └───────────────┘     └───────────────┘  │
│          │                     │                     │          │
│          │                     │                     │          │
│    ┌─────▼─────────────────────▼─────────────────────▼─────┐    │
│    │           Heartbeat Clients (cron/systemd)            │    │
│    │                                                       │    │
│    │  • Lightweight scripts (Bash/Python/Node.js)          │    │
│    │  • Send POST requests every 2 minutes                 │    │
│    │  • Include metadata (optional)                        │    │
│    └───────────────────────────┬───────────────────────────┘    │
│                                │                                │
└────────────────────────────────┼────────────────────────────────┘
                                 │
                     Outbound HTTPS (Port 443)
                      Only connection needed!
                                 │
                                 ▼
               ┌───────────────────────────────────┐
               │         Internet (Public)         │
               │                                   │
               │     Cloudflare Global Network     │
               └───────────────────────────────────┘
                                 │
                                 ▼
           ┌───────────────────────────────────────────┐
           │             Cloudflare Worker             │
           │      (heartbeat-monitor.workers.dev)      │
           │                                           │
           │  Endpoints:                               │
           │  • POST /api/heartbeat  (receive)         │
           │  • GET  /api/status     (current)         │
           │  • GET  /api/logs       (history)         │
           │  • GET  /               (dashboard)       │
           │                                           │
           │  Scheduled Tasks:                         │
           │  • Check staleness (every 5 min)          │
           │  • Update service status                  │
           └───────────────────────────────────────────┘
                                 │
                                 ▼
               ┌───────────────────────────────────┐
               │       Cloudflare KV Storage       │
               │                                   │
               │  • monitor:latest                 │
               │    (heartbeat timestamps)         │
               │  • monitor:data                   │
               │    (summary + uptime stats)       │
               │  • recent:alerts                  │
               │    (alert history)                │
               └───────────────────────────────────┘
                                 │
                                 ▼
               ┌───────────────────────────────────┐
               │          Dashboard Users          │
               │                                   │
               │  Access via browser:              │
               │  https://your-worker.workers.dev  │
               └───────────────────────────────────┘
Internal Service → Heartbeat Client → POST /api/heartbeat → Worker
↓
KV Storage
↓
Store heartbeat data
Update latest timestamp
Payload:
{
  "serviceId": "service-1",
  "status": "up",
  "metadata": { "hostname": "server-1" },
  "message": "Heartbeat from server-1"
}
Cloudflare Cron Trigger → Worker scheduled() function
↓
Read monitor:latest from KV
Read monitor:data from KV
↓
Calculate time since last heartbeat
↓
Compare with stalenessThreshold
↓
Determine status (up/down/unknown)
↓
Update uptime statistics
↓
Store updated monitor:data in KV
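The staleness decision in the flow above can be sketched in a few lines. This is an illustration of the logic only, written in Python rather than the Worker's own runtime language. It assumes timestamps are stored as epoch milliseconds and that each service has its own threshold in seconds; the function names `classify_service` and `check_staleness` are hypothetical.

```python
from typing import Dict, Optional


def classify_service(last_heartbeat_ms: Optional[int], now_ms: int,
                     staleness_threshold_s: int) -> str:
    """Return 'up', 'down', or 'unknown' for a single service."""
    if last_heartbeat_ms is None:
        return "unknown"  # no heartbeat has ever been recorded
    elapsed_s = (now_ms - last_heartbeat_ms) / 1000
    return "up" if elapsed_s <= staleness_threshold_s else "down"


def check_staleness(latest: Dict[str, int], thresholds: Dict[str, int],
                    now_ms: int) -> Dict[str, str]:
    """Classify every configured service from the monitor:latest timestamps."""
    return {
        service_id: classify_service(latest.get(service_id), now_ms, threshold_s)
        for service_id, threshold_s in thresholds.items()
    }
```

A service that is configured but has no entry in `latest` comes back as `unknown`, which matches the three statuses named in the flow.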
User Browser → GET / → Worker
↓
Read monitor:latest from KV
Read monitor:data from KV
↓
Embed data into HTML
↓
Return dashboard with embedded data
↓
(Optional) JavaScript polls /api/alerts/recent
↓
Auto-refresh configurable (default: disabled)
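As a rough sketch of the "embed data into HTML" step, the snippet below serializes both KV values into a script tag. It is written in Python for illustration, and the global name `window.__MONITOR__` and the page skeleton are assumptions, not the project's actual markup. The one detail worth noting is the escaping of `<`, so that a value containing `</script>` cannot close the tag early.

```python
import json


def render_dashboard(latest: dict, data: dict) -> str:
    """Return an HTML page with the monitoring data embedded as JSON."""
    payload = json.dumps({"latest": latest, "data": data})
    # "<" only appears inside JSON strings, so replacing it with a
    # unicode escape keeps the JSON valid and neutralizes "</script>".
    safe_payload = payload.replace("<", "\\u003c")
    return (
        "<!doctype html><html><body>"
        '<div id="app"></div>'
        f"<script>window.__MONITOR__ = {safe_payload};</script>"
        "</body></html>"
    )
```

Because the data arrives with the page, the browser needs no follow-up request to render the initial state.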
- Security: No inbound connections to internal services
- Simplicity: No VPN, tunnels, or complex networking
- Firewall Friendly: Works through corporate firewalls (outbound HTTPS only)
- Scalability: Easy to add new services
- Global Edge Network: Low latency worldwide
- Serverless: No servers to manage
- Free Tier: 100,000 requests/day free
- KV Storage: Fast, distributed key-value store
- Cron Triggers: Built-in scheduling
- Fast: Edge-cached, low latency
- Distributed: Global replication
- Simple: Key-value interface
- Cost-Effective: Free tier sufficient for most use cases
- Durable: Reliable storage
The system uses two separate KV keys (monitor:latest and monitor:data) to prevent race conditions:
Problem: When heartbeat updates and cron checks both read, modify, and write the same key, one can overwrite the other's changes. KV offers no atomic update, and its eventual consistency model means a writer may start from a stale read.
Solution: Separate concerns:
- Heartbeats ONLY update monitor:latest (timestamps)
- Cron ONLY updates monitor:data (summary + uptime)
- Dashboard reads both keys
Benefits:
- No race conditions: Updates don't conflict
- Smaller heartbeat writes: Only timestamps, not full statistics
- Consistent status: Cron-generated summaries are never overwritten
- Better performance: Reduced payload sizes for frequent operations
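The lost-update problem and the two-key fix can be illustrated with a plain dictionary standing in for KV. This is a simplified simulation in Python, not the Worker code: the single combined key `monitor` and both function names are hypothetical, and the interleaving is forced by hand.

```python
import copy


def simulate_single_key() -> dict:
    """Heartbeat and cron share one key: the later write wins."""
    kv = {"monitor": {"latest": {"service-1": 100},
                      "summary": {"service-1": "up"}}}
    # Both writers read the same old value before either writes.
    heartbeat_view = copy.deepcopy(kv["monitor"])
    cron_view = copy.deepcopy(kv["monitor"])
    heartbeat_view["latest"]["service-1"] = 200
    kv["monitor"] = heartbeat_view   # heartbeat writes first
    cron_view["summary"]["service-1"] = "up"
    kv["monitor"] = cron_view        # cron overwrites timestamp 200 with 100
    return kv


def simulate_two_keys() -> dict:
    """Each writer owns its key, so the same interleaving is harmless."""
    kv = {"monitor:latest": {"service-1": 100},
          "monitor:data": {"summary": {"service-1": "up"}}}
    heartbeat_view = copy.deepcopy(kv["monitor:latest"])
    cron_view = copy.deepcopy(kv["monitor:data"])
    heartbeat_view["service-1"] = 200
    kv["monitor:latest"] = heartbeat_view
    cron_view["summary"]["service-1"] = "up"
    kv["monitor:data"] = cron_view
    return kv
```

In the single-key run the heartbeat timestamp reverts to 100; in the two-key run it stays at 200.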
Purpose: Send periodic health signals to the worker
Features:
- Lightweight (single HTTP request)
- Customizable metadata
- Error handling
- Logging
Scheduling Options:
- Cron (simple, traditional)
- systemd timer (modern, reliable)
- Docker (containerized)
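A minimal Python client along these lines might look as follows, using only the standard library. The payload fields and the Bearer header follow the formats shown in this document; the function names, the timeout, and the choice to log failures with `print` are assumptions for the sketch.

```python
import json
import socket
import urllib.request


def build_heartbeat_request(worker_url: str, service_id: str,
                            api_key: str) -> urllib.request.Request:
    """Build the POST /api/heartbeat request without sending it."""
    hostname = socket.gethostname()
    payload = {
        "serviceId": service_id,
        "status": "up",
        "metadata": {"hostname": hostname},
        "message": f"Heartbeat from {hostname}",
    }
    return urllib.request.Request(
        url=f"{worker_url.rstrip('/')}/api/heartbeat",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send_heartbeat(request: urllib.request.Request,
                   timeout_s: float = 10.0) -> bool:
    """Send the heartbeat; return True on a 2xx response."""
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            return 200 <= response.status < 300
    except OSError as exc:  # URLError, HTTPError, and timeouts are OSErrors
        print(f"heartbeat failed: {exc}")
        return False
```

A script that calls `send_heartbeat(build_heartbeat_request(...))` once and exits can then be run by any of the schedulers listed above.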
Purpose: Receive heartbeats, check staleness, serve dashboard
Responsibilities:
- Validate incoming heartbeats
- Authenticate via API keys
- Store heartbeat data in KV
- Check for stale services (scheduled)
- Serve dashboard and API endpoints
Routes:
- POST /api/heartbeat - Receive heartbeat from services
- GET /api/status - Get current status summary
- GET /api/logs?serviceId=X - Get historical logs
- GET /api/services - List configured services
- GET / - Dashboard UI
Purpose: Persist heartbeat data and service status
Keys:
- monitor:latest - Latest heartbeat timestamps for all services (JSON object: {serviceId: timestamp})
  - Updated by: Heartbeat handler
  - Read by: Cron checks, Dashboard
- monitor:data - Service status and uptime statistics (JSON object)
  - Contains: summary (current status) and uptime (daily statistics per service)
  - Updated by: Cron scheduled task
  - Read by: Dashboard, API endpoints
- recent:alerts - Dashboard alert history (JSON array)
  - Contains: External alerts and service status change notifications
  - Updated by: Alert handlers, Service monitoring
  - Configurable retention (default: 100 alerts, 7 days)
Data Retention:
- Latest timestamps: All enabled services (live data)
- Uptime statistics: Configurable (default: 120 days per service)
- Alert history: Configurable (default: 100 alerts or 7 days)
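The alert retention rule (100 alerts or 7 days) could be applied with a pruning step like the one below. This is a sketch in Python under two assumptions: each alert record carries a `timestamp` field in epoch milliseconds, and both limits apply together, so an alert is kept only if it is recent enough and among the newest 100. The function name `prune_alerts` is hypothetical.

```python
from typing import List


def prune_alerts(alerts: List[dict], now_ms: int,
                 max_alerts: int = 100, max_age_days: int = 7) -> List[dict]:
    """Drop alerts older than max_age_days, then keep the newest max_alerts."""
    cutoff_ms = now_ms - max_age_days * 24 * 60 * 60 * 1000
    fresh = [alert for alert in alerts if alert["timestamp"] >= cutoff_ms]
    fresh.sort(key=lambda alert: alert["timestamp"], reverse=True)
    return fresh[:max_alerts]
```

Running this before each write to recent:alerts keeps the stored array bounded regardless of how many alerts arrive.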
Purpose: Visual monitoring interface
Features:
- Real-time status display
- Summary cards (total, up, down, unknown)
- Per-service details
- Auto-refresh (optional; disabled by default, e.g. every 30s when enabled)
- Responsive design
- No authentication (by default)
Heartbeat Interval: 2-5 minutes (120-300 seconds)
Staleness Threshold: 5-10 minutes (300-600 seconds)
Staleness Check: 10 minutes (cron)
Dashboard Refresh: Manual or configurable auto-refresh
Alert Polling: 10-60 seconds (if enabled)
- 2-5 minute heartbeats: Balance between freshness and KV operation costs
- 5-10 minute threshold: Allows 2 missed heartbeats before alerting
- 10-minute staleness check: Efficient detection with minimal KV operations
- Manual dashboard refresh: Embedded data eliminates need for auto-refresh
- Alert polling: Only if real-time notifications are needed
You can adjust these based on your needs:
- Critical services: 30s heartbeat, 2m threshold, 1m check
- Standard services: 2m heartbeat, 5m threshold, 5m check
- Low-priority: 10m heartbeat, 30m threshold, 15m check
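To compare these profiles, two rough figures help: how many heartbeat intervals fit inside the threshold, and how long an outage can go unnoticed in the worst case. The Python sketch below assumes a service fails right after a heartbeat and that the next cron run happens up to one full check interval after the threshold expires. The function names and the `PROFILES` table are illustrative only.

```python
def missed_heartbeats_tolerated(heartbeat_interval_s: int,
                                staleness_threshold_s: int) -> int:
    """Whole heartbeat intervals that fit inside the staleness threshold."""
    return staleness_threshold_s // heartbeat_interval_s


def worst_case_detection_s(staleness_threshold_s: int,
                           check_interval_s: int) -> int:
    """Upper bound on the delay between the last heartbeat and detection."""
    return staleness_threshold_s + check_interval_s


# (heartbeat interval, staleness threshold, check interval) in seconds
PROFILES = {
    "critical": (30, 120, 60),
    "standard": (120, 300, 300),
    "low-priority": (600, 1800, 900),
}

for name, (heartbeat_s, threshold_s, check_s) in PROFILES.items():
    missed = missed_heartbeats_tolerated(heartbeat_s, threshold_s)
    delay = worst_case_detection_s(threshold_s, check_s)
    print(f"{name}: tolerates ~{missed} missed heartbeats, "
          f"detects within {delay}s")
```

For the standard profile this gives about 2 missed heartbeats and detection within 600 seconds.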
Heartbeat Client
↓
Include Authorization: Bearer {apiKey}
↓
POST /api/heartbeat
↓
Worker validates:
1. serviceId exists in services.json
2. apiKey matches (if configured)
↓
Accept or reject request
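The two validation steps can be sketched as a single function. This is an illustration in Python, not the Worker's implementation: the shape of the services configuration (a mapping with an optional `apiKey` per service), the HTTP status codes, and the name `validate_heartbeat` are all assumptions. The comparison uses `hmac.compare_digest` to avoid leaking key contents through timing.

```python
import hmac
from typing import Optional, Tuple

# Hypothetical stand-in for the contents of services.json
SERVICES = {
    "service-1": {"apiKey": "key-for-service-1"},
    "service-2": {},  # no key configured, so no auth is required
}


def validate_heartbeat(service_id: str, authorization: Optional[str],
                       services: dict = SERVICES) -> Tuple[int, str]:
    """Return an (HTTP status, reason) pair for an incoming heartbeat."""
    service = services.get(service_id)
    if service is None:
        return 404, "unknown serviceId"
    expected_key = service.get("apiKey")
    if expected_key is None:
        return 200, "ok"
    prefix = "Bearer "
    if not authorization or not authorization.startswith(prefix):
        return 401, "missing bearer token"
    supplied_key = authorization[len(prefix):]
    if not hmac.compare_digest(supplied_key.encode(), expected_key.encode()):
        return 401, "invalid api key"
    return 200, "ok"
```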
- API Key Authentication: Per-service keys
- HTTPS Only: All communication encrypted
- Cloudflare Network: DDoS protection
- Minimal Credentials: Services store only their own heartbeat API key
- Outbound Only: No inbound firewall rules needed
- Services: ~100-500 (KV write limits and processing time)
- Heartbeat Frequency: 2-10 minutes recommended
- Storage: Minimal (2 primary KV entries + alert history)
- Requests: 100,000/day (free tier)
- KV Operations: Primary constraint (1000 writes/day on free tier)
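Since KV writes are the primary constraint, a quick estimate of daily write volume is useful when choosing intervals. The Python sketch below assumes one KV write per heartbeat and one per cron run, which is a simplification (alert updates would add more). The 1,000 writes/day figure is the free-tier limit quoted above; the function names are hypothetical.

```python
import math

SECONDS_PER_DAY = 86_400
FREE_TIER_KV_WRITES_PER_DAY = 1_000  # free-tier figure from the list above


def daily_kv_writes(services: int, heartbeat_interval_s: int,
                    check_interval_s: int) -> int:
    """Estimate KV writes per day: one per heartbeat, one per cron run."""
    heartbeat_writes = services * (SECONDS_PER_DAY // heartbeat_interval_s)
    cron_writes = SECONDS_PER_DAY // check_interval_s
    return heartbeat_writes + cron_writes


def min_heartbeat_interval_s(services: int, check_interval_s: int,
                             write_budget: int = FREE_TIER_KV_WRITES_PER_DAY) -> int:
    """Shortest heartbeat interval (seconds) that stays within the budget."""
    remaining = write_budget - SECONDS_PER_DAY // check_interval_s
    if remaining <= 0:
        raise ValueError("cron writes alone exceed the write budget")
    return math.ceil(services * SECONDS_PER_DAY / remaining)
```

Under these assumptions, one service with a 2-minute heartbeat and a 10-minute check uses 864 of the 1,000 daily writes, and ten services would need heartbeats roughly every 17 minutes (1,010 seconds) to stay within the free tier.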
If you need more:
- Workers Paid: $5/month for 10M requests
- KV Paid: $0.50/GB storage
- Multiple Workers: Split services across workers
- Reduce heartbeat frequency for non-critical services
- Clean up old data periodically
- Use metadata sparingly
- Increase staleness thresholds where possible
- Worker Logs: npm run tail
- Cloudflare Dashboard: View request metrics
- KV Usage: Check storage consumption
- Dashboard Health: Monitor your own worker!
- Heartbeat success rate
- KV read/write operations
- Worker execution time
- Error rates
Recently added capabilities:
- ✅ Multi-Channel Notifications: Discord, Slack, Telegram, Email, PagerDuty, Pushover, Webhook
- ✅ External Alert Integration: Grafana, Alertmanager, custom webhooks
- ✅ Real-time Dashboard Alerts: Toast and browser notifications
- ✅ Alert History: Searchable history with configurable retention
- ✅ Uptime Statistics: Daily uptime tracking with configurable retention (120 days)
- ✅ CSV Export: Historical data export with custom date ranges
- ✅ API Endpoint Controls: Enable/disable individual endpoints
- ✅ Customizable Alerts: Severity filtering, polling intervals
Potential improvements:
- Authentication: Add login to dashboard (currently supports Cloudflare Access)
- Charts: Visual graphs of uptime trends
- Multi-region Tracking: Identify which region/datacenter sent heartbeat
- Service Dependencies: Track and visualize service dependencies
- Custom Status Pages: Public-facing status page generation
- Synthetic Monitoring: Active checks in addition to heartbeats
- Performance Metrics: Track response times and custom metrics
| Feature | This Solution | Traditional Monitoring | Cloud Services |
|---|---|---|---|
| Cost | Free - $5/mo | $50-500/mo | $20-200/mo |
| Setup Time | 10 minutes | Hours/Days | 30 min - 2 hours |
| Exposure | None | Inbound required | Varies |
| Maintenance | Minimal | High | Low |
| Scalability | 100-1000 services | Unlimited | Unlimited |
| Customization | Full control | Limited | Limited |
This solution is ideal for:
- ✅ Internal services that shouldn't be exposed
- ✅ Small to medium deployments (< 100 services)
- ✅ Budget-conscious teams
- ✅ Simple uptime monitoring
- ✅ Teams comfortable with Cloudflare
Not ideal for:
- ❌ Complex health checks (use dedicated monitoring)
- ❌ Sub-second monitoring requirements
- ❌ 1000+ services (consider paid alternatives)
- ❌ Teams without Cloudflare experience
Check the main README.md or QUICKSTART.md for more details.