Context
Final issue in the observability initiative. Prepares the observability stack for production use by adding per-environment sampling configuration, health checks, resource limits, and complete documentation.
Scope
Sampling Strategy Documentation
| Environment | Traces | Metrics | Logs | Config |
|---|---|---|---|---|
| local | 100% | All | DEBUG+ | `OTEL_SAMPLING_RATE=1.0` |
| staging | 10% | All | INFO+ | `OTEL_SAMPLING_RATE=0.1` |
| production | 1% | All | WARNING+ | `OTEL_SAMPLING_RATE=0.01` |
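A minimal sketch of how the sampling rates in the table might be selected at startup. `APP_ENV` is a hypothetical variable name that does not appear in this issue; `OTEL_SAMPLING_RATE` is the variable named in the table, and how the application reads it is assumed rather than specified here.

```shell
#!/bin/sh
# Map an environment name to the trace sampling rate from the table.
sampling_rate_for_env() {
  case "$1" in
    local)      echo "1.0" ;;
    staging)    echo "0.1" ;;
    production) echo "0.01" ;;
    *)
      echo "unknown environment: $1" >&2
      return 1
      ;;
  esac
}

# APP_ENV is a hypothetical name; when unset, fall back to "local".
OTEL_SAMPLING_RATE="$(sampling_rate_for_env "${APP_ENV:-local}")" \
  && export OTEL_SAMPLING_RATE \
  && echo "OTEL_SAMPLING_RATE=${OTEL_SAMPLING_RATE}"
```

Rejecting unknown environment names, instead of defaulting to a rate, avoids silently sampling at 100% in production after a typo.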
Health Checks for Observability Services
- OTEL Collector: `curl http://localhost:13133/` (health extension)
- Jaeger: `curl http://localhost:14269/` (admin health)
- Prometheus: `curl http://localhost:9090/-/healthy`
- Loki: `curl http://localhost:3100/ready`
- Grafana: `curl http://localhost:3000/api/health`
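The individual checks could be combined into one script, sketched here on the assumption that every service publishes its port on localhost as listed. The service labels and the 5-second timeout are illustrative choices, not part of this issue.

```shell
#!/bin/sh
# check_endpoint NAME URL: print a status line, return non-zero on failure.
check_endpoint() {
  # -f turns HTTP errors (>= 400) into a failing exit code;
  # --max-time bounds each request (5s is an arbitrary choice).
  if curl -fsS --max-time 5 -o /dev/null "$2" 2>/dev/null; then
    echo "ok   $1"
  else
    echo "FAIL $1 ($2)"
    return 1
  fi
}

# Check every endpoint from the list; return 1 if any of them failed.
check_all() {
  failed=0
  while read -r name url; do
    [ -n "$name" ] || continue
    check_endpoint "$name" "$url" || failed=1
  done <<EOF
otel-collector http://localhost:13133/
jaeger http://localhost:14269/
prometheus http://localhost:9090/-/healthy
loki http://localhost:3100/ready
grafana http://localhost:3000/api/health
EOF
  return "$failed"
}

# Only hit the network when asked to, e.g. `sh check-observability.sh --run`.
if [ "${1:-}" = "--run" ]; then
  check_all
fi
```

Saved under a hypothetical name such as `check-observability.sh` and run with `--run`, the script exits non-zero if any endpoint is unreachable, so it can gate a deploy step.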
Resource Limits
- OTEL Collector: 512Mi memory, 0.5 CPU
- Jaeger: 1Gi memory, 0.5 CPU
- Prometheus: 1Gi memory, 0.5 CPU
- Loki: 512Mi memory, 0.25 CPU
- Grafana: 512Mi memory, 0.25 CPU
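When verifying that these limits were applied, it can help to know the raw values Docker reports. The sketch below converts the listed limits to bytes and nanoCPUs, assuming `Mi` and `Gi` are binary units (1 Mi = 1024 × 1024 bytes). The service labels are illustrative and are not actual container names.

```shell
#!/bin/sh
# Convert a memory limit such as 512Mi or 1Gi to bytes (binary units assumed).
mem_to_bytes() {
  case "$1" in
    *Gi) echo $(( ${1%Gi} * 1024 * 1024 * 1024 )) ;;
    *Mi) echo $(( ${1%Mi} * 1024 * 1024 )) ;;
    *)
      echo "unsupported unit: $1" >&2
      return 1
      ;;
  esac
}

# Convert a fractional CPU count such as 0.25 to nanoCPUs (1 CPU = 1e9).
cpus_to_nano() {
  awk -v c="$1" 'BEGIN { printf "%d\n", c * 1000000000 + 0.5 }'
}

# Print the expected raw values for each service in the list.
while read -r name mem cpus; do
  printf '%-15s %s bytes, %s nanoCPUs\n' \
    "$name" "$(mem_to_bytes "$mem")" "$(cpus_to_nano "$cpus")"
done <<EOF
otel-collector 512Mi 0.5
jaeger 1Gi 0.5
prometheus 1Gi 0.5
loki 512Mi 0.25
grafana 512Mi 0.25
EOF
```

These figures can be compared with the output of `docker inspect --format '{{.HostConfig.Memory}} {{.HostConfig.NanoCpus}}' <container>`, although Docker may record a CPU limit in other fields depending on how it was set.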
Acceptance Criteria
Dependencies
Files to Create
- compose.observability.yml: Separate compose file for the production observability stack
  - Used together with compose.yml: `docker compose -f compose.yml -f compose.observability.yml up`
Files to Modify
- docs/observability.md: Complete the documentation
- compose.override.yml: Add health checks to all observability services
- compose.override.yml: Add resource limits (memory, CPU) to prevent runaway usage
- README.md: Add observability section with links to docs