Production configuration and documentation #10

@ncolesummers

Context

This is the final issue in the observability initiative. It prepares the observability stack for production use by adding production-grade configuration, resource limits, and comprehensive documentation.

Scope

Files to Create

  • compose.observability.yml — Separate compose file for production observability stack
    • Can be composed with compose.yml: docker compose -f compose.yml -f compose.observability.yml up
    • Production-grade settings: persistent storage, resource limits, health checks
    • Configurable external endpoints (for managed Jaeger/Grafana Cloud/etc.)
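
The layering described above can be sketched as a small helper that builds the `docker compose` argument list from an ordered set of files. The function names `compose_command` and `start_stack` are hypothetical and only illustrate the idea; the file names are the ones listed in this issue.

```python
import subprocess
from typing import List, Sequence


def compose_command(files: Sequence[str], *args: str) -> List[str]:
    """Build a `docker compose` argv that layers the given files in order."""
    command = ["docker", "compose"]
    for path in files:
        # Each file gets its own -f flag; later files extend or override earlier ones.
        command += ["-f", path]
    return command + list(args)


def start_stack(files: Sequence[str]) -> None:
    """Start the combined stack in the background (hypothetical helper)."""
    subprocess.run(compose_command(files, "up", "-d"), check=True)
```

Calling `compose_command(["compose.yml", "compose.observability.yml"], "up")` yields the same invocation as the command shown in the list above.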

Files to Modify

  • docs/observability.md — Complete the documentation with:
    • Architecture overview with diagrams
    • Local development quickstart
    • How to access each tool (URLs, ports, credentials)
    • How to add custom spans (backend Python examples)
    • How to add custom metrics (Prometheus client examples)
    • Sampling configuration reference per environment
    • Troubleshooting guide (no traces, no metrics, no logs)
    • Production deployment considerations
  • compose.override.yml:
    • Add health checks to all observability services
    • Add resource limits (memory, CPU) to prevent runaway usage
  • README.md — Add observability section with links to docs

Sampling Strategy Documentation

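| Environment | Traces (sampled) | Metrics | Logs (minimum level) | Config                  |
|-------------|------------------|---------|----------------------|-------------------------|
| local       | 100%             | All     | DEBUG+               | OTEL_SAMPLING_RATE=1.0  |
| staging     | 10%              | All     | INFO+                | OTEL_SAMPLING_RATE=0.1  |
| production  | 1%               | All     | WARNING+             | OTEL_SAMPLING_RATE=0.01 |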
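
As an illustration of how the per-environment rates could be consumed, here is a minimal sketch that reads `OTEL_SAMPLING_RATE` and makes a deterministic ratio-based decision per trace. The variable name comes from this issue; `sampling_rate`, `should_sample`, and `DEFAULT_RATES` are hypothetical names, and the decision rule only mirrors the general idea of trace-ID ratio sampling rather than reproducing any SDK's exact implementation.

```python
import os
from typing import Mapping

# Fallback rates taken from the sampling table in this issue.
DEFAULT_RATES = {"local": 1.0, "staging": 0.1, "production": 0.01}


def sampling_rate(environment: str, env: Mapping[str, str] = os.environ) -> float:
    """Return the trace sampling ratio, preferring OTEL_SAMPLING_RATE if set."""
    raw = env.get("OTEL_SAMPLING_RATE")
    if raw is None:
        return DEFAULT_RATES[environment]
    rate = float(raw)
    if not 0.0 <= rate <= 1.0:
        raise ValueError(f"OTEL_SAMPLING_RATE must be in [0, 1], got {raw!r}")
    return rate


def should_sample(trace_id: int, rate: float) -> bool:
    """Keep a trace when the low 64 bits of its ID fall below rate * 2**64."""
    bound = round(rate * (1 << 64))
    return (trace_id & ((1 << 64) - 1)) < bound
```

Because the decision depends only on the trace ID, every service that sees the same trace reaches the same keep-or-drop result for a given rate.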

Health Checks for Observability Services

  • OTEL Collector: curl http://localhost:13133/ (health extension)
  • Jaeger: curl http://localhost:14269/ (admin health)
  • Prometheus: curl http://localhost:9090/-/healthy
  • Loki: curl http://localhost:3100/ready
  • Grafana: curl http://localhost:3000/api/health
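
For the troubleshooting guide, the same endpoints could be polled from a script instead of running each `curl` by hand. The sketch below uses only the standard library; the URLs are the ones listed above, while the service keys and the function names `http_status` and `check_all` are hypothetical. It treats only HTTP 200 as healthy, which is an assumption about how each endpoint reports readiness.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Mapping, Optional

# Endpoints from the health check list in this issue; keys are illustrative names.
HEALTH_ENDPOINTS = {
    "otel-collector": "http://localhost:13133/",
    "jaeger": "http://localhost:14269/",
    "prometheus": "http://localhost:9090/-/healthy",
    "loki": "http://localhost:3100/ready",
    "grafana": "http://localhost:3000/api/health",
}


def http_status(url: str, timeout: float = 2.0) -> Optional[int]:
    """Return the HTTP status for url, or None if the service is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 503.
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout.
        return None


def check_all(
    endpoints: Mapping[str, str] = HEALTH_ENDPOINTS,
    fetch: Callable[[str], Optional[int]] = http_status,
) -> Dict[str, bool]:
    """Map each service name to True when its health endpoint returns 200."""
    return {name: fetch(url) == 200 for name, url in endpoints.items()}
```

Running `check_all()` against a local stack returns one boolean per service, which makes it easy to see which component is down when traces, metrics, or logs are missing.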

Resource Limits

  • OTEL Collector: 512Mi memory, 0.5 CPU
  • Jaeger: 1Gi memory, 0.5 CPU
  • Prometheus: 1Gi memory, 0.5 CPU
  • Loki: 512Mi memory, 0.25 CPU
  • Grafana: 512Mi memory, 0.25 CPU
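
The `Mi` and `Gi` suffixes above are Kubernetes-style quantities, whereas Compose byte values use suffixes such as `m` and `g`. The following sketch shows one way the limits could be translated into the `deploy.resources.limits` shape. It assumes Compose interprets `m` and `g` as binary multiples, so `512Mi` maps to `512m`; the helper names and the service keys are hypothetical.

```python
from typing import Dict, Tuple

# Limits from this issue: service -> (memory quantity, CPU share).
LIMITS: Dict[str, Tuple[str, float]] = {
    "otel-collector": ("512Mi", 0.5),
    "jaeger": ("1Gi", 0.5),
    "prometheus": ("1Gi", 0.5),
    "loki": ("512Mi", 0.25),
    "grafana": ("512Mi", 0.25),
}

# Kubernetes-style binary suffix -> Compose byte-value suffix (assumed equivalent).
_SUFFIXES = {"Ki": "k", "Mi": "m", "Gi": "g"}


def to_compose_memory(quantity: str) -> str:
    """Convert a quantity such as '512Mi' to a Compose byte value such as '512m'."""
    for suffix, compose_suffix in _SUFFIXES.items():
        if quantity.endswith(suffix):
            amount = int(quantity[: -len(suffix)])
            return f"{amount}{compose_suffix}"
    raise ValueError(f"unsupported memory quantity: {quantity!r}")


def compose_limits(memory: str, cpus: float) -> dict:
    """Build the nested mapping for one service's resource limits."""
    return {
        "deploy": {
            "resources": {
                "limits": {"memory": to_compose_memory(memory), "cpus": str(cpus)}
            }
        }
    }
```

For example, `compose_limits(*LIMITS["loki"])` produces a mapping with `memory` set to `512m` and `cpus` set to `0.25`, ready to be serialized into a service definition.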

Acceptance Criteria

  • Documentation is clear, complete, and tested against the actual running stack
  • Production compose file works when combined with base compose.yml
  • All observability services have health checks in Docker Compose
  • Resource limits cap each observability service's memory and CPU at the values listed under Resource Limits
  • README updated with observability section
  • A new developer can follow the docs to understand the full observability setup

Dependencies

Labels: documentation (Improvements or additions to documentation), observability (Observability, tracing, metrics, logging)
