| 1 | +# AWS Distro for OpenTelemetry (ADOT) Setup |
| 2 | + |
| 3 | +This deployment runs the AWS Distro for OpenTelemetry (ADOT) Collector as a sidecar container. The collector exports distributed traces to AWS X-Ray and metrics to CloudWatch. |
| 4 | + |
| 5 | +## Architecture |
| 6 | + |
| 7 | +``` |
| 8 | +┌─────────────────────────────────────────────────┐ |
| 9 | +│                    ECS Task                     │ |
| 10 | +│                                                 │ |
| 11 | +│  ┌──────────────────┐    ┌─────────────────┐    │ |
| 12 | +│  │                  │    │                 │    │ |
| 13 | +│  │   Application    │───▶│ ADOT Collector  │    │ |
| 14 | +│  │    Container     │    │                 │    │ |
| 15 | +│  │                  │    │ localhost:4317  │    │ |
| 16 | +│  └──────────────────┘    └────────┬────────┘    │ |
| 17 | +│                                   │             │ |
| 18 | +└───────────────────────────────────┼─────────────┘ |
| 19 | +                                    │ |
| 20 | +                     ┌──────────────┴───────────────┐ |
| 21 | +                     │                              │ |
| 22 | +                     ▼                              ▼ |
| 23 | +             ┌───────────────┐             ┌─────────────────┐ |
| 24 | +             │   AWS X-Ray   │             │   CloudWatch    │ |
| 25 | +             │   (Traces)    │             │ (Metrics/Logs)  │ |
| 26 | +             └───────────────┘             └─────────────────┘ |
| 27 | +``` |
| 28 | + |
| 29 | +## Components |
| 30 | + |
| 31 | +### Application Container (source-data-proxy) |
| 32 | + |
| 33 | +The main Rust application is instrumented with OpenTelemetry and sends traces to the ADOT collector via OTLP: |
| 34 | + |
| 35 | +- **Endpoint**: `http://localhost:4317` (OTLP gRPC) |
| 36 | +- **Protocol**: OpenTelemetry Protocol (OTLP) |
| 37 | +- **Context Propagation**: W3C TraceContext headers |
| 38 | +- **Sampling**: 10% (configurable via `OTEL_TRACE_SAMPLE_RATE`) |
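
The sampling behaviour listed above (a 10% ratio combined with the `parentbased_traceidratio` sampler) can be illustrated with a simplified sketch. It is not the SDK's implementation: real samplers differ in which bits of the trace ID they inspect, and the names `ParentContext` and `shouldSample` are hypothetical.

```typescript
// Simplified sketch of a parent-based, trace-ID-ratio sampling decision.
// Assumption: the low 32 bits of the trace ID are used; SDKs may use other bits.

interface ParentContext {
  // The "sampled" flag carried by an incoming W3C traceparent header.
  sampled: boolean;
}

function shouldSample(
  traceId: string, // 32 hex characters
  ratio: number, // e.g. 0.1 for 10%
  parent?: ParentContext,
): boolean {
  // Parent-based: a decision propagated from the caller always wins,
  // so a trace is either recorded end to end or not at all.
  if (parent !== undefined) {
    return parent.sampled;
  }
  // Root span: map the low 32 bits of the trace ID onto [0, 1) and compare
  // with the ratio. The decision is deterministic for a given trace ID.
  const low32 = parseInt(traceId.slice(-8), 16);
  return low32 / 0x100000000 < ratio;
}
```

Because the root decision depends only on the trace ID, every service that sees the same ID without a parent flag would reach the same answer.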
| 39 | + |
| 40 | +### ADOT Collector Sidecar |
| 41 | + |
| 42 | +The ADOT collector runs as a sidecar container in the same ECS task: |
| 43 | + |
| 44 | +- **Image**: `public.ecr.aws/aws-observability/aws-otel-collector:latest` |
| 45 | +- **CPU**: 256 units (0.25 vCPU) |
| 46 | +- **Memory**: 512 MB |
| 47 | +- **Configuration**: Default ECS configuration (`/etc/ecs/ecs-default-config.yaml`) |
| 48 | + |
| 49 | +#### Receivers |
| 50 | + |
| 51 | +- **OTLP gRPC**: Port 4317 |
| 52 | +- **OTLP HTTP**: Port 4318 |
| 53 | + |
| 54 | +#### Exporters |
| 55 | + |
| 56 | +- **AWS X-Ray**: For distributed tracing |
| 57 | +- **CloudWatch EMF**: For metrics |
| 58 | + |
| 59 | +## Environment Variables |
| 60 | + |
| 61 | +### Application Container |
| 62 | + |
| 63 | +```bash |
| 64 | +# OpenTelemetry Configuration |
| 65 | +OTEL_SERVICE_NAME=source-data-proxy |
| 66 | +OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 |
| 67 | +OTEL_TRACE_SAMPLE_RATE=0.1 |
| 68 | +OTEL_TRACES_SAMPLER=parentbased_traceidratio |
| 69 | +RUST_LOG=info,source_data_proxy=debug |
| 70 | +TRACING_SKIP_PATHS=/ |
| 71 | +DEPLOYMENT_ENV=<stack-name> |
| 72 | + |
| 73 | +# Local development only: disables the OpenTelemetry SDK entirely. Leave unset in ECS. |
| 74 | +# OTEL_SDK_DISABLED=true |
| 75 | +``` |
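
As a rough illustration of how a service might interpret some of these variables, the sketch below reads them into a typed configuration. The application itself is written in Rust, so this is not its code. The `TelemetryConfig` shape, the `loadTelemetryConfig` name, and the fallback and clamping behaviour are assumptions, with defaults taken from the values documented above.

```typescript
// Hypothetical sketch: turning the documented environment variables
// into a typed configuration with defaults.

interface TelemetryConfig {
  serviceName: string;
  endpoint: string;
  sampleRate: number; // always within [0, 1]
  disabled: boolean;
}

type Env = Record<string, string | undefined>;

function loadTelemetryConfig(env: Env): TelemetryConfig {
  const parsed = Number.parseFloat(env["OTEL_TRACE_SAMPLE_RATE"] ?? "");
  // Assumption: fall back to 10% when unset or unparsable, and clamp
  // out-of-range values instead of rejecting them.
  const sampleRate = Number.isNaN(parsed)
    ? 0.1
    : Math.min(1, Math.max(0, parsed));

  return {
    serviceName: env["OTEL_SERVICE_NAME"] ?? "source-data-proxy",
    endpoint: env["OTEL_EXPORTER_OTLP_ENDPOINT"] ?? "http://localhost:4317",
    sampleRate,
    // OTEL_SDK_DISABLED is compared case-insensitively against "true".
    disabled: (env["OTEL_SDK_DISABLED"] ?? "").toLowerCase() === "true",
  };
}
```

Passing the environment in as a parameter (for example `loadTelemetryConfig(process.env)` in Node.js) keeps the function easy to test.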
| 76 | + |
| 77 | +### ADOT Collector |
| 78 | + |
| 79 | +```bash |
| 80 | +AWS_REGION=<region> |
| 81 | +``` |
| 82 | + |
| 83 | +## IAM Permissions |
| 84 | + |
| 85 | +The ECS task role is granted the following permissions: |
| 86 | + |
| 87 | +```json |
| 88 | +{ |
| 89 | + "Effect": "Allow", |
| 90 | + "Action": [ |
| 91 | + "xray:PutTraceSegments", |
| 92 | + "xray:PutTelemetryRecords", |
| 93 | + "cloudwatch:PutMetricData", |
| 94 | + "logs:PutLogEvents", |
| 95 | + "logs:CreateLogGroup", |
| 96 | + "logs:CreateLogStream", |
| 97 | + "logs:DescribeLogStreams", |
| 98 | + "logs:DescribeLogGroups" |
| 99 | + ], |
| 100 | + "Resource": "*" |
| 101 | +} |
| 102 | +``` |
| 103 | + |
| 104 | +## Viewing Traces |
| 105 | + |
| 106 | +### AWS X-Ray Console |
| 107 | + |
| 108 | +1. Open the AWS X-Ray console |
| 109 | +2. Select "Service Map" to see the distributed trace topology |
| 110 | +3. Select "Traces" to view individual traces |
| 111 | +4. Filter by service name: `source-data-proxy` |
| 112 | + |
| 113 | +### CloudWatch Logs |
| 114 | + |
| 115 | +Application logs are written to: |
| 116 | +- **Log Group**: `/ecs/<stack-name>-proxy` |
| 117 | +- **Format**: JSON with OpenTelemetry context (trace_id, span_id) |
| 118 | + |
| 119 | +ADOT Collector logs are written to: |
| 120 | +- **Log Group**: `/ecs/<stack-name>-adot-collector` |
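
To jump from a log line to the matching trace, the `trace_id` from the log usually has to be rewritten into X-Ray's trace ID format, `1-<8 hex>-<24 hex>`. The sketch below shows that conversion, assuming the logged `trace_id` is the 32-character hex form used by W3C TraceContext. The function name `toXRayTraceId` is hypothetical.

```typescript
// Convert a 32-hex-character OpenTelemetry trace ID into the
// "1-xxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxx" form used by X-Ray.
function toXRayTraceId(otelTraceId: string): string {
  if (!/^[0-9a-f]{32}$/i.test(otelTraceId)) {
    throw new Error(`expected 32 hex characters, got "${otelTraceId}"`);
  }
  const id = otelTraceId.toLowerCase();
  // The first 8 hex characters become the timestamp part, the remaining 24 the unique part.
  return `1-${id.slice(0, 8)}-${id.slice(8)}`;
}
```

For example, `toXRayTraceId("5759e988bd862e3fe1be46a994272793")` returns `1-5759e988-bd862e3fe1be46a994272793`, which can be pasted into the X-Ray trace search.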
| 121 | + |
| 122 | +## Resource Usage |
| 123 | + |
| 124 | +### Default Configuration |
| 125 | + |
| 126 | +- **Application Container**: 4 vCPU, 12 GB RAM |
| 127 | +- **ADOT Collector**: 0.25 vCPU, 512 MB RAM |
| 128 | +- **Total per task**: 4.25 vCPU, 12.5 GB RAM |
| 129 | + |
| 130 | +## Customization |
| 131 | + |
| 132 | +To customize the ADOT collector configuration, modify the `AdotCollector` construct: |
| 133 | + |
| 134 | +```typescript |
| 135 | +new AdotCollector(this, "adot-collector", { |
| 136 | + taskDefinition: this.service.taskDefinition, |
| 137 | + cpu: 512, // Increase CPU |
| 138 | + memoryLimitMiB: 1024, // Increase memory |
| 139 | + logRetention: logs.RetentionDays.ONE_MONTH, // Longer retention |
| 140 | +}); |
| 141 | +``` |
| 142 | + |
| 143 | +## Cost Optimization |
| 144 | + |
| 145 | +### Sampling |
| 146 | + |
| 147 | +Adjust the sampling rate to control costs: |
| 148 | + |
| 149 | +```typescript |
| 150 | +environment: { |
| 151 | + OTEL_TRACE_SAMPLE_RATE: "0.01", // 1% sampling |
| 152 | +} |
| 153 | +``` |
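
To get a feel for how the sampling rate affects trace volume, a back-of-the-envelope estimate can help. The sketch below assumes one root trace per request, a 30-day month, and a steady request rate. The function name and the request rates in the usage note are illustrative, not measurements from this service.

```typescript
// Rough estimate of how many traces are recorded per month.
// Assumptions: one trace per request, a constant request rate, a 30-day month.
function estimateSampledTracesPerMonth(
  requestsPerSecond: number,
  sampleRate: number, // value of OTEL_TRACE_SAMPLE_RATE, between 0 and 1
): number {
  const secondsPerMonth = 60 * 60 * 24 * 30;
  return Math.round(requestsPerSecond * secondsPerMonth * sampleRate);
}
```

At a hypothetical 100 requests per second, a rate of `0.1` yields about 25.9 million recorded traces per month, while `0.01` yields about 2.6 million.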
| 154 | + |
| 155 | +### Health Check Filtering |
| 156 | + |
| 157 | +Requests to the health check path (`/`) are excluded from tracing via `TRACING_SKIP_PATHS=/`, which reduces noise and cost. |
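
One way such a filter could work is sketched below. It assumes `TRACING_SKIP_PATHS` is a comma-separated list and that paths are compared by exact match. Both are assumptions about the application's behaviour, and `parseSkipPaths` and `shouldTrace` are hypothetical names.

```typescript
// Hypothetical sketch of path-based trace filtering.
// Assumption: TRACING_SKIP_PATHS is a comma-separated list of exact paths.
function parseSkipPaths(raw: string | undefined): Set<string> {
  return new Set(
    (raw ?? "")
      .split(",")
      .map((p) => p.trim())
      .filter((p) => p.length > 0),
  );
}

function shouldTrace(requestPath: string, skipPaths: Set<string>): boolean {
  // Ignore any query string before comparing.
  const queryStart = requestPath.indexOf("?");
  const pathOnly =
    queryStart === -1 ? requestPath : requestPath.slice(0, queryStart);
  // Exact match only: a prefix match on "/" would skip every request.
  return !skipPaths.has(pathOnly);
}
```

Exact matching matters here because the skipped path is `/`. With prefix matching, every route would be excluded from tracing.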
| 158 | + |
| 159 | +### ADOT Collector Resources |
| 160 | + |
| 161 | +The default allocation (0.25 vCPU, 512 MB) is suitable for moderate traffic. Monitor the collector container's CPU and memory utilization in CloudWatch and increase the allocation if it runs close to its limits. |
| 162 | + |
| 163 | +## Troubleshooting |
| 164 | + |
| 165 | +### No Traces in X-Ray |
| 166 | + |
| 167 | +1. Check ADOT collector logs: `/ecs/<stack-name>-adot-collector` |
| 168 | +2. Verify IAM permissions on the task role |
| 169 | +3. Check application logs for initialization errors |
| 170 | +4. Verify that `OTEL_EXPORTER_OTLP_ENDPOINT` is set to `http://localhost:4317` and that `OTEL_SDK_DISABLED` is not set to `true` |
| 171 | + |
| 172 | +### High CPU/Memory Usage on ADOT Collector |
| 173 | + |
| 174 | +1. Check the number of spans being sent |
| 175 | +2. Consider reducing sampling rate |
| 176 | +3. Increase ADOT collector resources |
| 177 | +4. Review batch configuration in ADOT config |
| 178 | + |
| 179 | +### Local Development |
| 180 | + |
| 181 | +For local development without ADOT: |
| 182 | + |
| 183 | +```bash |
| 184 | +# Disable OpenTelemetry entirely |
| 185 | +OTEL_SDK_DISABLED=true cargo run |
| 186 | + |
| 187 | +# OR run Jaeger locally as an OTLP receiver (detached; UI at http://localhost:16686) |
| 188 | +docker run -d -p 4317:4317 -p 16686:16686 jaegertracing/all-in-one:latest |
| 189 | +cargo run |
| 190 | +``` |
| 191 | + |
| 192 | +## References |
| 193 | + |
| 194 | +- [AWS ADOT Documentation](https://aws-otel.github.io/) |
| 195 | +- [ADOT on ECS](https://aws-otel.github.io/docs/getting-started/ecs) |
| 196 | +- [AWS X-Ray Developer Guide](https://docs.aws.amazon.com/xray/latest/devguide/) |
| 197 | +- [OpenTelemetry Rust](https://opentelemetry.io/docs/instrumentation/rust/) |