Commit b095d62

IN PROGRESS
1 parent 453b2a8 commit b095d62

16 files changed, +1712 -111 lines changed

Cargo.lock

Lines changed: 648 additions & 96 deletions

Cargo.toml

Lines changed: 15 additions & 1 deletion
@@ -39,7 +39,7 @@ azure_storage = "0.20.0"
 azure_core = "0.20.0"
 time = { version = "0.3", features = ["formatting"] }
 url = "2.2.2"
-reqwest = { version = "0.11.0", features = ["stream", "json"] }
+reqwest = { version = "0.12", features = ["stream", "json"] }
 actix-cors = "0.7.0"
 moka = { version = "0.12.8", features = ["future"] }
 percent-encoding = "2.1.0"
@@ -50,5 +50,19 @@ actix-http = "^3"
 thiserror = "2.0.12"
 serde_json = "1.0.141"
 
+# OpenTelemetry dependencies
+opentelemetry = "0.24"
+opentelemetry_sdk = { version = "0.24", features = ["rt-tokio"] }
+opentelemetry-otlp = { version = "0.17", features = ["grpc-tonic", "trace"] }
+opentelemetry-semantic-conventions = "0.16"
+tracing = "0.1"
+tracing-opentelemetry = "0.25"
+tracing-subscriber = { version = "0.3", features = ["env-filter", "json"] }
+tracing-actix-web = "0.7"
+opentelemetry-http = "0.13"
+reqwest-middleware = "0.4"
+reqwest-tracing = { version = "0.5", features = ["opentelemetry_0_24"] }
+
 [dev-dependencies]
 common-s3-headers = "1.0.0"
+temp-env = "0.3"

deploy/ADOT_SETUP.md

Lines changed: 197 additions & 0 deletions
@@ -0,0 +1,197 @@
+# AWS Distro for OpenTelemetry (ADOT) Setup
+
+This deployment includes AWS Distro for OpenTelemetry (ADOT) Collector as a sidecar container for distributed tracing with AWS X-Ray and metrics collection with CloudWatch.
+
+## Architecture
+
+```
+┌─────────────────────────────────────────────────┐
+│              ECS Task                           │
+│                                                 │
+│  ┌──────────────────┐    ┌─────────────────┐    │
+│  │                  │    │                 │    │
+│  │  Application     │───▶│  ADOT Collector │    │
+│  │  Container       │    │                 │    │
+│  │                  │    │  localhost:4317 │    │
+│  └──────────────────┘    └────────┬────────┘    │
+│                                   │             │
+└───────────────────────────────────┼─────────────┘
+
+                     ┌──────────────┴───────────────┐
+                     │                              │
+                     ▼                              ▼
+             ┌───────────────┐             ┌─────────────────┐
+             │   AWS X-Ray   │             │   CloudWatch    │
+             │   (Traces)    │             │  (Metrics/Logs) │
+             └───────────────┘             └─────────────────┘
+```
+
+## Components
+
+### Application Container (source-data-proxy)
+
+The main Rust application is instrumented with OpenTelemetry and sends traces to the ADOT collector via OTLP:
+
+- **Endpoint**: `http://localhost:4317` (OTLP gRPC)
+- **Protocol**: OpenTelemetry Protocol (OTLP)
+- **Context Propagation**: W3C TraceContext headers
+- **Sampling**: 10% (configurable via `OTEL_TRACE_SAMPLE_RATE`)
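
To make the "W3C TraceContext headers" bullet concrete, here is a minimal sketch of what a version-00 `traceparent` header carries. It is illustrative only: the `TraceParent` struct and `parse_traceparent` function are hypothetical names, and the application presumably relies on the OpenTelemetry SDK's propagator rather than hand-written parsing like this.

```rust
/// Fields carried by a W3C `traceparent` header (version 00 only).
/// Hypothetical type for illustration; not taken from the commit.
#[derive(Debug, PartialEq, Eq)]
pub struct TraceParent {
    pub trace_id: String,  // 32 lowercase hex characters
    pub parent_id: String, // 16 lowercase hex characters
    pub sampled: bool,     // lowest bit of the trace-flags byte
}

/// True if `s` is exactly `len` lowercase hexadecimal characters.
fn is_lower_hex(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

/// Parses `version-traceid-parentid-flags`; returns None for anything
/// that is not a well-formed version-00 header.
pub fn parse_traceparent(header: &str) -> Option<TraceParent> {
    let parts: Vec<&str> = header.trim().split('-').collect();
    if parts.len() != 4 || parts[0] != "00" {
        return None;
    }
    let (trace_id, parent_id, flags) = (parts[1], parts[2], parts[3]);
    if !is_lower_hex(trace_id, 32) || !is_lower_hex(parent_id, 16) || !is_lower_hex(flags, 2) {
        return None;
    }
    // All-zero identifiers are invalid per the specification.
    if trace_id.bytes().all(|b| b == b'0') || parent_id.bytes().all(|b| b == b'0') {
        return None;
    }
    let flag_bits = u8::from_str_radix(flags, 16).ok()?;
    Some(TraceParent {
        trace_id: trace_id.to_string(),
        parent_id: parent_id.to_string(),
        sampled: flag_bits & 0x01 == 0x01,
    })
}
```

With a parent-based sampler, the `sampled` flag on an incoming header decides whether the request is traced; the 10% ratio applies only to requests that arrive without a parent context.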
+
+### ADOT Collector Sidecar
+
+The ADOT collector runs as a sidecar container in the same ECS task:
+
+- **Image**: `public.ecr.aws/aws-observability/aws-otel-collector:latest`
+- **CPU**: 256 units (0.25 vCPU)
+- **Memory**: 512 MB
+- **Configuration**: Default ECS configuration (`/etc/ecs/ecs-default-config.yaml`)
+
+#### Receivers
+
+- **OTLP gRPC**: Port 4317
+- **OTLP HTTP**: Port 4318
+
+#### Exporters
+
+- **AWS X-Ray**: For distributed tracing
+- **CloudWatch EMF**: For metrics
+
+## Environment Variables
+
+### Application Container
+
+```bash
+# OpenTelemetry Configuration
+OTEL_SERVICE_NAME=source-data-proxy
+OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
+OTEL_TRACE_SAMPLE_RATE=0.1
+OTEL_TRACES_SAMPLER=parentbased_traceidratio
+RUST_LOG=info,source_data_proxy=debug
+TRACING_SKIP_PATHS=/
+DEPLOYMENT_ENV=<stack-name>
+
+# Disable telemetry for local development (optional)
+OTEL_SDK_DISABLED=true
+```
+
+### ADOT Collector
+
+```bash
+AWS_REGION=<region>
+```
+
+## IAM Permissions
+
+The ECS task role is granted the following permissions:
+
+```json
+{
+  "Effect": "Allow",
+  "Action": [
+    "xray:PutTraceSegments",
+    "xray:PutTelemetryRecords",
+    "cloudwatch:PutMetricData",
+    "logs:PutLogEvents",
+    "logs:CreateLogGroup",
+    "logs:CreateLogStream",
+    "logs:DescribeLogStreams",
+    "logs:DescribeLogGroups"
+  ],
+  "Resource": "*"
+}
+```
+
+## Viewing Traces
+
+### AWS X-Ray Console
+
+1. Navigate to AWS X-Ray Console
+2. Select "Service Map" to see the distributed trace topology
+3. Select "Traces" to view individual traces
+4. Filter by service name: `source-data-proxy`
+
+### CloudWatch Logs
+
+Application logs are written to:
+- **Log Group**: `/ecs/<stack-name>-proxy`
+- **Format**: JSON with OpenTelemetry context (trace_id, span_id)
+
+ADOT Collector logs are written to:
+- **Log Group**: `/ecs/<stack-name>-adot-collector`
+
+## Resource Usage
+
+### Default Configuration
+
+- **Application Container**: 4 vCPU, 12 GB RAM
+- **ADOT Collector**: 0.25 vCPU, 512 MB RAM
+- **Total per task**: 4.25 vCPU, 12.5 GB RAM
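
The totals above can be checked with a short calculation. This sketch only sums the per-container figures quoted in this diff (1024 CPU units per vCPU, memory in MiB); the `ContainerResources` type and `task_totals` function are hypothetical. How Fargate sizes and bills the task depends on the task-level CPU and memory settings, which this document does not show.

```rust
/// ECS expresses CPU in units, with 1024 units per vCPU.
const CPU_UNITS_PER_VCPU: f64 = 1024.0;
const MIB_PER_GIB: f64 = 1024.0;

/// Per-container reservation (hypothetical type for illustration).
pub struct ContainerResources {
    pub cpu_units: u32,
    pub memory_mib: u32,
}

/// Sums container reservations and returns (vCPU, GiB).
pub fn task_totals(containers: &[ContainerResources]) -> (f64, f64) {
    let cpu_units: u32 = containers.iter().map(|c| c.cpu_units).sum();
    let memory_mib: u32 = containers.iter().map(|c| c.memory_mib).sum();
    (
        f64::from(cpu_units) / CPU_UNITS_PER_VCPU,
        f64::from(memory_mib) / MIB_PER_GIB,
    )
}

/// The two containers described in this document.
pub fn documented_containers() -> Vec<ContainerResources> {
    vec![
        // Application container: 4 vCPU, 12 GB.
        ContainerResources { cpu_units: 4096, memory_mib: 12288 },
        // ADOT collector sidecar: 0.25 vCPU, 512 MB.
        ContainerResources { cpu_units: 256, memory_mib: 512 },
    ]
}
```

Calling `task_totals(&documented_containers())` gives 4.25 vCPU and 12.5 GiB, matching the "Total per task" line.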
+
+## Customization
+
+To customize the ADOT collector configuration, modify the `AdotCollector` construct:
+
+```typescript
+new AdotCollector(this, "adot-collector", {
+  taskDefinition: this.service.taskDefinition,
+  cpu: 512, // Increase CPU
+  memoryLimitMiB: 1024, // Increase memory
+  logRetention: logs.RetentionDays.ONE_MONTH, // Longer retention
+});
+```
+
+## Cost Optimization
+
+### Sampling
+
+Adjust the sampling rate to control costs:
+
+```typescript
+environment: {
+  OTEL_TRACE_SAMPLE_RATE: "0.01", // 1% sampling
+}
+```
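
On the application side, a value such as `"0.01"` has to be turned into a ratio before it reaches the sampler. The commit's actual parsing code is not shown in this diff, so the following is only a sketch under assumptions: the function names are hypothetical, the fallback of 0.1 mirrors the documented 10% default, and out-of-range values are clamped rather than rejected. `OTEL_TRACE_SAMPLE_RATE` appears to be specific to this application; the standard OpenTelemetry variable for a sampler argument is `OTEL_TRACES_SAMPLER_ARG`.

```rust
/// Fallback ratio, matching the 10% default described in this document.
const DEFAULT_SAMPLE_RATE: f64 = 0.1;

/// Converts the raw variable value into a ratio in [0.0, 1.0].
/// Missing, unparsable, or non-finite values fall back to the default
/// (an assumption; the real application may handle these differently).
pub fn parse_sample_rate(raw: Option<&str>) -> f64 {
    match raw.map(str::trim).and_then(|s| s.parse::<f64>().ok()) {
        Some(rate) if rate.is_finite() => rate.clamp(0.0, 1.0),
        _ => DEFAULT_SAMPLE_RATE,
    }
}

/// Reads `OTEL_TRACE_SAMPLE_RATE` from the process environment.
pub fn sample_rate_from_env() -> f64 {
    parse_sample_rate(std::env::var("OTEL_TRACE_SAMPLE_RATE").ok().as_deref())
}
```

Keeping the parsing in a pure function makes it testable without touching process-wide environment state.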
+
+### Health Check Filtering
+
+Health checks are automatically filtered from tracing via `TRACING_SKIP_PATHS=/` to reduce noise and cost.
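
One plausible reading of `TRACING_SKIP_PATHS` is sketched below. The value format and matching rule are assumptions, not taken from the commit: the sketch treats the value as a comma-separated list and matches request paths exactly, since a prefix match on `/` would exclude every route from tracing. Both function names are hypothetical.

```rust
/// Splits a comma-separated list of paths, dropping empty entries.
/// The comma-separated format is an assumption.
pub fn parse_skip_paths(raw: &str) -> Vec<String> {
    raw.split(',')
        .map(str::trim)
        .filter(|p| !p.is_empty())
        .map(str::to_string)
        .collect()
}

/// Returns true when the request path exactly equals a configured skip path.
pub fn should_skip_tracing(path: &str, skip_paths: &[String]) -> bool {
    skip_paths.iter().any(|p| p.as_str() == path)
}
```

With `TRACING_SKIP_PATHS=/`, a load balancer health check against `/` would be skipped, while a request to any other path would still be traced.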
+
+### ADOT Collector Resources
+
+The default allocation (0.25 vCPU, 512 MB) is suitable for moderate traffic. Monitor CloudWatch metrics and adjust if needed.
+
+## Troubleshooting
+
+### No Traces in X-Ray
+
+1. Check ADOT collector logs: `/ecs/<stack-name>-adot-collector`
+2. Verify IAM permissions on the task role
+3. Check application logs for initialization errors
+4. Verify OTLP endpoint is set to `http://localhost:4317`
+
+### High CPU/Memory Usage on ADOT Collector
+
+1. Check the number of spans being sent
+2. Consider reducing sampling rate
+3. Increase ADOT collector resources
+4. Review batch configuration in ADOT config
+
+### Local Development
+
+For local development without ADOT:
+
+```bash
+# Disable OpenTelemetry entirely
+OTEL_SDK_DISABLED=true cargo run
+
+# OR run a local OTLP collector
+docker run -p 4317:4317 -p 16686:16686 jaegertracing/all-in-one:latest
+cargo run
+```
+
+## References
+
+- [AWS ADOT Documentation](https://aws-otel.github.io/)
+- [ADOT on ECS](https://aws-otel.github.io/docs/getting-started/ecs)
+- [AWS X-Ray Developer Guide](https://docs.aws.amazon.com/xray/latest/devguide/)
+- [OpenTelemetry Rust](https://opentelemetry.io/docs/instrumentation/rust/)

deploy/ECS_SETUP.md

Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
+# ECS Fargate Setup with Application Load Balancer
+
+This document describes the CDK infrastructure for the Source Data Proxy ECS Fargate service.
+
+## Architecture
+
+The setup includes:
+
+1. **ECS Cluster**: Hosts the Fargate services
+2. **Application Load Balancer**: Routes traffic to the ECS service
+3. **ECS Fargate Service**: Runs the source-data-proxy container
+4. **Target Group**: Health checks and load balancing configuration
+5. **Security Groups**: Network access control
+
+## Components
+
+### EcsCluster (`ecs-cluster.ts`)
+- Creates an ECS cluster with container insights and service discovery
+- Enables ECS Exec for debugging
+- Configures CloudWatch logging
+
+### SourceDataProxy (`source-data-proxy.ts`)
+- Uses the `ApplicationLoadBalancedFargateService` pattern for simplified setup
+- Automatically creates ALB, target group, and service configuration
+- Defines ECS task definition with Fargate configuration
+- Creates the ECS service with auto-scaling capabilities
+- Builds Docker image from source code using `ContainerImage.fromAsset()`
+
+## Configuration
+
+### Environment Variables
+- `TASK_ROLE_ARN`: IAM role for the ECS task
+- `EXECUTION_ROLE_ARN`: IAM role for ECS task execution
+
+### Task Definition
+- **CPU**: 4 vCPU (4096 CPU units)
+- **Memory**: 12 GB (12288 MB)
+- **Architecture**: x86_64 Linux
+- **Network Mode**: awsvpc
+- **Port**: 8080 (HTTP)
+
+### Load Balancer
+- **Type**: Application Load Balancer (automatically configured)
+- **Protocol**: HTTP (port 80)
+- **Health Check**: Default ALB health checks
+- **Target Type**: IP (for Fargate)
+
+## Deployment
+
+The service is deployed through CDK with the following workflow:
+
+1. **CDK Deploy**: Infrastructure is deployed using CDK, which automatically builds the Docker image from source code
+2. **Service Update**: ECS service is updated with new task definition
+
+## Security
+
+- ECS service runs in private subnets
+- ALB is internet-facing but only forwards to private ECS tasks
+- Security groups restrict access appropriately
+- IAM roles provide least-privilege access
+
+## Monitoring
+
+- CloudWatch logs for container logs
+- Container insights for cluster monitoring
+- ALB access logs for traffic analysis
+- Health checks for service availability
+
+## Scaling
+
+- Service starts with 2 desired tasks for high availability
+- Circuit breaker enabled for automatic rollback on failures
+- Service discovery enabled for internal communication
+
+## Integration
+
+The service integrates with:
+- Vercel API proxy for external API access
+- Source API for data retrieval
+- CloudWatch for logging and monitoring

deploy/cdk.context.json

Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
+{
+  "vpc-provider:account=417712557820:filter.vpc-id=vpc-05858c6e5697bbc40:region=us-east-1:returnAsymmetricSubnets=true": {
+    "vpcId": "vpc-05858c6e5697bbc40",
+    "vpcCidrBlock": "172.31.0.0/16",
+    "ownerAccountId": "417712557820",
+    "availabilityZones": [],
+    "subnetGroups": [
+      {
+        "name": "Public",
+        "type": "Public",
+        "subnets": [
+          {
+            "subnetId": "subnet-0b58bb77fafa39150",
+            "cidr": "172.31.0.0/20",
+            "availabilityZone": "us-east-1a",
+            "routeTableId": "rtb-027f8b153e98defc0"
+          },
+          {
+            "subnetId": "subnet-05005bb04f849cc10",
+            "cidr": "172.31.80.0/20",
+            "availabilityZone": "us-east-1b",
+            "routeTableId": "rtb-027f8b153e98defc0"
+          },
+          {
+            "subnetId": "subnet-0946107edbccd3597",
+            "cidr": "172.31.16.0/20",
+            "availabilityZone": "us-east-1c",
+            "routeTableId": "rtb-027f8b153e98defc0"
+          },
+          {
+            "subnetId": "subnet-032270dfa5a352ff0",
+            "cidr": "172.31.32.0/20",
+            "availabilityZone": "us-east-1d",
+            "routeTableId": "rtb-027f8b153e98defc0"
+          },
+          {
+            "subnetId": "subnet-01f7a2705fd7610df",
+            "cidr": "172.31.48.0/20",
+            "availabilityZone": "us-east-1e",
+            "routeTableId": "rtb-027f8b153e98defc0"
+          },
+          {
+            "subnetId": "subnet-027a32b8b687b9419",
+            "cidr": "172.31.64.0/20",
+            "availabilityZone": "us-east-1f",
+            "routeTableId": "rtb-027f8b153e98defc0"
+          }
+        ]
+      }
+    ]
+  },
+  "availability-zones:account=417712557820:region=us-east-1": [
+    "us-east-1a",
+    "us-east-1b",
+    "us-east-1c",
+    "us-east-1d",
+    "us-east-1e",
+    "us-east-1f"
+  ]
+}
