@KCSesh KCSesh commented Dec 5, 2025

Issue Related to aws/amazon-ecs-agent#4538

Code change dependency: bottlerocket-os/bottlerocket#4715

Description of changes:
Recent ecs-agent updates introduced an incompatibility when using FIPS ECR endpoints alongside use_fips_endpoint=true, requiring us to choose one approach.

We opted to let users specify FIPS ECR endpoints directly. However, amazon-ecr-containerd-resolver doesn't support FIPS endpoints without use_fips_endpoint=true, and the library has been tech debt we've wanted to migrate away from.

This change replaces amazon-ecr-containerd-resolver with containerd's Docker resolver.
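As an illustration of what a Docker-style resolver needs from ECR, here is a minimal sketch of the credential step. It assumes credentials come from ECR's GetAuthorizationToken, whose token is a base64-encoded `AWS:<password>` pair. The helper name `decodeECRToken` is hypothetical and is not taken from this PR; the resulting username and secret are the kind of values a credentials callback such as containerd's `docker.WithAuthCreds` expects to receive.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"strings"
)

// decodeECRToken splits a base64-encoded "user:password" ECR authorization
// token into its two parts. Hypothetical helper, for illustration only.
func decodeECRToken(token string) (string, string, error) {
	raw, err := base64.StdEncoding.DecodeString(token)
	if err != nil {
		return "", "", fmt.Errorf("decode ECR token: %w", err)
	}
	user, secret, ok := strings.Cut(string(raw), ":")
	if !ok {
		return "", "", errors.New("ECR token is not in user:password form")
	}
	return user, secret, nil
}

func main() {
	// Stand-in token; a real one would come from the ECR API.
	token := base64.StdEncoding.EncodeToString([]byte("AWS:example-password"))
	user, _, err := decodeECRToken(token)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("username:", user)
}
```

Keeping the decoding separate from the resolver wiring means the same helper can back any callback that returns a username and secret per registry host.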

Testing done:

ECS Agent Conformance Testing

Ran internal ECS conformance tests across multiple variants and architectures with use_fips_endpoint=false:

| Variant | Architecture | Region |
| --- | --- | --- |
| aws-ecs-2 | x86_64 | commercial |
| aws-ecs-2-nvidia-fips | aarch64 | commercial |
| aws-ecs-3 | x86_64 | commercial |
| aws-ecs-2 | aarch64 | us-gov-west-1 |
| aws-ecs-2-nvidia-fips | x86_64 | us-gov-west-1 |

Additionally verified ECS task execution with both FIPS and non-FIPS containers to confirm expected behavior.

Host Container Image Pull Testing

GovCloud (us-gov-west-1)

| Variant | Image Endpoint | FIPS | ECR Auth Endpoint | Result |
| --- | --- | --- | --- | --- |
| aws-ecs-2 | dkr.ecr | false | api.ecr.us-gov-west-1.amazonaws.com | |
| aws-ecs-2 | dkr.ecr-fips | true | api.ecr-fips.us-gov-west-1.amazonaws.com | |
| aws-ecs-2-fips | dkr.ecr | false | api.ecr.us-gov-west-1.amazonaws.com | |
| aws-ecs-2-fips | dkr.ecr-fips | true | api.ecr-fips.us-gov-west-1.amazonaws.com | |

China Region (cn-north-1)

| Variant | Image Endpoint | FIPS | Result |
| --- | --- | --- | --- |
| aws-ecs-2 | dkr.ecr | false | |
| aws-ecs-2-fips | dkr.ecr-fips | true | ❌ Expected failure: invalid FIPS region: cn-north-1 |
| aws-ecs-2-fips | dkr.ecr | false | |

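
The relationship the GovCloud rows show between the image endpoint, the FIPS flag, and the ECR auth endpoint can be sketched as follows. This is only an illustration of the mapping in the table, not the code in this PR: the function names are hypothetical, and the sketch assumes the `amazonaws.com` suffix seen in those rows (other partitions use different suffixes).

```go
package main

import (
	"fmt"
	"strings"
)

// isFIPSImageHost reports whether an image hostname uses the "dkr.ecr-fips"
// form. Hypothetical helper mirroring the "Image Endpoint" column.
func isFIPSImageHost(host string) bool {
	return strings.Contains(host, ".dkr.ecr-fips.")
}

// ecrAPIEndpoint builds the ECR API hostname for a region.
// Assumption: only the amazonaws.com suffix is handled.
func ecrAPIEndpoint(region string, fips bool) string {
	service := "ecr"
	if fips {
		service = "ecr-fips"
	}
	return fmt.Sprintf("api.%s.%s.amazonaws.com", service, region)
}

func main() {
	hosts := []string{
		"111111111111.dkr.ecr.us-gov-west-1.amazonaws.com",
		"111111111111.dkr.ecr-fips.us-gov-west-1.amazonaws.com",
	}
	for _, h := range hosts {
		fips := isFIPSImageHost(h)
		fmt.Printf("fips=%t endpoint=%s\n", fips, ecrAPIEndpoint("us-gov-west-1", fips))
	}
}
```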
Special Region Test For New Regions (ap-southeast-7)

Verified host container image pull works in special region:

level=info msg="setting up ECR client" fips=false region=ap-southeast-7
level=info msg="pulling private ECR image" ref="<account>.dkr.ecr.ap-southeast-7.amazonaws.com/test-alpine:latest"
level=info msg="pulled image successfully"
Digest test:
bash-5.1# journalctl -u host-containers@test
Dec 24 02:24:13 ip-172-31-24-3.us-west-2.compute.internal systemd[1]: Started Host container: test.
Dec 24 02:24:14 ip-172-31-24-3.us-west-2.compute.internal host-containers@test[2236]: time="2025-12-24T02:24:14Z" level=info msg="Image does not exist, proceeding to pull image from source." ref="111111111111.dkr.ecr.us-west-2.amazonaws.com/test-alpine@sha256:7b9b6a044d921dfcaea2a843ff19d725948590352198f93cb878fd2c19d7ba3c"
Dec 24 02:24:14 ip-172-31-24-3.us-west-2.compute.internal host-containers@test[2236]: time="2025-12-24T02:24:14Z" level=info msg="setting up ECR client" fips=false region=us-west-2
Dec 24 02:24:14 ip-172-31-24-3.us-west-2.compute.internal host-containers@test[2236]: time="2025-12-24T02:24:14Z" level=info msg="pulling private ECR image" ref="111111111111.dkr.ecr.us-west-2.amazonaws.com/test-alpine@sha256:7b9b6a044d921dfcaea2a843ff19d725948590352198f93cb878fd2c19d7ba3c" region=us-west-2
Dec 24 02:24:15 ip-172-31-24-3.us-west-2.compute.internal host-containers@test[2236]: time="2025-12-24T02:24:15Z" level=info msg="pulled image successfully" img="111111111111.dkr.ecr.us-west-2.amazonaws.com/test-alpine@sha256:7b9b6a044d921dfcaea2a843ff19d725948590352198f93cb878fd2c19d7ba3c"
Dec 24 02:24:15 ip-172-31-24-3.us-west-2.compute.internal host-containers@test[2236]: time="2025-12-24T02:24:15Z" level=info msg="unpacking image..." img="111111111111.dkr.ecr.us-west-2.amazonaws.com/test-alpine@sha256:7b9b6a044d921dfcaea2a843ff19d725948590352198f93cb878fd2c19d7ba3c"
Dec 24 02:24:15 ip-172-31-24-3.us-west-2.compute.internal host-containers@test[2236]: time="2025-12-24T02:24:15Z" level=info msg="Container does not exist, proceeding to create it" ctr-id=test
Dec 24 02:24:15 ip-172-31-24-3.us-west-2.compute.internal host-containers@test[2236]: time="2025-12-24T02:24:15Z" level=info msg="container task does not exist, proceeding to create it" container-id=test
Dec 24 02:24:15 ip-172-31-24-3.us-west-2.compute.internal host-containers@test[2236]: time="2025-12-24T02:24:15Z" level=info msg="successfully started container task"
bash-5.1#
**Terms of contribution:**

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

@KCSesh KCSesh changed the title from "Use docker resolver" to "[WIP] host-ctr: Use docker resolver" on Dec 5, 2025
@KCSesh KCSesh force-pushed the use-docker-resolver branch from e9a464e to 1270ef8 on December 5, 2025 08:12
@KCSesh KCSesh force-pushed the use-docker-resolver branch 7 times, most recently from f3e0301 to 0f5fa44 on December 24, 2025 00:57
@KCSesh KCSesh changed the title from "[WIP] host-ctr: Use docker resolver" to "host-ctr: Use docker resolver" on Dec 24, 2025
@KCSesh KCSesh marked this pull request as ready for review on December 24, 2025 01:59
})
authorizer := docker.NewDockerAuthorizer(authOpt)
c.Resolver = docker.NewResolver(docker.ResolverOptions{
// TODO: Consider adding support for user-provided credentials with registryConfig as fallback,

@KCSesh KCSesh changed the title from "host-ctr: Use docker resolver" to "host-ctr: use docker resolver" on Dec 24, 2025
@KCSesh KCSesh force-pushed the use-docker-resolver branch from 0f5fa44 to 48d1c7e on January 5, 2026 23:26
//
// Capture groups: [1] = account ID, [2] = "-fips" or empty, [3] = region
//
// ECR hostname pattern also used in the ecr-credential-provider:
Contributor

We've since deviated from this in order to support the .eu domain suffix.

I think this regex predates your PR. It's a gnarly one to read. It would be great if we could rein it in or do away with it somehow. I know Python regexes have a "verbose" mode where you can add inline comments explaining parts of the regex.

I want to say I remember interacting with this one in the past, so there's a chance I tried to do battle with it and failed. Might be a dead end.

Contributor Author

@KCSesh KCSesh Jan 12, 2026

I decided to match this pattern more closely:
https://github.com/kubernetes/cloud-provider-aws/blob/d1c7c02d2da22e87175802ec94c73bd8871691bc/cmd/ecr-credential-provider/main.go#L46

It includes eu, and I also added inline comments.

Let me know what you think!
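
For readers following the regex discussion: Go's `regexp` package (RE2 syntax) has no verbose flag like Python's, so one way to get inline comments is to assemble the pattern from commented fragments. The sketch below is a simplified illustration with the capture groups described in the code comment above ([1] account ID, [2] "-fips" or empty, [3] region). It is not the pattern from this PR or from ecr-credential-provider; the suffix list and the names `ecrHostPattern` and `parseECRHost` are assumptions.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// ecrHostPattern is built from commented fragments because Go's regexp
// syntax does not support a verbose/extended mode.
var ecrHostPattern = regexp.MustCompile(strings.Join([]string{
	`^(\d{12})`,      // [1] 12-digit AWS account ID
	`\.dkr\.ecr`,     // fixed ".dkr.ecr" service label
	`(-fips)?`,       // [2] "-fips" or empty
	`\.([a-z0-9-]+)`, // [3] region, e.g. "us-west-2"
	`\.amazonaws\.(?:com(?:\.cn)?|eu)$`, // domain suffix (assumed, simplified list)
}, ""))

// ecrHost holds the parts extracted from an ECR image hostname.
type ecrHost struct {
	Account string
	FIPS    bool
	Region  string
}

// parseECRHost returns the parsed parts and whether the host matched.
func parseECRHost(host string) (ecrHost, bool) {
	m := ecrHostPattern.FindStringSubmatch(host)
	if m == nil {
		return ecrHost{}, false
	}
	return ecrHost{Account: m[1], FIPS: m[2] == "-fips", Region: m[3]}, true
}

func main() {
	for _, h := range []string{
		"111111111111.dkr.ecr.us-west-2.amazonaws.com",
		"111111111111.dkr.ecr-fips.us-gov-west-1.amazonaws.com",
		"example.com",
	} {
		parsed, ok := parseECRHost(h)
		fmt.Printf("%s -> %+v matched=%t\n", h, parsed, ok)
	}
}
```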

Comment on lines 43 to 45
// A set of the currently supported FIPS regions for ECR: https://docs.aws.amazon.com/general/latest/gr/ecr.html
// FIPS-supported ECR regions: https://docs.aws.amazon.com/general/latest/gr/ecr.html
var fipsSupportedEcrRegionSet = map[string]bool{
Contributor

nit: dupe comment and the official list from the link appears to be larger now.

not nit: Is this something we can lean on the SDK for now? It seems like in the old code we needed to understand if we were doing FIPS to avoid hitting an error condition in the resolver.

Now we only use it to raise an error - but the SDK might take care of that for us.

Contributor Author

> nit: dupe comment

Fixed.

Re: not nit:

I did dig into this before the PR and checked the SDK.

Unfortunately it won't catch this for us. The SDK has a default FIPS hostname template (ecr-fips.{region}.amazonaws.com) that it falls back to for any region, so requesting FIPS for eu-west-1 (a non-FIPS region) would just construct ecr-fips.eu-west-1.amazonaws.com without error.

The failure would only happen later as a DNS/connection error, which would be confusing for users.
Keeping the validation gives a clear "invalid FIPS region" error upfront.
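
A minimal sketch of the difference described here, under stated assumptions: `templateFIPSHost` only mimics the fallback template quoted above and is not an SDK call, and `fipsRegions` is an illustrative set with hypothetical naming, not the authoritative list from the ECR endpoints page.

```go
package main

import "fmt"

// fipsRegions is an illustrative set of regions with ECR FIPS endpoints.
// Assumption: consult the ECR endpoints documentation for the real list.
var fipsRegions = map[string]bool{
	"us-east-1":     true,
	"us-east-2":     true,
	"us-west-1":     true,
	"us-west-2":     true,
	"us-gov-east-1": true,
	"us-gov-west-1": true,
}

// templateFIPSHost mimics a fallback hostname template: it substitutes any
// region without checking whether a FIPS endpoint exists there.
func templateFIPSHost(region string) string {
	return fmt.Sprintf("ecr-fips.%s.amazonaws.com", region)
}

// validateFIPSRegion fails fast with a clear message instead of leaving the
// user with a later DNS or connection error.
func validateFIPSRegion(region string) error {
	if !fipsRegions[region] {
		return fmt.Errorf("invalid FIPS region: %s", region)
	}
	return nil
}

func main() {
	region := "eu-west-1"
	fmt.Println("template alone yields:", templateFIPSHost(region))
	if err := validateFIPSRegion(region); err != nil {
		fmt.Println("upfront validation:", err)
	}
}
```

The template on its own produces a plausible-looking hostname for any input, which is why the explicit set lookup is what surfaces the error early.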

Also ECR doesn't appear to support all FIPS endpoints, which is another variable here:
https://aws.amazon.com/compliance/fips/ - see ca-west-1

> the official list from the link appears to be larger now.

This is not quite true; I have updated the comment to be clearer. We currently support the list of FIPS endpoints that ECR supports. There are no new FIPS regions for ECR.

@KCSesh KCSesh force-pushed the use-docker-resolver branch from 48d1c7e to 7dbbae0 on January 12, 2026 17:58
@KCSesh KCSesh force-pushed the use-docker-resolver branch from 7dbbae0 to 5869966 on January 12, 2026 18:04
Replace the amazon-ecr-containerd-resolver dependency with direct
implementation using containerd's Docker resolver.

Signed-off-by: Kyle Sessions <[email protected]>
@KCSesh KCSesh force-pushed the use-docker-resolver branch from 5869966 to 08a14e4 on January 12, 2026 18:10
@cbgbt cbgbt left a comment

Nice!
