
CRT S3 client hangs with concurrent AsyncRequestBody.fromPublisher() uploads in containerized environments #6724

@lthoulon-locala

Description


Describe the bug

We're experiencing indefinite hangs when initiating multiple concurrent S3 uploads using AsyncRequestBody.fromPublisher() with Project Reactor publishers, in Kubernetes environments that enforce CPU limits. The application works fine locally but deadlocks in production.

We believe this may be related to the CRT event loop thread count being derived from availableProcessors(), which in containerized environments is often limited to 1-2. We wanted to report this in case it affects others or if there's a better pattern we should be using.

Regression Issue

  • Select this option if this issue appears to be a regression.

Expected Behavior

Both concurrent S3 uploads should complete successfully regardless of the CPU count or containerization.

Current Behavior

In environments with low availableProcessors() (specifically 1):

  • Application hangs indefinitely with no progress
  • Only one upload's publisher gets subscribed to by the CRT
  • The second upload never begins processing
  • No exceptions or timeouts occur — completely silent hang

The same code works correctly in environments reporting 3 or more available processors.

Reproduction Steps

CrtAutoConnectDeadlockReproduction.kt.zip

A minimal reproduction test is attached (CrtAutoConnectDeadlockReproduction.kt).

The pattern: Two concurrent S3 uploads using AsyncRequestBody.fromPublisher() where both publishers share a common source via publish().autoConnect(2).

Run the test with -XX:ActiveProcessorCount=1:

  • Result: Throws TimeoutException after 5 seconds (deadlock)

Uncommenting the EventLoopGroup.setStaticDefaultNumThreads(4) line in the test makes it pass.
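For reference, here is a condensed sketch of the pattern (not the attached test verbatim; the bucket and keys are placeholders and error handling is omitted):

```kotlin
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets
import java.util.concurrent.CompletableFuture
import reactor.core.publisher.Flux
import software.amazon.awssdk.core.async.AsyncRequestBody
import software.amazon.awssdk.services.s3.S3AsyncClient
import software.amazon.awssdk.services.s3.model.PutObjectRequest

fun main() {
    val payload = "hello".toByteArray(StandardCharsets.UTF_8)

    // One shared source; autoConnect(2) only starts emitting once BOTH
    // downstream subscribers (i.e. both uploads) have subscribed.
    val shared: Flux<ByteBuffer> = Flux.just(ByteBuffer.wrap(payload))
        .publish()
        .autoConnect(2)

    S3AsyncClient.crtBuilder().build().use { s3 ->
        val uploads = listOf("key-1", "key-2").map { key ->
            val request = PutObjectRequest.builder()
                .bucket("example-bucket") // placeholder
                .key(key)                 // placeholder
                .contentLength(payload.size.toLong())
                .build()
            // Each upload subscribes to the shared source; duplicate() gives
            // each subscriber an independent buffer position.
            s3.putObject(request, AsyncRequestBody.fromPublisher(shared.map { it.duplicate() }))
        }
        // With -XX:ActiveProcessorCount=1 the CRT only subscribes to one of the
        // two request bodies, autoConnect(2) never connects, and this join hangs.
        CompletableFuture.allOf(*uploads.toTypedArray()).join()
    }
}
```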

Possible Solution

Calling EventLoopGroup.setStaticDefaultNumThreads(4) before creating the S3 client works around the issue (see commented line in the reproduction test).
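In code, the workaround looks roughly like this; it presumably needs to run before the first CRT-based client is constructed, while the static default event loop group has not yet been created:

```kotlin
import software.amazon.awssdk.crt.io.EventLoopGroup
import software.amazon.awssdk.services.s3.S3AsyncClient

fun buildS3Client(): S3AsyncClient {
    // Workaround: force the CRT's static default event loop group to 4 threads
    // regardless of what availableProcessors() reports. Must run before the
    // default group is created, i.e. before the first CRT client is built.
    EventLoopGroup.setStaticDefaultNumThreads(4)
    return S3AsyncClient.crtBuilder().build()
}
```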

We're not sure if this is the recommended approach or if there's a better pattern for concurrent reactive uploads with the CRT client in low-CPU environments.

Additional Information/Context

Related issues we found:

  • #656 — Similar symptom but with blocking I/O
  • #4305 — Buffer blocking (resolved), different from our case
  • #836 — Event loop thread discussion

In Kubernetes, CPU limits are commonly used for resource management. A pod with cpu: 2000m reports availableProcessors() = 2, leading to very few event loop threads.
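For anyone hitting this, a quick way to confirm what the JVM reports inside the pod is to log the processor count at startup, e.g.:

```kotlin
// Under a cpu: 2000m limit (or -XX:ActiveProcessorCount=1 locally) this prints
// the small value that we believe the CRT's default event loop sizing is
// derived from.
fun logCpuCount() {
    println("availableProcessors = ${Runtime.getRuntime().availableProcessors()}")
}
```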

We'd appreciate any guidance on the recommended pattern for this use case.

AWS Java SDK version used

2.41.23 + aws-crt 0.43.1

JDK version used

21

Operating System and version

Kubernetes pods with CPU limits (reproduces locally on macOS Tahoe 26.2 with -XX:ActiveProcessorCount=1)


Labels: bug, needs-triage
