Description
Describe the bug
We're experiencing indefinite hangs when initiating multiple concurrent S3 uploads using `AsyncRequestBody.fromPublisher()` with Project Reactor in Kubernetes environments with CPU limits. The application works fine locally but deadlocks in production.
We believe this may be related to the CRT event loop thread count being derived from `availableProcessors()`, which in containerized environments is often limited to 1-2. We wanted to report this in case it affects others, or in case there is a better pattern we should be using.
Regression Issue
- [ ] Select this option if this issue appears to be a regression.
Expected Behavior
Both concurrent S3 uploads should complete successfully regardless of the CPU count or containerization.
Current Behavior
In environments with a low `availableProcessors()` value (specifically 1):
- Application hangs indefinitely with no progress
- Only one upload's publisher gets subscribed to by the CRT
- The second upload never begins processing
- No exceptions or timeouts occur — completely silent hang
Works correctly in environments with 3 or more available processors.
Reproduction Steps
CrtAutoConnectDeadlockReproduction.kt.zip
A minimal reproduction test is attached (CrtAutoConnectDeadlockReproduction.kt).
The pattern: two concurrent S3 uploads using `AsyncRequestBody.fromPublisher()`, where both publishers share a common source via `publish().autoConnect(2)`.
Run the test with `-XX:ActiveProcessorCount=1`:
- Result: throws `TimeoutException` after 5 seconds (deadlock)

Uncommenting the `EventLoopGroup.setStaticDefaultNumThreads(4)` line in the test makes it pass.
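The failure mode can be illustrated without the SDK at all. The sketch below is our analogy, not the CRT's actual code: it models the event loop as a fixed-size thread pool and `autoConnect(2)` as a latch that each simulated upload both signals and waits on. With a single thread the second task never gets scheduled, so the first waits forever, mirroring the silent hang we observe.

```kotlin
import java.util.concurrent.CountDownLatch
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Returns true if both simulated uploads complete on a pool of the given size.
fun bothComplete(eventLoopThreads: Int): Boolean {
    val loop = Executors.newFixedThreadPool(eventLoopThreads)
    val bothSubscribed = CountDownLatch(2) // stand-in for autoConnect(2)
    val upload = Runnable {
        bothSubscribed.countDown() // "subscribe"
        // Wait for the other subscriber; with one thread the second task
        // never runs, so this times out -- the analogue of the hang.
        check(bothSubscribed.await(500, TimeUnit.MILLISECONDS)) { "deadlock" }
    }
    val a = loop.submit(upload)
    val b = loop.submit(upload)
    return try {
        a.get(2, TimeUnit.SECONDS)
        b.get(2, TimeUnit.SECONDS)
        true
    } catch (e: Exception) {
        false
    } finally {
        loop.shutdownNow()
    }
}

fun main() {
    println("1 thread:  ${bothComplete(1)}")  // false: second task starves
    println("2 threads: ${bothComplete(2)}")  // true: both complete
}
```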
Possible Solution
Calling `EventLoopGroup.setStaticDefaultNumThreads(4)` before creating the S3 client works around the issue (see the commented-out line in the reproduction test).
We're not sure if this is the recommended approach or if there's a better pattern for concurrent reactive uploads with the CRT client in low-CPU environments.
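For reference, this is the bootstrap fragment we have in mind for the workaround. It assumes the CRT-based client from `S3AsyncClient.crtBuilder()`, and the override must run before the first CRT client (and thus the default event loop group) is created:

```kotlin
import software.amazon.awssdk.crt.io.EventLoopGroup
import software.amazon.awssdk.services.s3.S3AsyncClient

fun buildS3Client(): S3AsyncClient {
    // Override the CPU-derived default *before* any CRT resource exists;
    // once the default EventLoopGroup has been lazily initialized, this
    // setting has no effect on it.
    EventLoopGroup.setStaticDefaultNumThreads(4)
    return S3AsyncClient.crtBuilder().build()
}
```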
Additional Information/Context
Related issues we found:
- #656 — Similar symptom but with blocking I/O
- #4305 — Buffer blocking (resolved), different from our case
- #836 — Event loop thread discussion
In Kubernetes, CPU limits are commonly used for resource management. A pod with `cpu: 2000m` reports `availableProcessors()` = 2, leading to very few event loop threads.
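A quick way to see what a given pod (or `-XX:ActiveProcessorCount` setting) reports, since this is the number the default event loop sizing is derived from:

```kotlin
fun main() {
    // Under Kubernetes CPU limits (or -XX:ActiveProcessorCount),
    // this can be as low as 1.
    println(Runtime.getRuntime().availableProcessors())
}
```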
We'd appreciate any guidance on the recommended pattern for this use case.
AWS Java SDK version used
2.41.23 + aws-crt 0.43.1
JDK version used
21
Operating System and version
Kubernetes pods with CPU limits (also reproduces locally with `-XX:ActiveProcessorCount=1` on macOS Tahoe 26.2)