feat: add composite upload option for large file writes #1254
mishushakov wants to merge 13 commits into main
Conversation
🦋 Changeset detected
Latest commit: 111dd1f. The changes in this PR will be included in the next version bump. This PR includes changesets to release 2 packages.
Not sure what this means? Click here to learn what changesets are.
PR Summary (Medium Risk)
Overview: Extends the envd API spec and generated types with
Written by Cursor Bugbot for commit 111dd1f. This will update automatically on new commits.
Package Artifacts
Built from 72d3970. Download artifacts from this workflow run.
- JS SDK: npm install ./e2b-2.19.1-mishushakov-composite-upload.0.tgz
- CLI: npm install ./e2b-cli-2.9.1-mishushakov-composite-upload.0.tgz
- Python SDK: pip install ./e2b-2.20.0+mishushakov.composite.upload-py3-none-any.whl
Cursor Bugbot has reviewed your changes and found 2 potential issues.
- Build chunk_paths deterministically before asyncio.gather in async _composite_write
- Use Username type instead of bare string in JS compositeWrite

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use already-materialized blob/content instead of re-reading original data. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When data fits in a single chunk, fall through to the normal write path instead of duplicating the upload logic inside compositeWrite. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove the `composite` option from `write()`. Files over 64MB are now automatically chunked and uploaded via the composite path when the envd version supports it. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When data is an IO object and ≤64MB, to_upload_body() consumes the stream. Pass the materialized bytes to write_files() instead of the exhausted IO object. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
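The stream-exhaustion bug described in this commit can be illustrated with a minimal, self-contained sketch. Note the `to_upload_body` below is a simplified stand-in for the SDK helper of the same name, not its actual implementation:

```python
import io

def to_upload_body(data):
    # Simplified stand-in for the SDK helper: reading an IO object
    # consumes the stream, leaving its position at EOF.
    if hasattr(data, "read"):
        return data.read()
    return data

stream = io.BytesIO(b"file contents")
body = to_upload_body(stream)   # materializes the bytes
print(stream.read())            # b'' -- the stream is now exhausted
print(body)                     # b'file contents' -- pass this onward instead
```

Passing `stream` to a second consumer after this point would upload zero bytes, which is why the materialized `body` must be forwarded to `write_files()` instead.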
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9de5bc1fe8
```python
        "destination": destination,
    }
    if username:
        body["username"] = username

    r = await self._envd_api.post(
        ENVD_API_FILES_COMPOSE_ROUTE,
        json=body,
        timeout=self._connection_config.get_request_timeout(request_timeout),
    )

    err = await _ahandle_filesystem_envd_api_exception(r)
    if err:
        raise err

    return WriteInfo(**r.json())

async def list(
    self,
    path: str,
```
🔴 The new _composite_write method introduces two related bugs in its _upload_chunk coroutines run via asyncio.gather: (1) when use_gzip=True, to_upload_body(chunk_data, use_gzip) calls gzip.compress(raw) synchronously—a CPU-bound operation that can take seconds on a 64MB chunk—fully blocking the asyncio event loop before the first await, freezing all other async operations during every chunk's compression; (2) asyncio.gather without return_exceptions=True or TaskGroup does not cancel remaining in-flight chunk uploads when one fails, so they continue uploading to /tmp despite the overall operation having already failed, wasting bandwidth and creating additional orphaned temp files. Fix (1) by using asyncio.to_thread(gzip.compress, chunk_data) and fix (2) by using asyncio.TaskGroup (Python 3.11+) or explicit task cancellation on error.
Extended reasoning...
Issue 1 — Synchronous gzip.compress blocks the event loop
In _composite_write (lines 418–437 of sandbox_async/filesystem/filesystem.py), the inner _upload_chunk(i) coroutine calls upload_content = to_upload_body(chunk_data, use_gzip) before its first await. The to_upload_body function (in sandbox/filesystem/filesystem.py) calls gzip.compress(raw) when use_gzip=True. gzip.compress is a pure CPU-bound operation with no await points—it holds the GIL and blocks the asyncio event loop for the entire duration of the compression, typically hundreds of milliseconds to several seconds for a 64MB chunk.
Why asyncio.gather does not help here
Python's asyncio is single-threaded. All coroutines run cooperatively on one thread, and a coroutine can only yield control at an await point. Because gzip.compress runs before the first await in _upload_chunk, it holds the event loop thread continuously. With asyncio.gather launching chunk_count coroutines, each coroutine blocks the event loop sequentially at its compression step. Concretely, for a 192MB file (3 chunks), the event loop is blocked during 3 sequential gzip.compress calls—potentially 3–6+ seconds total—preventing all other async operations in the application from progressing. The fix is await asyncio.to_thread(gzip.compress, chunk_data) to offload compression to a thread pool, releasing the event loop between chunks.
Step-by-step proof for Issue 1
Consider a 128MB upload with use_gzip=True, producing 2 chunks of 64MB each:
1. `asyncio.gather(_upload_chunk(0), _upload_chunk(1))` schedules both coroutines.
2. `_upload_chunk(0)` runs first and calls `gzip.compress(chunk_data_0)` — event loop blocked ~1–3s.
3. Control never yields to `_upload_chunk(1)` until compression finishes.
4. `_upload_chunk(0)` hits `await self._envd_api.post(...)` — now `_upload_chunk(1)` can start.
5. `_upload_chunk(1)` calls `gzip.compress(chunk_data_1)` — event loop blocked again ~1–3s.
6. Net result: compressions run serially (2–6s of event loop blockage), defeating the purpose of parallel uploads and freezing any other async tasks (e.g., heartbeats, other sandbox operations).
Issue 2 — asyncio.gather does not cancel in-flight tasks on failure
Per the Python docs: "If return_exceptions is False (default), the first raised exception is immediately propagated to the task that awaits on gather(). Other awaitables in the aws sequence won't be cancelled and will continue to run." In _composite_write, if _upload_chunk(0) raises (e.g., server error, network timeout, or disk full), asyncio.gather propagates the exception to the caller immediately, but _upload_chunk(1) through _upload_chunk(N-1) continue uploading data in the background. Since the overall _composite_write operation has already failed and the compose step will never run, these uploads produce orphaned /tmp/.e2b-upload-* files that waste network bandwidth and sandbox disk space. This compounds with any existing orphaned-file cleanup issue.
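The gather behavior quoted above can be demonstrated with a small self-contained script. The `chunk` coroutine here is a stand-in for a chunk upload, not SDK code:

```python
import asyncio

events = []

async def chunk(i, delay, fail=False):
    await asyncio.sleep(delay)
    if fail:
        raise RuntimeError(f"chunk {i} failed")
    events.append(i)  # this chunk "uploaded" even though a sibling failed

async def main():
    try:
        # chunk 0 raises first; gather propagates immediately but does
        # NOT cancel chunk 1, which keeps running in the background.
        await asyncio.gather(chunk(0, 0.01, fail=True), chunk(1, 0.05))
    except RuntimeError:
        pass
    await asyncio.sleep(0.1)  # give the orphaned task time to finish
    return events

print(asyncio.run(main()))  # [1] -- chunk 1 completed after the failure
```

In the SDK's case the surviving coroutines are real uploads, so each one finishes writing an orphaned chunk file to `/tmp` after the overall operation has already failed.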
How to fix
For Issue 1: replace upload_content = to_upload_body(chunk_data, use_gzip) with upload_content = await asyncio.to_thread(gzip.compress, chunk_data) (and handle the non-gzip path inline), or refactor to_upload_body to have an async variant. For Issue 2: replace asyncio.gather with asyncio.TaskGroup (Python 3.11+), which automatically cancels all remaining tasks when one raises. Both fixes are confined to _composite_write in the async filesystem module and have no impact on the sync SDK.
Composite upload's primary benefit is parallel chunk uploading, which the sync SDK cannot leverage (sequential HTTP requests negate the performance advantage). Only the async Python SDK and JS SDK retain composite upload support via asyncio.gather() and Promise.all(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
write_files() already calls to_upload_body internally, so the pre-materialization in write() was unnecessary after removing the composite upload size check. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace asyncio.gather with asyncio.TaskGroup for structured concurrency, and offload gzip compression to a thread to avoid blocking the event loop. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
```typescript
  throw composeErr
}

return composeRes.data as WriteInfo
```
Orphaned chunk files leak disk on upload failure
Medium Severity
Both compositeWrite (JS) and _composite_write (Python) upload chunks to /tmp/.e2b-upload-{uuid}-{i} but never clean them up if a chunk upload partially fails or if the compose step fails. Promise.all / asyncio.TaskGroup will reject on first error, leaving already-uploaded chunks orphaned on disk. For a 1GB file, that's ~1GB of wasted space per failed attempt, which could exhaust the sandbox's limited disk and cause subsequent NotEnoughDiskSpace errors. A try/finally block around the upload+compose steps that removes successfully uploaded chunks on error would prevent this.
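The suggested cleanup could be sketched as below; the `upload_all`, `compose`, and `remove` helpers are hypothetical placeholders, not the SDK's real methods:

```python
import asyncio

async def composite_write_with_cleanup(chunk_paths, upload_all, compose, remove):
    try:
        await upload_all(chunk_paths)      # chunk uploads; may partially succeed
        return await compose(chunk_paths)  # server-side concatenation
    except BaseException:
        # Best-effort cleanup: delete whatever chunk files made it to /tmp,
        # ignoring chunks that were never written.
        for path in chunk_paths:
            try:
                await remove(path)
            except Exception:
                pass
        raise
```

Re-raising after cleanup preserves the original error for the caller while ensuring a failed 1GB upload does not leave ~1GB of orphaned chunks on the sandbox disk.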
Summary
- `write()` transparently splits data into 64MB chunks, uploads them in parallel, then composes them server-side using zero-copy concatenation via the new `POST /files/compose` endpoint
- Adds the `/files/compose` endpoint to the envd OpenAPI spec with a `ComposeRequest` schema (`source_paths`, `destination`, `username`) and regenerates JS SDK types

Test plan
- `gzip: true` on large file uploads

🤖 Generated with Claude Code