feat: add composite upload option for large file writes #1254
mishushakov wants to merge 13 commits into main
Conversation
🦋 Changeset detected
Latest commit: 111dd1f. The changes in this PR will be included in the next version bump. This PR includes changesets to release 2 packages.
Not sure what this means? Click here to learn what changesets are.
PR Summary (Medium Risk)
Overview: Extends the envd API spec and generated types with
Written by Cursor Bugbot for commit 111dd1f. This will update automatically on new commits.
Package Artifacts
Built from 72d3970. Download artifacts from this workflow run.
- JS SDK: npm install ./e2b-2.19.1-mishushakov-composite-upload.0.tgz
- CLI: npm install ./e2b-cli-2.9.1-mishushakov-composite-upload.0.tgz
- Python SDK: pip install ./e2b-2.20.0+mishushakov.composite.upload-py3-none-any.whl
Cursor Bugbot has reviewed your changes and found 2 potential issues.
- Build chunk_paths deterministically before asyncio.gather in async _composite_write
- Use Username type instead of bare string in JS compositeWrite

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use already-materialized blob/content instead of re-reading original data. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When data fits in a single chunk, fall through to the normal write path instead of duplicating the upload logic inside compositeWrite. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove the `composite` option from `write()`. Files over 64MB are now automatically chunked and uploaded via the composite path when the envd version supports it. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When data is an IO object and ≤64MB, to_upload_body() consumes the stream. Pass the materialized bytes to write_files() instead of the exhausted IO object. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
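The stream-exhaustion bug described in this commit can be illustrated with a minimal, self-contained sketch. Note the `to_upload_body` below is a simplified stand-in for the SDK helper of the same name, not its actual implementation:

```python
import io

def to_upload_body(data):
    # Simplified stand-in for the SDK helper: reading an IO object
    # consumes the stream, leaving its position at EOF.
    if hasattr(data, "read"):
        return data.read()
    return data

stream = io.BytesIO(b"file contents")
body = to_upload_body(stream)   # materializes the bytes
print(stream.read())            # b'' -- the stream is now exhausted
print(body)                     # b'file contents' -- pass this onward instead
```

Passing `stream` to a second consumer after this point would upload zero bytes, which is why the materialized `body` must be forwarded to `write_files()` instead.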
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9de5bc1fe8
```python
        "destination": destination,
    }
    if username:
        body["username"] = username

    r = await self._envd_api.post(
        ENVD_API_FILES_COMPOSE_ROUTE,
        json=body,
        timeout=self._connection_config.get_request_timeout(request_timeout),
    )

    err = await _ahandle_filesystem_envd_api_exception(r)
    if err:
        raise err

    return WriteInfo(**r.json())

async def list(
    self,
    path: str,
```
🔴 The new _composite_write method introduces two related bugs in its _upload_chunk coroutines run via asyncio.gather: (1) when use_gzip=True, to_upload_body(chunk_data, use_gzip) calls gzip.compress(raw) synchronously—a CPU-bound operation that can take seconds on a 64MB chunk—fully blocking the asyncio event loop before the first await, freezing all other async operations during every chunk's compression; (2) asyncio.gather without return_exceptions=True or TaskGroup does not cancel remaining in-flight chunk uploads when one fails, so they continue uploading to /tmp despite the overall operation having already failed, wasting bandwidth and creating additional orphaned temp files. Fix (1) by using asyncio.to_thread(gzip.compress, chunk_data) and fix (2) by using asyncio.TaskGroup (Python 3.11+) or explicit task cancellation on error.
Extended reasoning...
Issue 1 — Synchronous gzip.compress blocks the event loop
In _composite_write (lines 418–437 of sandbox_async/filesystem/filesystem.py), the inner _upload_chunk(i) coroutine calls upload_content = to_upload_body(chunk_data, use_gzip) before its first await. The to_upload_body function (in sandbox/filesystem/filesystem.py) calls gzip.compress(raw) when use_gzip=True. gzip.compress is a pure CPU-bound operation with no await points—it holds the GIL and blocks the asyncio event loop for the entire duration of the compression, typically hundreds of milliseconds to several seconds for a 64MB chunk.
Why asyncio.gather does not help here
Python's asyncio is single-threaded. All coroutines run cooperatively on one thread, and a coroutine can only yield control at an await point. Because gzip.compress runs before the first await in _upload_chunk, it holds the event loop thread continuously. With asyncio.gather launching chunk_count coroutines, each coroutine blocks the event loop sequentially at its compression step. Concretely, for a 192MB file (3 chunks), the event loop is blocked during 3 sequential gzip.compress calls—potentially 3–6+ seconds total—preventing all other async operations in the application from progressing. The fix is await asyncio.to_thread(gzip.compress, chunk_data) to offload compression to a thread pool, releasing the event loop between chunks.
Step-by-step proof for Issue 1
Consider a 128MB upload with use_gzip=True, producing 2 chunks of 64MB each:
1. `asyncio.gather(_upload_chunk(0), _upload_chunk(1))` schedules both coroutines.
2. `_upload_chunk(0)` runs first and calls `gzip.compress(chunk_data_0)` — event loop blocked ~1–3s.
3. Control never yields to `_upload_chunk(1)` until compression finishes.
4. `_upload_chunk(0)` hits `await self._envd_api.post(...)` — now `_upload_chunk(1)` can start.
5. `_upload_chunk(1)` calls `gzip.compress(chunk_data_1)` — event loop blocked again ~1–3s.
6. Net result: compressions run serially (2–6s of event loop blockage), defeating the purpose of parallel uploads and freezing any other async tasks (e.g., heartbeats, other sandbox operations).
Issue 2 — asyncio.gather does not cancel in-flight tasks on failure
Per the Python docs: "If return_exceptions is False (default), the first raised exception is immediately propagated to the task that awaits on gather(). Other awaitables in the aws sequence won't be cancelled and will continue to run." In _composite_write, if _upload_chunk(0) raises (e.g., server error, network timeout, or disk full), asyncio.gather propagates the exception to the caller immediately, but _upload_chunk(1) through _upload_chunk(N-1) continue uploading data in the background. Since the overall _composite_write operation has already failed and the compose step will never run, these uploads produce orphaned /tmp/.e2b-upload-* files that waste network bandwidth and sandbox disk space. This compounds with any existing orphaned-file cleanup issue.
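The gather behavior quoted above can be demonstrated with a small self-contained script. The `chunk` coroutine here is a stand-in for a chunk upload, not SDK code:

```python
import asyncio

events = []

async def chunk(i, delay, fail=False):
    await asyncio.sleep(delay)
    if fail:
        raise RuntimeError(f"chunk {i} failed")
    events.append(i)  # this chunk "uploaded" even though a sibling failed

async def main():
    try:
        # chunk 0 raises first; gather propagates immediately but does
        # NOT cancel chunk 1, which keeps running in the background.
        await asyncio.gather(chunk(0, 0.01, fail=True), chunk(1, 0.05))
    except RuntimeError:
        pass
    await asyncio.sleep(0.1)  # give the orphaned task time to finish
    return events

print(asyncio.run(main()))  # [1] -- chunk 1 completed after the failure
```

In the SDK's case the surviving coroutines are real uploads, so each one finishes writing an orphaned chunk file to `/tmp` after the overall operation has already failed.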
How to fix
For Issue 1: replace upload_content = to_upload_body(chunk_data, use_gzip) with upload_content = await asyncio.to_thread(gzip.compress, chunk_data) (and handle the non-gzip path inline), or refactor to_upload_body to have an async variant. For Issue 2: replace asyncio.gather with asyncio.TaskGroup (Python 3.11+), which automatically cancels all remaining tasks when one raises. Both fixes are confined to _composite_write in the async filesystem module and have no impact on the sync SDK.
Composite upload's primary benefit is parallel chunk uploading, which the sync SDK cannot leverage (sequential HTTP requests negate the performance advantage). Only the async Python SDK and JS SDK retain composite upload support via asyncio.gather() and Promise.all(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
write_files() already calls to_upload_body internally, so the pre-materialization in write() was unnecessary after removing the composite upload size check. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace asyncio.gather with asyncio.TaskGroup for structured concurrency, and offload gzip compression to a thread to avoid blocking the event loop. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
```typescript
  throw composeErr
}

return composeRes.data as WriteInfo
```
Orphaned chunk files leak disk on upload failure
Medium Severity
Both compositeWrite (JS) and _composite_write (Python) upload chunks to /tmp/.e2b-upload-{uuid}-{i} but never clean them up if a chunk upload partially fails or if the compose step fails. Promise.all / asyncio.TaskGroup will reject on first error, leaving already-uploaded chunks orphaned on disk. For a 1GB file, that's ~1GB of wasted space per failed attempt, which could exhaust the sandbox's limited disk and cause subsequent NotEnoughDiskSpace errors. A try/finally block around the upload+compose steps that removes successfully uploaded chunks on error would prevent this.
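The suggested cleanup could be sketched as below; the `upload_all`, `compose`, and `remove` helpers are hypothetical placeholders, not the SDK's real methods:

```python
import asyncio

async def composite_write_with_cleanup(chunk_paths, upload_all, compose, remove):
    try:
        await upload_all(chunk_paths)      # chunk uploads; may partially succeed
        return await compose(chunk_paths)  # server-side concatenation
    except BaseException:
        # Best-effort cleanup: delete whatever chunk files made it to /tmp,
        # ignoring chunks that were never written.
        for path in chunk_paths:
            try:
                await remove(path)
            except Exception:
                pass
        raise
```

Re-raising after cleanup preserves the original error for the caller while ensuring a failed 1GB upload does not leave ~1GB of orphaned chunks on the sandbox disk.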
Summary
- `write()` transparently splits data into 64MB chunks, uploads them in parallel, then composes them server-side using zero-copy concatenation via the new `POST /files/compose` endpoint
- Adds the `/files/compose` endpoint to the envd OpenAPI spec with a `ComposeRequest` schema (`source_paths`, `destination`, `username`) and regenerates JS SDK types

Test plan
- `gzip: true` on large file uploads

🤖 Generated with Claude Code