Skip to content

refactor(orchestrator): move busybox from compile-time embed to runtime disk read#2326

Merged
tomassrnka merged 15 commits intomainfrom
busybox-runtime-read
Apr 9, 2026
Merged

refactor(orchestrator): move busybox from compile-time embed to runtime disk read#2326
tomassrnka merged 15 commits intomainfrom
busybox-runtime-read

Conversation

@tomassrnka
Copy link
Copy Markdown
Member

Summary

  • Moves busybox from //go:embed at compile time to os.ReadFile at runtime, matching how kernels, firecracker, and envd already work
  • Deletes the systeminit package that caused orchestrator-ee build failures in belt (the binary wasn't in the Go module cache because it's gitignored)
  • Creates a new fc-busybox GCS/S3 bucket per environment, mounted via gcsfuse (GCP) / s3fs (AWS) at /fc-busybox/{arch}/busybox
  • Belt requires zero changes — the problematic package no longer exists

Deployment steps (before merging)

  1. Upload busybox to GCS public bucket:
    cd packages/orchestrator
    make upload-busybox BUILD_ARCH=amd64
    make upload-busybox BUILD_ARCH=arm64
  2. Apply Terraform to create fc-busybox buckets:
    make plan && make apply
  3. Copy to per-environment buckets:
    make copy-public-builds

Test plan

  • Verify go build ./... compiles without busybox on disk (no embed dependency)
  • Verify go test ./... passes (rootfs_test mocks busybox with temp files)
  • Run make upload-busybox for both arches
  • Apply Terraform, verify fc-busybox bucket exists
  • Run make copy-public-builds, verify files in per-env bucket
  • Deploy orchestrator, verify template builds work (busybox read from /fc-busybox/{arch}/busybox)
  • Verify orchestrator-ee in belt compiles cleanly against updated infra module

🤖 Generated with Claude Code

@cursor
Copy link
Copy Markdown

cursor bot commented Apr 8, 2026

PR Summary

High Risk
Medium-to-high operational risk: template builds now hard-depend on the presence and correct version/arch layout of busybox in /fc-busybox, and failures will surface at runtime instead of build time. Terraform and node startup changes add new buckets and IAM/GCS permissions that could break cluster boot or artifact access if misconfigured.

Overview
Moves the orchestrator’s busybox dependency from compile-time go:embed to a runtime file read, with new config/env support (BUSYBOX_VERSION, HOST_BUSYBOX_DIR) and updated template rootfs generation to inject busybox bytes from disk. Infrastructure and CI are updated to provision and mount a per-environment fc-busybox bucket (GCS/S3) at /fc-busybox, to copy public busybox builds into those buckets, and to adjust workflows/build tooling so tests and local builds fetch busybox from the public builds bucket instead of GitHub releases.

Reviewed by Cursor Bugbot for commit dee65be. Bugbot is set up for automated code reviews on this repo. Configure here.

Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for both issues found in the latest run.

  • ✅ Fixed: Fetch script version parameter ignored in download URL
    • Added version to GCS URL path in both fetch-busybox.sh and upload-busybox.sh so downloads are version-specific at busybox/${VERSION}/${ARCH}/busybox.
  • ✅ Fixed: AWS nodepool-client missing busybox ARN in IAM policy
    • Added fc_busybox_bucket_arn to the client_node_policy IAM document resources alongside the other bucket ARNs.

Create PR

Or push these changes by commenting:

@cursor push b31c321f9e
Preview (b31c321f9e)
diff --git a/iac/provider-aws/modules/nodepool-client/main.tf b/iac/provider-aws/modules/nodepool-client/main.tf
--- a/iac/provider-aws/modules/nodepool-client/main.tf
+++ b/iac/provider-aws/modules/nodepool-client/main.tf
@@ -66,6 +66,9 @@
 
       "${var.fc_versions_bucket_arn}/*",
       var.fc_versions_bucket_arn,
+
+      "${var.fc_busybox_bucket_arn}/*",
+      var.fc_busybox_bucket_arn,
     ]
   }
 

diff --git a/packages/orchestrator/scripts/fetch-busybox.sh b/packages/orchestrator/scripts/fetch-busybox.sh
--- a/packages/orchestrator/scripts/fetch-busybox.sh
+++ b/packages/orchestrator/scripts/fetch-busybox.sh
@@ -21,7 +21,7 @@
   exit 0
 fi
 
-GCS_URL="https://storage.googleapis.com/e2b-prod-public-builds/busybox/${ARCH}/busybox"
+GCS_URL="https://storage.googleapis.com/e2b-prod-public-builds/busybox/${VERSION}/${ARCH}/busybox"
 
 echo "Downloading busybox v${VERSION} (${ARCH}) from GCS..."
 

diff --git a/packages/orchestrator/scripts/upload-busybox.sh b/packages/orchestrator/scripts/upload-busybox.sh
--- a/packages/orchestrator/scripts/upload-busybox.sh
+++ b/packages/orchestrator/scripts/upload-busybox.sh
@@ -17,7 +17,7 @@
 
 RELEASE_URL="https://github.com/e2b-dev/fc-busybox/releases/download/v${VERSION}"
 BINARY="busybox_v${VERSION}_${ARCH}"
-GCS_PATH="gs://e2b-prod-public-builds/busybox/${ARCH}/busybox"
+GCS_PATH="gs://e2b-prod-public-builds/busybox/${VERSION}/${ARCH}/busybox"
 
 echo "Downloading busybox v${VERSION} (${ARCH}) from GitHub..."

This Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.

Copy link
Copy Markdown

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See inline comments.

@ValentaTomas ValentaTomas removed their assignment Apr 8, 2026
tomassrnka and others added 6 commits April 9, 2026 12:33
…me disk read

Busybox was embedded into the orchestrator binary via //go:embed at compile
time. This broke orchestrator-ee in the belt repo because the binary isn't
in the Go module cache (it's gitignored and fetched at build time).

Instead, busybox is now read from disk at runtime via HOST_BUSYBOX_DIR
(default /fc-busybox), matching how kernels, firecracker, and envd already
work. A new GCS bucket ({project}-fc-busybox) is created per environment
and mounted via gcsfuse (GCP) or s3fs (AWS) on orchestrator hosts.

Changes:
- Delete pkg/template/build/core/systeminit/ package entirely
- Read busybox from disk in rootfs.go using os.ReadFile
- Add HostBusyboxDir to BuilderConfig (HOST_BUSYBOX_DIR env var)
- Create fc-busybox bucket in Terraform (GCP + AWS)
- Add gcsfuse/s3fs mount in host startup scripts
- Add upload-busybox.sh for publishing to GCS public bucket
- Update fetch-busybox.sh to download from GCS instead of GitHub
- Add busybox to copy-public-builds and host-init
- Update cmd/create-build local mode to handle busybox path
- Remove fetch-busybox from CI lint/build (no longer compile-time dep)

Deployment: run upload-busybox, terraform apply, copy-public-builds before
deploying orchestrator binaries that include this change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Include version in GCS URL path (busybox/{version}/{arch}/busybox)
  so a version bump in the Makefile fails loudly if the binary hasn't
  been uploaded, instead of silently downloading the old one.
- Add BusyboxVersion to BuilderConfig (BUSYBOX_VERSION env, default 1.36.1)
  so the runtime path includes the version component.
- Add fc_busybox_bucket_arn to AWS nodepool-client IAM read-access policy
  (was missing, only worked because the parent cluster policy included it).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Upload a busybox.sha256 sidecar file alongside the binary in GCS.
The fetch script now downloads and verifies the checksum, restoring
the integrity check that was lost when moving from GitHub releases
(which had SHA256SUMS) to GCS.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Busybox is no longer embedded at compile time, so tests don't need the
binary on disk. The rootfs_test mocks busybox with temp files. Removing
the fetch step unblocks CI before busybox is uploaded to GCS.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The smoketest (cmd/smoketest) does a real template build that needs
busybox on disk. Restore fetch-busybox in test workflows with || true
so CI doesn't fail when GCS isn't populated yet (the smoketest will
fail with a clear error instead of the setup step crashing).

Also make host-init busybox download non-fatal for integration tests.

Set HOST_BUSYBOX_DIR via GITHUB_ENV so the test steps pick up the
local .busybox directory.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Host-init should fail hard if busybox isn't available, same as kernels
and firecracker. Upload busybox to GCS before deploying.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@tomassrnka tomassrnka force-pushed the busybox-runtime-read branch from 9faf869 to f05c47f Compare April 9, 2026 10:33
tomassrnka and others added 3 commits April 9, 2026 12:39
Co-authored-by: Jakub Novák <jakub@e2b.dev>
Co-authored-by: Jakub Novák <jakub@e2b.dev>
The fc-busybox CI workflow handles uploads to GCS automatically on
tag push. No need for a manual upload target in infra.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@tomassrnka tomassrnka marked this pull request as ready for review April 9, 2026 11:18
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 416b1a5698

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

The build-template target runs locally where /fc-busybox doesn't exist.
Fetch busybox first and point HOST_BUSYBOX_DIR to the local .busybox dir
so cmd/create-build can find the binary at runtime.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment on lines +172 to +177
os.Setenv("HOST_BUSYBOX_DIR", busyboxDir)
}
busyboxVersion := os.Getenv("BUSYBOX_VERSION")
if busyboxVersion == "" {
busyboxVersion = "1.36.1"
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 In setupEnv() (create-build/main.go:176), the busybox version fallback is hardcoded to "1.36.1" independently of cfg/model.go's envDefault:"1.36.1", creating two sources of truth. If the canonical default in cfg/model.go is ever bumped without updating this hardcoded fallback, the local-dev diagnostic will check the wrong version path and print a misleading warning even when busybox is present at the correct new-version path. The fix is to derive the version from cfg.ParseBuilder() instead of repeating the literal.

Extended reasoning...

What the bug is

In packages/orchestrator/cmd/create-build/main.go lines 172-177, setupEnv() reads BUSYBOX_VERSION from the environment and falls back to the hardcoded string "1.36.1" when the variable is unset. Separately, cfg/model.go declares BusyboxVersion with envDefault:"1.36.1". These two defaults are independent; nothing enforces they stay in sync.

The specific code path

When a developer runs create-build locally without setting BUSYBOX_VERSION, setupEnv() constructs busyboxBin using its own hardcoded "1.36.1". It then performs an os.Stat() check and prints either a success or warning message. After setupEnv() returns, cfg.ParseBuilder() is called and produces a BuilderConfig whose BusyboxVersion comes from the env tag default in model.go. The actual build uses BuilderConfig.BusyboxVersion, not the fallback in setupEnv().

Why existing code does not prevent it

There is no compile-time linkage between the two defaults. In a bare go run invocation without environment variables, both sources are read independently.

Impact

If cfg/model.go is updated to e.g. envDefault:"1.37.0" and create-build/main.go is not, the diagnostic check would stat .busybox/1.36.1/amd64/busybox and print a false warning, even though busybox is present at .busybox/1.37.0/amd64/busybox and the build itself succeeds. The impact is misleading developer output only (local dev mode diagnostic); production builds are unaffected because BUSYBOX_VERSION is set explicitly in production environments.

How to fix

Move cfg.ParseBuilder() before setupEnv() and pass builderConfig.BusyboxVersion into setupEnv() as a parameter, eliminating the second hardcoded fallback. Alternatively, expose the default as an exported constant from the cfg package and reference it in both places.

Step-by-step proof

  1. Developer bumps BusyboxVersion envDefault in cfg/model.go from "1.36.1" to "1.37.0" and runs make fetch-busybox, which downloads to .busybox/1.37.0/amd64/busybox.
  2. Developer runs go run ./cmd/create-build -storage ./local-build -to-build test without setting BUSYBOX_VERSION.
  3. setupEnv() reads os.Getenv("BUSYBOX_VERSION") which is empty and falls back to hardcoded "1.36.1".
  4. busyboxBin is set to .busybox/1.36.1/amd64/busybox which does not exist, prints the warning message.
  5. cfg.ParseBuilder() then returns BusyboxVersion="1.37.0" from the updated envDefault.
  6. rootfs.additionalOCILayers reads from .busybox/1.37.0/amd64/busybox and succeeds silently.
  7. Result: build succeeds but developer saw a false warning that busybox is missing.

tomassrnka and others added 2 commits April 9, 2026 13:37
Use cfg.DefaultBusyboxVersion instead of hardcoding "1.36.1" in
cmd/create-build, avoiding two sources of truth. Same pattern as
featureflags.DefaultKernelVersion and DefaultFirecrackerVersion.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The previous version wrote the binary to the final path before
downloading the SHA256 sidecar. If the sidecar download failed,
set -e left an unverified binary on disk without a stamp. Now both
files go to temp paths first, verification runs, then the binary
is moved to the output path — matching the old /tmp staging pattern.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
tomassrnka and others added 2 commits April 9, 2026 13:53
Consistent with how kernels and firecracker are handled — the Makefile
doesn't set HOST_KERNELS_DIR or pre-fetch those either. The cmd/create-build
setupEnv() handles local paths when -storage is used.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jakubno jakubno assigned jakubno and unassigned dobrac Apr 9, 2026
Keep fetch-busybox dependency but don't override the env var —
consistent with how other host paths are handled.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit dee65be. Configure here.

@tomassrnka tomassrnka merged commit bba0313 into main Apr 9, 2026
44 checks passed
@tomassrnka tomassrnka deleted the busybox-runtime-read branch April 9, 2026 12:14
arkamar added a commit that referenced this pull request Apr 15, 2026
Commit bba0313 moved busybox from compile-time go:embed to runtime
disk read via HOST_BUSYBOX_DIR, but didn't add the variable to
.env.local. Without it, local-dev builds fail trying to read
busybox from /fc-busybox (the production default).

Point HOST_BUSYBOX_DIR at .busybox, the local fetch-busybox output
directory.

Fixes: bba0313 ("refactor(orchestrator): move busybox from compile-time embed to runtime disk read (#2326)")
arkamar added a commit that referenced this pull request Apr 15, 2026
Commit bba0313 moved busybox from compile-time go:embed to runtime
disk read via HOST_BUSYBOX_DIR, but didn't add the variable to
.env.local. Without it, local-dev builds fail trying to read
busybox from /fc-busybox (the production default).

Point HOST_BUSYBOX_DIR at .busybox, the local fetch-busybox output
directory.

Fixes: bba0313 ("refactor(orchestrator): move busybox from compile-time embed to runtime disk read (#2326)")
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants