feat(orchestrator): improve concurrent benchmark tracing and enable huge pages #2327
…rk spans
Add concurrency level, sandbox index, and sandbox ID attributes to the
bench-resume span so traces can be filtered by concurrency level in
Grafana/Tempo (e.g. {span.concurrency=5}).
Production uses huge pages, so the benchmark should too. Disable with DISABLE_HUGE_PAGES=true for comparison. Uses a separate build ID per mode to avoid cache collisions.
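A minimal sketch of how a per-mode build ID could keep cached artifacts apart. The function name `benchBuildID`, the base ID, and the suffixes are hypothetical, chosen for illustration only; the benchmark's actual scheme is not shown in this thread.

```go
package main

import "fmt"

// benchBuildID derives a distinct build ID for each page mode so that
// artifacts built with huge pages never share a cache entry with
// artifacts built without them. Name and suffixes are hypothetical.
func benchBuildID(base string, hugePages bool) string {
	mode := "stdpages"
	if hugePages {
		mode = "hugepages"
	}
	return base + "-" + mode
}

func main() {
	fmt.Println(benchBuildID("bench", true))
	fmt.Println(benchBuildID("bench", false))
}
```

Because the mode is part of the ID, switching DISABLE_HUGE_PAGES between runs selects a different cache entry instead of reusing one built for the other mode.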
…ume-fc
During concurrent sandbox creation, resume-fc blocks on several parallel waits before it can load the snapshot. These waits were previously invisible: they were covered only by point-in-time ReportEvent calls, which do not capture duration. Adding duration spans makes them visible as bars in the Grafana waterfall view. This matters because wait-rootfs-path turned out to be the primary bottleneck, growing significantly as more sandboxes are created simultaneously.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f1d48b0bc1
Use strconv.ParseBool so that common boolean env values like 1, TRUE, or True are accepted, not just the exact string "true".
Co-authored-by: Jakub Novák <jakub@e2b.dev>
While investigating concurrent sandbox creation performance, I found
that traces were missing key information — there was no way to filter
spans by concurrency level, and the biggest bottleneck inside resume-fc
was invisible in traces. I also enabled huge pages by default in the
benchmark to match production.
To fix the tracing gaps, I added `concurrency` and `sandbox.index`
attributes to benchmark spans for filtering in Grafana, and two new
spans in resume-fc (`wait-uffd-socket` and `wait-rootfs-path`) that
make the parallel waits before snapshot loading visible.
`wait-rootfs-path` turned out to be the primary bottleneck, growing
proportionally with concurrency due to kernel-level serialization in
`nbdnl.Connect()`.