[support bundle] Collect sled-specific bundle information concurrently #8106

Merged — 67 commits from concurrent-sb-collection into main on May 15, 2025

Conversation

@smklein (Collaborator) commented May 6, 2025

In the interest of improving support bundle collection times, avoid bottlenecking on each sled individually.

Instead, collect from up to 16 sleds at once.

smklein added 30 commits April 11, 2025 10:33
@smklein changed the title from "[support bundle] Collect bundles concurrently" to "[support bundle] Collect sled-specific bundle information concurrently" on May 6, 2025
@hawkw (Member) left a comment

sorry for the obnoxious drive-by :)

Comment on lines 572 to 575
```rust
let mut sled_stream = futures::stream::iter(all_sleds)
    .map(|sled| async move {
        self.collect_data_from_sled(&log, &sled, dir).await
    })
```
Member

hm, I note that while the collect_data_from_sled calls will execute concurrently, the async fns will not actually run in parallel, since they're not spawned in separate tasks. so there will only ever be one thread executing one of those futures at a time, although we will be running them concurrently and executing the next one when one yields. in this case, i think that's mostly okay, since it looks like most of the runtime of collect_data_from_sled is probably waiting on IO (waiting for the HTTP requests to come back and the tokio::fs operations, which run on the blocking threadpool, to complete). but, i might nonetheless consider spawning tasks for each sled, instead. up to you.

Member

(if you decide these should run in parallel i might consider using tokio::task::JoinSet rather than buffer_unordered, since it uses the task allocation to store the task's membership in the joinset and therefore uses much less heap. may not actually matter though.)
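
(For reference, a rough sketch of the spawn-per-sled variant being suggested, using a `JoinSet`; the types and function names here are illustrative stand-ins, not the PR's actual code. Spawned tasks get their own slots on the multi-threaded runtime, so they can run in parallel rather than merely concurrently.)

```rust
use tokio::task::JoinSet;

// Illustrative stand-ins for the real sled type and collection routine.
struct Sled;
async fn collect_data_from_sled(_sled: Sled) { /* HTTP requests, file writes, ... */ }

async fn collect_all(all_sleds: Vec<Sled>) {
    let mut tasks = JoinSet::new();
    for sled in all_sleds {
        // Each spawned future must be 'static, so the sled is moved in by value.
        tasks.spawn(collect_data_from_sled(sled));
    }
    // The JoinSet stores membership inside the task allocation itself, which
    // is the lower-heap-usage property mentioned above.
    while let Some(result) = tasks.join_next().await {
        if let Err(err) = result {
            eprintln!("sled collection task failed: {err}");
        }
    }
}
```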

Contributor

That's super interesting, thanks for sharing that bit of info on buffer_unordered vs JoinSet.

@smklein (Collaborator, Author) commented May 7, 2025

I can make this refactor -- I'm aware that it was concurrent, not parallel -- but the 'static bound on the spawn-ed tasks means I probably can't operate on &self, and will need to refactor this into a free function.

Collaborator (Author)

Made these changes in 59e8750 -- I'm using some manual counting of tasks to respect my previous "maximum bound on concurrent activity".
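
(A sketch of what bounding in-flight tasks by manual counting can look like; the names and the limit of 16 are illustrative, and 59e8750 is the place to look for the real implementation.)

```rust
use tokio::task::JoinSet;

// Illustrative stand-ins; the real code lives in the bundle collector.
struct Sled;
async fn collect_data_from_sled(_sled: Sled) { /* ... */ }

const MAX_CONCURRENT_SLEDS: usize = 16;

async fn collect_all(all_sleds: Vec<Sled>) {
    let mut sleds_iter = all_sleds.into_iter().peekable();
    let mut tasks = JoinSet::new();

    // While we have incoming work to send to tasks (sleds_iter)
    // or a task operating on that data (tasks)...
    while sleds_iter.peek().is_some() || !tasks.is_empty() {
        // Top up to the concurrency limit before waiting on completions.
        while tasks.len() < MAX_CONCURRENT_SLEDS {
            let Some(sled) = sleds_iter.next() else { break };
            tasks.spawn(collect_data_from_sled(sled));
        }
        // Each completion frees a slot for the next sled.
        if let Some(Err(err)) = tasks.join_next().await {
            eprintln!("sled collection task failed: {err}");
        }
    }
}
```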

Comment on lines +700 to +704
```rust
// Currently we execute up to 10 commands concurrently which
// might be doing their own concurrent work, for example
// collecting `pstack` output of every Oxide process that is
// found on a sled.
.buffer_unordered(10);
```
Member

again, concurrent but not parallel. whether that matters depends on how much CPU-bound work these functions do...

Contributor

Above you mentioned fs operations being spawned onto the blocking thread pool -- does that mean futures_unordered can still perform those tasks concurrently, since the await point is typically waiting for the result from the blocking thread? I am curious because the log collection is a synchronous task which we use spawn_blocking for ourselves.
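
(For context, a sketch of the pattern in question, with illustrative names: awaiting the JoinHandle returned by spawn_blocking suspends that future, so the executor can keep polling the other futures in the buffer_unordered set while the synchronous work runs on the blocking threadpool.)

```rust
use futures::stream::{self, StreamExt};

// Illustrative synchronous step, standing in for something like log collection.
fn collect_logs_sync(sled_id: usize) -> String {
    format!("logs for sled {sled_id}")
}

async fn collect_logs(sled_ids: Vec<usize>) -> Vec<String> {
    stream::iter(sled_ids)
        .map(|id| async move {
            // The closure runs on tokio's blocking threadpool; this await
            // point only waits for its result, so the single task driving
            // buffer_unordered is free to poll the other buffered futures.
            tokio::task::spawn_blocking(move || collect_logs_sync(id))
                .await
                .expect("blocking task panicked")
        })
        .buffer_unordered(10)
        .collect()
        .await
}
```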

Base automatically changed from sb-health-check to main May 14, 2025 21:26
```rust
continue;
// While we have incoming work to send to tasks (sleds_iter)
// or a task operating on that data (tasks)...
while sleds_iter.peek().is_some() || !tasks.is_empty() {
```
Contributor

nit: You could probably swap this to use a semaphore but I am not pressed about it.
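
(A sketch of the semaphore-based alternative, assuming a tokio::sync::Semaphore with one permit per in-flight sled; the names and the limit are illustrative, not the PR's code.)

```rust
use std::sync::Arc;
use tokio::{sync::Semaphore, task::JoinSet};

// Illustrative stand-ins for the real types.
struct Sled;
async fn collect_data_from_sled(_sled: Sled) { /* ... */ }

async fn collect_all(all_sleds: Vec<Sled>) {
    let semaphore = Arc::new(Semaphore::new(16));
    let mut tasks = JoinSet::new();

    for sled in all_sleds {
        // Wait here until a permit is free, so at most 16 collections run
        // at once; the permit is released when the spawned task drops it.
        // unwrap() is fine because the semaphore is never closed.
        let permit = semaphore.clone().acquire_owned().await.unwrap();
        tasks.spawn(async move {
            let _permit = permit;
            collect_data_from_sled(sled).await
        });
    }
    while tasks.join_next().await.is_some() {}
}
```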

Collaborator (Author)

I'll look into refactoring this behavior out as a follow-up

@smklein merged commit 5c37744 into main on May 15, 2025
16 checks passed
@smklein deleted the concurrent-sb-collection branch on May 15, 2025 at 17:37