
[sled-diagnostics] use JoinSet for multiple commands #8151


Open
papertigers wants to merge 5 commits into main

Conversation

@papertigers (Contributor) commented May 13, 2025

As part of the #8166 investigation, we decided to move from FuturesUnordered to a JoinSet to gain parallelism.
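For background on the distinction: FuturesUnordered polls all of its futures from within the single task that owns it, so the commands interleave but never run in parallel across runtime threads, whereas JoinSet spawns each future as its own Tokio task. A minimal standalone sketch of the JoinSet side (run_command is a hypothetical stand-in for the crate's real command execution, not code from this PR):

```rust
use tokio::task::JoinSet;

// Hypothetical stand-in for running one support command; the real crate
// spawns child processes (e.g. via tokio::process) instead.
async fn run_command(id: usize) -> String {
    format!("output of command {id}")
}

#[tokio::main]
async fn main() {
    // Each spawn creates a separate runtime task, so the commands can run
    // in parallel on the runtime's worker threads, unlike futures polled
    // inside a single FuturesUnordered owner task.
    let mut set = JoinSet::new();
    for id in 0..5 {
        set.spawn(run_command(id));
    }
    // Collect results as tasks finish, in completion order.
    while let Some(result) = set.join_next().await {
        println!("{}", result.expect("task panicked"));
    }
}
```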

@hawkw (Member) left a comment


one thought, I'm not sure whether it makes a meaningful difference given that it seems much of the issue is just "fork sucks when you have a big resident set"...

Comment on lines 75 to 79:

```rust
let permit = Arc::clone(&self.semaphore)
    .acquire_owned()
    .await
    .expect("semaphore acquire");
let _abort_handle = self.set.spawn(async move {
```

So, this will not even spawn the tasks until there's capacity in the concurrency limit. What happens if you change it to:

Suggested change:

```diff
-let permit = Arc::clone(&self.semaphore)
-    .acquire_owned()
-    .await
-    .expect("semaphore acquire");
-let _abort_handle = self.set.spawn(async move {
+let semaphore = self.semaphore.clone();
+let _abort_handle = self.set.spawn(async move {
+    let permit = semaphore.acquire().await.expect("semaphore acquire");
```

This way, all the tasks are spawned immediately, and the task adding commands to the set can do so synchronously (changing add_command to a normal fn) and then just wait for them to all come back. Right now, the task that adds commands has to get woken up a bunch of times to add the next one to the set; I wonder whether changing this to spawn everything synchronously would make a meaningful difference in performance...
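As a concrete illustration of the suggested pattern, here is a minimal self-contained sketch: all tasks are spawned up front and each one waits for a permit inside the task body. The loop bound of 200 and the MAX_PARALLELISM value are illustrative, and the tasks do no real work:

```rust
use std::sync::Arc;
use tokio::{sync::Semaphore, task::JoinSet};

const MAX_PARALLELISM: usize = 50;

#[tokio::main]
async fn main() {
    let semaphore = Arc::new(Semaphore::new(MAX_PARALLELISM));
    let mut set = JoinSet::new();

    // The spawning loop never awaits the semaphore, so it can add every
    // task synchronously; each task blocks on a permit *after* it has
    // been spawned, keeping at most MAX_PARALLELISM running at once.
    for id in 0..200u32 {
        let semaphore = Arc::clone(&semaphore);
        set.spawn(async move {
            let _permit =
                semaphore.acquire().await.expect("semaphore closed");
            // ...run the command while holding the permit...
            id
        });
    }

    while let Some(result) = set.join_next().await {
        let _id = result.expect("task panicked");
    }
}
```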

@papertigers (Contributor, Author) replied:

With this change I see the following:

stable rust + 1gb RSS: took 24.416534527s
beta rust + 1gb RSS: took 3.829577938s

Over chat I told you that with beta rust and the JoinSet we saw 3.674288649s, so the difference is negligible, but overall I think your suggestion is the better design. I am going to push a commit to this branch.

@hawkw (Member) replied:

ah well, it was worth a shot, i guess! thanks for giving it a spin!

@papertigers papertigers marked this pull request as ready for review May 15, 2025 18:24
@papertigers papertigers changed the title WIP diagnostics use JoinSet [sled-diagnostics] use JoinSet for multiple commands May 15, 2025
@papertigers (Contributor, Author) commented:
Test failure filed as #8170
Rerunning the failed test....

@papertigers papertigers requested review from hawkw and smklein May 15, 2025 19:19
@hawkw (Member) left a comment


Modulo fixing CI, this looks good to me. I left some non-blocking thoughts.

```diff
@@ -29,6 +31,41 @@ pub use crate::queries::{
 };
 use queries::*;

+/// Max number of commands to run in parallel
+const MAX_PARALLELISM: usize = 50;
```

How did we arrive at this number?


```rust
impl<T: 'static + Send> MultipleCommands<T> {
    fn new() -> MultipleCommands<T> {
        let semaphore = Arc::new(Semaphore::new(MAX_PARALLELISM));
```

On the subject of the concurrency limit, it seems to me that the functions in this crate that run multiple commands fall into two categories: those that run a fairly small, fixed number of commands, like ipadm_info and dladm_info, and those that run a command or set of commands against every Oxide process PID (pargs_oxide_processes etc.).

For the functions that run commands against every Oxide process PID, the concurrency limit is certainly useful, as there may be basically any number of PIDs. But something like ipadm_info will always spawn exactly 3 processes, which is below the concurrency limit, and all this faffing around with a Semaphore is unnecessary.

I kind of wonder if the class of functions that spawn a fixed set of commands should eschew the use of MultipleCommands and just construct a JoinSet and spawn their 3 or 5 tasks or whatever. In practice, any overhead from the semaphore acquire/release/drop and stuff is probably insignificant compared to "actually spawning a child process", so this probably doesn't actually matter, but we could avoid doing it...up to you.
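A sketch of what that fixed-command variant might look like: a bare JoinSet with no Semaphore. The ipadm subcommand names here are illustrative, not necessarily the ones ipadm_info actually runs:

```rust
use tokio::{process::Command, task::JoinSet};

// Hypothetical fixed-command case: always runs the same three
// subcommands, so a plain JoinSet suffices and no concurrency
// limit is needed.
async fn ipadm_info_sketch() -> Vec<std::io::Result<std::process::Output>> {
    let mut set = JoinSet::new();
    for subcommand in ["show-if", "show-addr", "show-prop"] {
        set.spawn(async move {
            Command::new("ipadm").arg(subcommand).output().await
        });
    }
    let mut outputs = Vec::new();
    while let Some(joined) = set.join_next().await {
        outputs.push(joined.expect("task panicked"));
    }
    outputs
}
```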

```diff
@@ -4,7 +4,8 @@

 //! Diagnostics for an Oxide sled that exposes common support commands.

-use futures::{StreamExt, stream::FuturesUnordered};
```

Looks like the rebase picked up new code that uses FuturesUnordered: https://github.com/oxidecomputer/omicron/actions/runs/15056224877/job/42322633598?pr=8151#step:11:767

@smklein (Collaborator) commented May 15, 2025

FWIW, I pulled this code into #8174, and added a test that we don't exceed our specified "parallelism limit" there.
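The actual test lives in #8174, but for illustration, one way such a parallelism-limit test might look is to track the peak number of concurrently running tasks with atomics and assert it never exceeds the limit (a hedged sketch, not the real test):

```rust
use std::sync::{
    Arc,
    atomic::{AtomicUsize, Ordering},
};
use tokio::{sync::Semaphore, task::JoinSet};

#[tokio::test]
async fn parallelism_limit_is_respected() {
    const LIMIT: usize = 50;
    let semaphore = Arc::new(Semaphore::new(LIMIT));
    let running = Arc::new(AtomicUsize::new(0));
    let max_seen = Arc::new(AtomicUsize::new(0));

    let mut set = JoinSet::new();
    for _ in 0..500 {
        let (semaphore, running, max_seen) =
            (semaphore.clone(), running.clone(), max_seen.clone());
        set.spawn(async move {
            let _permit = semaphore.acquire().await.expect("semaphore closed");
            // Record how many tasks hold a permit right now.
            let now = running.fetch_add(1, Ordering::SeqCst) + 1;
            max_seen.fetch_max(now, Ordering::SeqCst);
            tokio::task::yield_now().await; // stand-in for real work
            running.fetch_sub(1, Ordering::SeqCst);
        });
    }
    while set.join_next().await.is_some() {}

    assert!(max_seen.load(Ordering::SeqCst) <= LIMIT);
}
```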

@papertigers (Contributor, Author) replied:

> FWIW, I pulled this code into #8174, and added a test that we don't exceed our specified "parallelism limit" there.

I can wait to merge this then; thanks for pulling this into a standalone crate. Let's just use your new crate when it lands instead of duplicating the code.
