
[sled-diagnostics] use JoinSet for multiple commands #8151


Open
papertigers wants to merge 5 commits into main

Conversation

@papertigers (Contributor) commented May 13, 2025

As part of the #8166 investigation, we decided to move from FuturesUnordered to a JoinSet to gain parallelism.
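For background on the distinction: FuturesUnordered polls all of its futures from within the single task that owns it, so the commands interleave but never run in parallel across runtime threads, whereas JoinSet spawns each future as its own Tokio task. A minimal standalone sketch of the JoinSet side (run_command is a hypothetical stand-in for the crate's real command execution, not code from this PR):

```rust
use tokio::task::JoinSet;

// Hypothetical stand-in for running one support command; the real crate
// spawns child processes (e.g. via tokio::process) instead.
async fn run_command(id: usize) -> String {
    format!("output of command {id}")
}

#[tokio::main]
async fn main() {
    // Each spawn creates a separate runtime task, so the commands can run
    // in parallel on the runtime's worker threads, unlike futures polled
    // inside a single FuturesUnordered owner task.
    let mut set = JoinSet::new();
    for id in 0..5 {
        set.spawn(run_command(id));
    }
    // Collect results as tasks finish, in completion order.
    while let Some(result) = set.join_next().await {
        println!("{}", result.expect("task panicked"));
    }
}
```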

@hawkw (Member) left a comment


one thought, I'm not sure whether it makes a meaningful difference given that it seems much of the issue is just "fork sucks when you have a big resident set"...

Comment on lines 75 to 79:

```rust
let permit = Arc::clone(&self.semaphore)
    .acquire_owned()
    .await
    .expect("semaphore acquire");
let _abort_handle = self.set.spawn(async move {
```

So, this will not even spawn the tasks until there's capacity in the concurrency limit. What happens if you change it to:

Suggested change:

```diff
-let permit = Arc::clone(&self.semaphore)
-    .acquire_owned()
-    .await
-    .expect("semaphore acquire");
-let _abort_handle = self.set.spawn(async move {
+let semaphore = self.semaphore.clone();
+let _abort_handle = self.set.spawn(async move {
+    let permit = semaphore.acquire().await.expect("semaphore acquire");
```

This way, all the tasks are spawned immediately, and the task adding commands to the set can do so synchronously (changing add_command to a normal fn) and then just wait for them to all come back. Right now, the task that adds commands has to get woken up a bunch of times to add the next one to the set; I wonder whether changing this to spawn everything synchronously would make a meaningful difference in performance...
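As a concrete illustration of the suggested pattern, here is a minimal self-contained sketch: all tasks are spawned up front and each one waits for a permit inside the task body. The loop bound of 200 and the MAX_PARALLELISM value are illustrative, and the tasks do no real work:

```rust
use std::sync::Arc;
use tokio::{sync::Semaphore, task::JoinSet};

const MAX_PARALLELISM: usize = 50;

#[tokio::main]
async fn main() {
    let semaphore = Arc::new(Semaphore::new(MAX_PARALLELISM));
    let mut set = JoinSet::new();

    // The spawning loop never awaits the semaphore, so it can add every
    // task synchronously; each task blocks on a permit *after* it has
    // been spawned, keeping at most MAX_PARALLELISM running at once.
    for id in 0..200u32 {
        let semaphore = Arc::clone(&semaphore);
        set.spawn(async move {
            let _permit =
                semaphore.acquire().await.expect("semaphore closed");
            // ...run the command while holding the permit...
            id
        });
    }

    while let Some(result) = set.join_next().await {
        let _id = result.expect("task panicked");
    }
}
```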

@papertigers (Contributor, Author) replied:

With this change I see the following:

stable rust + 1gb RSS: took 24.416534527s
beta rust + 1gb RSS: took 3.829577938s

Over chat I told you that with beta rust and the JoinSet we saw 3.674288649s, so the difference is negligible, but overall I think your suggestion is the better design. I am going to push a commit to this branch.

@hawkw (Member) replied:

ah well, it was worth a shot, i guess! thanks for giving it a spin!

@papertigers papertigers marked this pull request as ready for review May 15, 2025 18:24
@papertigers papertigers changed the title WIP diagnostics use JoinSet [sled-diagnostics] use JoinSet for multiple commands May 15, 2025
@papertigers (Contributor, Author) commented:
Test failure filed as #8170
Rerunning the failed test....

@papertigers papertigers requested review from hawkw and smklein May 15, 2025 19:19
@hawkw (Member) left a comment


Modulo fixing CI, this looks good to me. I left some non-blocking thoughts.

```diff
@@ -29,6 +31,41 @@ pub use crate::queries::{
 };
 use queries::*;

+/// Max number of commands to run in parallel
+const MAX_PARALLELISM: usize = 50;
```

How did we arrive at this number?


```rust
impl<T: 'static + Send> MultipleCommands<T> {
    fn new() -> MultipleCommands<T> {
        let semaphore = Arc::new(Semaphore::new(MAX_PARALLELISM));
```

On the subject of the concurrency limit, it seems to me that the functions in this crate that run multiple commands fall into two categories: those that run a fairly small, fixed number of commands, like ipadm_info and dladm_info, and those that run a command or set of commands against every Oxide process PID (pargs_oxide_processes etc.).

For the functions that run commands against every Oxide process PID, the concurrency limit is certainly useful, as there may be basically any number of PIDs. But something like ipadm_info will always spawn exactly 3 processes, which is below the concurrency limit, and all this faffing around with a Semaphore is unnecessary.

I kind of wonder if the class of functions that spawn a fixed set of commands should eschew the use of MultipleCommands and just construct a JoinSet and spawn their 3 or 5 tasks or whatever. In practice, any overhead from the semaphore acquire/release/drop and stuff is probably insignificant compared to "actually spawning a child process", so this probably doesn't actually matter, but we could avoid doing it...up to you.
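A sketch of what that fixed-command variant might look like: a bare JoinSet with no Semaphore. The ipadm subcommand names here are illustrative, not necessarily the ones ipadm_info actually runs:

```rust
use tokio::{process::Command, task::JoinSet};

// Hypothetical fixed-command case: always runs the same three
// subcommands, so a plain JoinSet suffices and no concurrency
// limit is needed.
async fn ipadm_info_sketch() -> Vec<std::io::Result<std::process::Output>> {
    let mut set = JoinSet::new();
    for subcommand in ["show-if", "show-addr", "show-prop"] {
        set.spawn(async move {
            Command::new("ipadm").arg(subcommand).output().await
        });
    }
    let mut outputs = Vec::new();
    while let Some(joined) = set.join_next().await {
        outputs.push(joined.expect("task panicked"));
    }
    outputs
}
```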

```diff
@@ -4,7 +4,8 @@

 //! Diagnostics for an Oxide sled that exposes common support commands.

-use futures::{StreamExt, stream::FuturesUnordered};
```

Looks like the rebase picked up new code that uses FuturesUnordered: https://github.com/oxidecomputer/omicron/actions/runs/15056224877/job/42322633598?pr=8151#step:11:767

@smklein (Collaborator) commented May 15, 2025

FWIW, I pulled this code into #8174, and added a test that we don't exceed our specified "parallelism limit" there.
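The actual test lives in #8174, but for illustration, one way such a parallelism-limit test might look is to track the peak number of concurrently running tasks with atomics and assert it never exceeds the limit (a hedged sketch, not the real test):

```rust
use std::sync::{
    Arc,
    atomic::{AtomicUsize, Ordering},
};
use tokio::{sync::Semaphore, task::JoinSet};

#[tokio::test]
async fn parallelism_limit_is_respected() {
    const LIMIT: usize = 50;
    let semaphore = Arc::new(Semaphore::new(LIMIT));
    let running = Arc::new(AtomicUsize::new(0));
    let max_seen = Arc::new(AtomicUsize::new(0));

    let mut set = JoinSet::new();
    for _ in 0..500 {
        let (semaphore, running, max_seen) =
            (semaphore.clone(), running.clone(), max_seen.clone());
        set.spawn(async move {
            let _permit = semaphore.acquire().await.expect("semaphore closed");
            // Record how many tasks hold a permit right now.
            let now = running.fetch_add(1, Ordering::SeqCst) + 1;
            max_seen.fetch_max(now, Ordering::SeqCst);
            tokio::task::yield_now().await; // stand-in for real work
            running.fetch_sub(1, Ordering::SeqCst);
        });
    }
    while set.join_next().await.is_some() {}

    assert!(max_seen.load(Ordering::SeqCst) <= LIMIT);
}
```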

@papertigers (Contributor, Author) replied:

> FWIW, I pulled this code into #8174, and added a test that we don't exceed our specified "parallelism limit" there.

I can wait to merge this then; thanks for pulling this into a standalone crate. Let's just use your new crate when it lands instead of duplicating the code.
