@smklein smklein commented Oct 17, 2025

Re-structures support bundle collection into tasks.

This centralizes step dispatching and makes it clearer that tasks are independent wherever they can be.

}

type CollectionStepFn = Box<
dyn for<'b> FnOnce(

This signature looks a little nasty, but it's basically just saying:

  • Every "step" acts on a BundleCollection,
  • ... has an output directory
  • ... and can emit a CollectionStepOutput object (which either updates the report, or makes more steps)
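As a rough illustration of the shape those bullets describe, here is a simplified, synchronous sketch. The real type in this PR is async and its full signature is cut off in the excerpt above, so this is not the actual definition: the field on `BundleCollection`, the variants of `CollectionStepOutput`, and the helpers `write_id_step` and `run_one` are all hypothetical placeholders.

```rust
use std::path::{Path, PathBuf};

// Placeholder for the real collection state; the field is an assumption.
struct BundleCollection {
    bundle_id: u32,
}

// A step either contributes to the report or schedules more steps.
// Both variant names are hypothetical.
enum CollectionStepOutput {
    Report(String),
    Spawn(Vec<(&'static str, CollectionStepFn)>),
}

// Synchronous stand-in: the PR's version is async.
type CollectionStepFn = Box<
    dyn for<'b> FnOnce(&'b BundleCollection, &'b Path) -> CollectionStepOutput
        + Send,
>;

// Hypothetical step: describes what it would write into the output directory.
fn write_id_step() -> CollectionStepFn {
    Box::new(|collection: &BundleCollection, dir: &Path| {
        let target: PathBuf = dir.join("bundle_id.txt");
        CollectionStepOutput::Report(format!(
            "would write id {} to {}",
            collection.bundle_id,
            target.display()
        ))
    })
}

// Runs a single step against a throwaway collection and output directory.
fn run_one(step: CollectionStepFn) -> String {
    let collection = BundleCollection { bundle_id: 7 };
    match step(&collection, Path::new("/tmp/bundle")) {
        CollectionStepOutput::Report(line) => line,
        CollectionStepOutput::Spawn(steps) => {
            format!("{} more steps", steps.len())
        }
    }
}
```

Calling `run_one(write_id_step())` yields the report line for that one step. The `for<'b>` bound is what lets a single boxed closure accept borrows of any lifetime for both the collection and the directory.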

Comment on lines +717 to +718
let mut tasks =
ParallelTaskSet::new_with_parallelism(MAX_CONCURRENT_STEPS);

Previously, we had some ParallelTaskSets embedded within the collection of sub-pieces of the bundle.

This new "step-based" infrastructure shares that more broadly: everything should use this one ParallelTaskSet. I think that's good, since it prevents a task from spawning a bunch of other task sets that in turn spawn more task sets. Everything is just a unit of work that can get added to this one set.
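To make the "one shared set" idea concrete, here is a minimal sketch using only the standard library (threads and a channel) instead of `ParallelTaskSet`, whose API is not shown in this excerpt. `Step`, `StepOutput`, `run_all`, `fan_out_demo`, and the value of `MAX_CONCURRENT_STEPS` are assumptions made for illustration.

```rust
use std::collections::VecDeque;
use std::sync::mpsc;
use std::thread;

// Hypothetical limit; the PR's actual value is not shown.
const MAX_CONCURRENT_STEPS: usize = 4;

type Step = Box<dyn FnOnce() -> StepOutput + Send + 'static>;

enum StepOutput {
    // The step finished and produced a line for the report.
    Done(String),
    // The step produced more work, which goes back into the same queue.
    Spawn(Vec<Step>),
}

// Runs every step, including steps created by other steps, with at most
// MAX_CONCURRENT_STEPS in flight. Note: a panicking step would make this
// sketch wait forever; real code needs to handle that.
fn run_all(initial: Vec<Step>) -> Vec<String> {
    let (tx, rx) = mpsc::channel::<StepOutput>();
    let mut pending: VecDeque<Step> = initial.into();
    let mut running = 0usize;
    let mut results = Vec::new();

    loop {
        // Top up the set until the parallelism limit is reached.
        while running < MAX_CONCURRENT_STEPS {
            let Some(step) = pending.pop_front() else { break };
            let tx = tx.clone();
            thread::spawn(move || {
                let _ = tx.send(step());
            });
            running += 1;
        }
        if running == 0 {
            break; // nothing in flight and nothing left to start
        }
        // Wait for whichever step finishes next.
        match rx.recv().expect("this function still holds a sender") {
            StepOutput::Done(line) => results.push(line),
            StepOutput::Spawn(more) => pending.extend(more),
        }
        running -= 1;
    }
    results.sort(); // completion order is nondeterministic
    results
}

// One step fans out into three children; all share the same limit.
fn fan_out_demo() -> Vec<String> {
    let fan_out: Step = Box::new(|| {
        let children: Vec<Step> = (0..3)
            .map(|i| {
                Box::new(move || StepOutput::Done(format!("sp-{i}"))) as Step
            })
            .collect();
        StepOutput::Spawn(children)
    });
    let leaf: Step = Box::new(|| StepOutput::Done("host".to_string()));
    run_all(vec![fan_out, leaf])
}
```

Because children are pushed onto the same queue rather than into a nested set, the concurrency limit applies to the whole collection, not per level of nesting.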


let ereport_collection = if let Some(ref ereport_filters) =
self.request.ereport_query
async fn collect_host_ereports(

The prior mechanism of collecting ereports from {host, sp} did a bit of manual task management, and stored atomics within the bundle itself.

I've just made each part of the ereport collection a distinct step: no more atomics, no more manual tokio tasks - each one is just "one step" with output.
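A hedged sketch of that difference: each step returns its own tally, and the caller folds the tallies into the report, so no shared atomic counters are needed. `EreportTally`, `tally`, `host_step`, `sp_step`, and the sample records are hypothetical, and everything here is synchronous, whereas the PR's steps are async and return a `CollectionStepOutput`.

```rust
// Hypothetical per-step result; the PR's real output type is not shown.
#[derive(Debug, Default, PartialEq)]
struct EreportTally {
    found: usize,
    errors: usize,
}

// Counts successes and failures; Err entries model failed fetches.
fn tally(records: &[Result<&str, &str>]) -> EreportTally {
    let mut t = EreportTally::default();
    for record in records {
        match record {
            Ok(_) => t.found += 1,
            Err(_) => t.errors += 1,
        }
    }
    t
}

// Two independent steps, each owning its own result.
fn host_step() -> EreportTally {
    tally(&[Ok("h1"), Ok("h2"), Err("timeout")])
}

fn sp_step() -> EreportTally {
    tally(&[Ok("s1")])
}

fn merge(a: EreportTally, b: EreportTally) -> EreportTally {
    EreportTally { found: a.found + b.found, errors: a.errors + b.errors }
}

// The driver combines step outputs; no state is shared between steps.
fn collect_all() -> EreportTally {
    let steps: Vec<(&'static str, fn() -> EreportTally)> = vec![
        ("host ereports", host_step as fn() -> EreportTally),
        ("sp ereports", sp_step as fn() -> EreportTally),
    ];
    steps
        .into_iter()
        .fold(EreportTally::default(), |acc, (_name, step)| {
            merge(acc, step())
        })
}
```

With the sample data above, `collect_all()` returns three found records and one error.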

Comment on lines +887 to +914
async fn get_or_initialize_mgs_client<'a>(
    &self,
    mgs_client: &'a OnceCell<Arc<Option<MgsClient>>>,
) -> &'a Arc<Option<MgsClient>> {
    mgs_client
        .get_or_init(|| async {
            Arc::new(self.create_mgs_client().await.ok())
        })
        .await
}

const MAX_CONCURRENT_SLED_REQUESTS: usize = 16;
const FAILURE_MESSAGE: &str =
"Failed to fully collect support bundle info from sled";
let mut set = ParallelTaskSet::new_with_parallelism(
MAX_CONCURRENT_SLED_REQUESTS,
async fn get_or_initialize_all_sleds<'a>(
    &self,
    all_sleds: &'a OnceCell<Arc<Option<Vec<Sled>>>>,
) -> &'a Arc<Option<Vec<Sled>>> {
    all_sleds
        .get_or_init(|| async {
            Arc::new(
                self.datastore
                    .sled_list_all_batched(
                        &self.opctx,
                        SledFilter::InService,
                    )
                    .await
                    .ok(),
            )
        })
        .await

I'm using lazily-initialized variables for "data/clients that might get created as a part of bundle collection, but might not".

This becomes more relevant with #9254, when we may or may not need these values to get initialized at all.
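For readers unfamiliar with the pattern, here is a synchronous analog using `std::sync::OnceLock` in place of the async `OnceCell` used in the diff. That substitution is mine, as are the `Collector` struct, its call counter, the `addr` field, and `lazy_demo`; only the method names mirror the PR.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, OnceLock};

// Hypothetical stand-in for the real client type.
#[allow(dead_code)]
struct MgsClient {
    addr: String,
}

// Hypothetical owner of the expensive constructor; the counter exists only
// so the example can show how often construction runs.
struct Collector {
    create_calls: AtomicUsize,
}

impl Collector {
    fn create_mgs_client(&self) -> Result<MgsClient, String> {
        self.create_calls.fetch_add(1, Ordering::SeqCst);
        Ok(MgsClient { addr: "example-address".to_string() })
    }

    // Builds the client on first use and returns the cached value after
    // that. A failure is stored as None, so it is not retried.
    fn get_or_initialize_mgs_client<'a>(
        &self,
        cell: &'a OnceLock<Arc<Option<MgsClient>>>,
    ) -> &'a Arc<Option<MgsClient>> {
        cell.get_or_init(|| Arc::new(self.create_mgs_client().ok()))
    }
}

// Returns (number of constructor calls, whether both lookups share a value).
fn lazy_demo() -> (usize, bool) {
    let collector = Collector { create_calls: AtomicUsize::new(0) };
    let cell = OnceLock::new();
    let first = collector.get_or_initialize_mgs_client(&cell);
    let second = collector.get_or_initialize_mgs_client(&cell);
    (
        collector.create_calls.load(Ordering::SeqCst),
        Arc::ptr_eq(first, second),
    )
}
```

If no step ever asks for the client, the constructor never runs; if several steps ask, they all share the one `Arc`.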


let mut extra_steps: Vec<(&'static str, CollectionStepFn)> = vec![];
for sp in get_available_sps(&mgs_client).await? {
extra_steps.push((

Rather than collecting all SP reports via new tokio tasks, this creates new "steps" that get kicked back to the top-level ParallelTaskSet I mentioned earlier.

@smklein smklein requested review from hawkw and wfchandler October 17, 2025 23:49