[support bundle] Collect sled-specific bundle information concurrently #8106
Conversation
sorry for the obnoxious drive-by :)
let mut sled_stream = futures::stream::iter(all_sleds)
    .map(|sled| async move {
        self.collect_data_from_sled(&log, &sled, dir).await
    })
hm, I note that while the collect_data_from_sled calls will execute concurrently, the async fns will not actually run in parallel, since they're not spawned in separate tasks. so there will only ever be one thread executing one of those futures at a time, although we will be running them concurrently and executing the next one when one yields. in this case, i think that's mostly okay, since it looks like most of the runtime of collect_data_from_sled is probably waiting on IO (waiting for the HTTP requests to come back and the tokio::fs operations, which run on the blocking threadpool, to complete). but, i might nonetheless consider spawning tasks for each sled, instead. up to you.
(if you decide these should run in parallel i might consider using tokio::task::JoinSet rather than buffer_unordered, since it uses the task allocation to store the task's membership in the joinset and therefore uses much less heap. may not actually matter though.)
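(For illustration, a minimal sketch of the JoinSet approach described above, assuming a tokio runtime; the Sled type and collect_data_from_sled function here are placeholders, not the PR's actual code.)

use tokio::task::JoinSet;

// Placeholder type standing in for the PR's sled data.
struct Sled { name: String }

// Stand-in for the per-sled collection routine; mostly I/O-bound work.
async fn collect_data_from_sled(sled: Sled) -> String {
    tokio::time::sleep(std::time::Duration::from_millis(10)).await;
    format!("collected from {}", sled.name)
}

#[tokio::main]
async fn main() {
    let all_sleds = (0..4).map(|i| Sled { name: format!("sled{i}") });

    let mut tasks = JoinSet::new();
    for sled in all_sleds {
        // Each spawn creates a separate task, so the futures can run in
        // parallel across runtime worker threads; the JoinSet tracks
        // membership inside the task allocation itself.
        tasks.spawn(collect_data_from_sled(sled));
    }

    // Results arrive in completion order, as with buffer_unordered.
    while let Some(result) = tasks.join_next().await {
        println!("{}", result.expect("task panicked"));
    }
}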
That's super interesting, thanks for sharing that bit of info on buffer_unordered vs JoinSet.
I can make this refactor -- I'm aware that it was concurrent, not parallel -- but the 'static bound on the spawn-ed tasks means I probably can't operate on &self, and will need to refactor this into a free function.
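(A hypothetical sketch of the constraint being described: tokio::spawn requires a 'static future, so a method borrowing &self can't be spawned directly, and moving the work into a free function that takes owned data is one way around it. The names below are illustrative, not the PR's.)

use std::sync::Arc;

// Illustrative stand-in for the collector; cheap to clone.
#[derive(Clone)]
struct Collector {
    log: Arc<String>,
}

// Free function taking owned arguments: the returned future is 'static
// and can be handed to tokio::spawn.
async fn collect_data_from_sled(collector: Collector, sled: String) -> String {
    format!("[{}] collected from {sled}", collector.log)
}

#[tokio::main]
async fn main() {
    let collector = Collector { log: Arc::new("bundle".to_string()) };
    let handle = tokio::spawn(collect_data_from_sled(collector.clone(), "sled0".into()));
    println!("{}", handle.await.expect("task panicked"));
}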
Made these changes in 59e8750 - I'm using some manual counting of tasks to respect my previous "maximum bound on concurrent activity".
// Currently we execute up to 10 commands concurrently which
// might be doing their own concurrent work, for example
// collecting `pstack` output of every Oxide process that is
// found on a sled.
.buffer_unordered(10);
again, concurrent but not parallel. whether that matters depends on how much CPU-bound work these functions do...
Above you mentioned fs operations being spawned onto the blocking thread pool -- does that mean futures_unordered can perform those tasks concurrently still, since the await point is typically waiting for the result from the blocking thread? I am curious because the log collection is a synchronous task which we use spawn_blocking for ourselves.
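(Roughly the pattern being asked about, as a self-contained sketch: the synchronous work runs on the blocking threadpool via spawn_blocking, and the async future only awaits its completion, so buffer_unordered can keep several in flight from a single driving task. collect_logs_sync is a made-up stand-in for the synchronous log collection.)

use futures::stream::{self, StreamExt};

// Stand-in for synchronous log collection (filesystem walking, etc).
fn collect_logs_sync(sled: &str) -> String {
    format!("logs for {sled}")
}

#[tokio::main]
async fn main() {
    let sleds = vec!["sled0".to_string(), "sled1".to_string()];

    let results: Vec<String> = stream::iter(sleds)
        .map(|sled| async move {
            // The closure runs on the blocking threadpool; the async
            // future just awaits its completion, so several of these
            // can be in flight concurrently even on one driving task.
            tokio::task::spawn_blocking(move || collect_logs_sync(&sled))
                .await
                .expect("blocking task panicked")
        })
        .buffer_unordered(10)
        .collect()
        .await;

    println!("{results:?}");
}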
continue;
// While we have incoming work to send to tasks (sleds_iter)
// or a task operating on that data (tasks)...
while sleds_iter.peek().is_some() || !tasks.is_empty() {
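(Filled out for context, a sketch of the bounding pattern this loop implements: keep at most N spawned tasks in flight, refilling from a peekable iterator as tasks complete. The constant and the collect function are placeholders, not the PR's actual values.)

use tokio::task::JoinSet;

const MAX_IN_FLIGHT: usize = 16;

// Stand-in for the per-sled collection work.
async fn collect(sled: String) -> String {
    format!("collected from {sled}")
}

#[tokio::main]
async fn main() {
    let sleds: Vec<String> = (0..40).map(|i| format!("sled{i}")).collect();
    let mut sleds_iter = sleds.into_iter().peekable();
    let mut tasks = JoinSet::new();

    // While we have incoming work to send to tasks (sleds_iter)
    // or a task operating on that data (tasks)...
    while sleds_iter.peek().is_some() || !tasks.is_empty() {
        // Top up to the concurrency limit.
        while tasks.len() < MAX_IN_FLIGHT {
            let Some(sled) = sleds_iter.next() else { break };
            tasks.spawn(collect(sled));
        }
        // Wait for one task to finish before spawning more.
        if let Some(res) = tasks.join_next().await {
            println!("{}", res.expect("task panicked"));
        }
    }
}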
nit: You could probably swap this to use a semaphore but I am not pressed about it.
I'll look into refactoring this behavior out as a follow-up
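(For reference, a rough sketch of the semaphore alternative mentioned in the nit, assuming tokio::sync::Semaphore to cap how many per-sled collections run at once; not the PR's actual implementation, and the names are illustrative.)

use std::sync::Arc;
use tokio::sync::Semaphore;
use tokio::task::JoinSet;

const MAX_CONCURRENT_SLEDS: usize = 16;

// Stand-in for the per-sled collection work.
async fn collect_from_sled(sled: String) -> String {
    format!("collected from {sled}")
}

#[tokio::main]
async fn main() {
    let sleds: Vec<String> = (0..32).map(|i| format!("sled{i}")).collect();
    let semaphore = Arc::new(Semaphore::new(MAX_CONCURRENT_SLEDS));

    let mut tasks = JoinSet::new();
    for sled in sleds {
        let semaphore = Arc::clone(&semaphore);
        tasks.spawn(async move {
            // Each task waits for a permit before starting, so at most
            // MAX_CONCURRENT_SLEDS collections run at once; the permit
            // is released when it is dropped at the end of the task.
            let _permit = semaphore
                .acquire_owned()
                .await
                .expect("semaphore closed");
            collect_from_sled(sled).await
        });
    }

    while let Some(res) = tasks.join_next().await {
        println!("{}", res.expect("task panicked"));
    }
}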
In the interest of improving support bundle collection times, avoid bottlenecking on each sled individually.
Instead, collect from up to 16 sleds at once.