Ensure NVMe restore task finishes before worker runs #1321
base: main
Conversation
saved_state.as_ref().unwrap(),
)];
// TODO: Move join_all into NvmeManagerWorker::restore instead?
join_all(state_vec)
Is this what you want? Who do you want to block: the caller of new, or the user of this task? Or do we just need to restore everything before we do the run loop below?
We need to restore everything before worker.run() is started and begins accepting GetNamespace requests. The restore task is async because of vfio uevent waiting, so we can get conflicting namespace init/restore.
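For reference, here is a minimal, self-contained sketch of the ordering being described (the Worker type and its methods are hypothetical stand-ins, not the real NvmeManager code): the restore future is driven to completion before the request loop starts.

```rust
use futures::executor::block_on;

struct Worker {
    restored: bool,
}

impl Worker {
    async fn restore(&mut self) {
        // Stand-in for the async restore path (e.g. waiting on vfio uevents).
        self.restored = true;
    }

    async fn run(&mut self) {
        // Stand-in for the loop that answers GetNamespace requests; it must
        // only observe fully restored state.
        assert!(self.restored, "run started before restore finished");
    }
}

fn main() {
    block_on(async {
        let mut worker = Worker { restored: false };
        // Awaiting the restore future here guarantees it completes before
        // the run loop begins polling.
        worker.restore().await;
        worker.run().await;
    });
}
```

The key point is simply that the second .await cannot start polling until the first one has resolved.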
&mut worker,
saved_state.as_ref().unwrap(),
)];
// TODO: Move join_all into NvmeManagerWorker::restore instead?
I think this TODO is not going to happen, but I put it here anyway in case someone can suggest whether that would be as safe as the currently proposed solution.
@@ -109,7 +109,12 @@ impl NvmeManager {
let task = driver.spawn("nvme-manager", async move {
// Restore saved data (if present) before async worker thread runs.
if saved_state.is_some() {
let _ = NvmeManager::restore(&mut worker, saved_state.as_ref().unwrap())
let state_vec = vec![NvmeManager::restore(
Why make a vec and use join_all when we only have one function being called? Why not just join it directly?
That was my concern too. But I couldn't find a join variant which accepts a single future; there should be at least two. The vectorized join_all just works. Or was I looking in the wrong places? Please correct me.
Wouldn't that just be .await?
That's what we were doing before actually. So what's changing here?
Will it guarantee sequential execution? If you look inside NvmeManager::restore, it has its own .await, but the Kusto results suggest parallel execution of both NvmeManager::restore and worker.run. Does bringing .await one level up solve this problem?
await does guarantee sequential execution, yes. I don't think changing the level of the awaits would matter.
However, the task created by driver.spawn can run in parallel with other code. Is it possible that that's what you're observing?
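To illustrate that distinction, here is a self-contained sketch (hypothetical names, using the futures crate's single-threaded LocalPool in place of driver.spawn): the spawned task runs independently of the code that spawned it, but the awaits inside the task are strictly ordered.

```rust
use futures::executor::LocalPool;
use futures::task::LocalSpawnExt;

async fn step(label: &str) {
    println!("{label}");
}

fn main() {
    let mut pool = LocalPool::new();
    let spawner = pool.spawner();

    spawner
        .spawn_local(async {
            // These two awaits are sequential: "restore" always prints
            // before "run", no matter how the task is scheduled.
            step("restore").await;
            step("run").await;
        })
        .unwrap();

    // The spawning code keeps running; the task only makes progress once
    // the executor drives it in pool.run().
    println!("spawner continues");
    pool.run();
}
```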
Other code is fine; we should not block it. The problem I saw is that the run task can respond to Request::GetNamespace before the restore task finishes. So instead of restoring driver state we can initialize it again and then try to restore it.
This change prevents the run task from starting too early (i.e. the spawned task should not reach worker.run() before everything is restored).
Adding join_all made some visible changes in the logging when tested in the lab; it now does look more serialized, although we don't have scaled test results yet due to recent circumstances.
Are you sure that's what you're seeing? Is it possible that restore is hitting an error and returning early? Currently you're ignoring that error; should you instead be unwrapping it or something?
The reason there's no join that takes a single future is that that's what await does. If you want to run multiple futures concurrently, you need something like join_all.
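A self-contained sketch of that point (hypothetical names, using the futures crate): awaiting a single future is already sequential, while join_all only adds value when there are several futures to drive concurrently.

```rust
use futures::future::join_all;

async fn restore(id: u32) -> u32 {
    // Stand-in for a single async restore call.
    id
}

async fn demo() {
    // Awaiting a single future runs it to completion before the next line.
    let one = restore(1).await;

    // join_all drives several futures concurrently and collects the results.
    let many = join_all(vec![restore(2), restore(3)]).await;

    assert_eq!(one, 1);
    assert_eq!(many, vec![2, 3]);
}

fn main() {
    futures::executor::block_on(demo());
}
```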
Agreed, if there was an error in restore, that could be another explanation. Let me add better error handling then.
Running real-life scenarios shows failures which can only be explained by race conditions.
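One possible shape for that error handling, as a hedged, self-contained sketch (the error type and message are made up, not the real NvmeManager error path): surface the restore result instead of discarding it.

```rust
use futures::executor::block_on;

#[derive(Debug)]
struct RestoreError(String);

async fn restore() -> Result<(), RestoreError> {
    // Stand-in for a fallible async restore call.
    Err(RestoreError("vfio device not found".into()))
}

fn main() {
    block_on(async {
        if let Err(err) = restore().await {
            // Surface the failure instead of discarding it with `let _ =`,
            // so a silent early return is not mistaken for a race.
            eprintln!("nvme restore failed: {err:?}");
        }
    });
}
```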
It appears that the original intent, to wait for the restore task to finish before the worker runs, has a gap. The fix is to properly wait for the restore futures to complete before the run loop starts.