
Ensure NVMe restore task finishes before worker runs #1321


Open · wants to merge 2 commits into main

Conversation

yupavlen-ms (Contributor):
Running real-life scenarios shows failures that can only be explained by race conditions.
It appears that the original intent, to wait for the restore task to finish before the worker runs, has a gap.
The fix is to add a proper wait for the futures to complete.

yupavlen-ms requested a review from a team as a code owner · May 8, 2025 18:39
```rust
        saved_state.as_ref().unwrap(),
    )];
    // TODO: Move join_all into NvmeManagerWorker::restore instead?
    join_all(state_vec)
```
Member:

Is this what you want? Who do you want to block: the caller of `new`, or the user of this task? Or do we just need to restore everything before we do the run loop below?

yupavlen-ms (Contributor, Author) · May 8, 2025:

We need to restore everything before `worker.run()` is started and begins accepting `GetNamespace` requests. The restore task is async because of vfio uevent waiting, so we can get conflicting namespace init/restore.
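
For illustration, a minimal sketch of the invariant being discussed, assuming the code shape in the diff (the surrounding types and the `.expect` handling are placeholders, not the project's actual error policy): the restore future must run to completion inside the spawned task before `worker.run()` starts serving requests.

```rust
// Sketch only: names follow the diff; error handling is a placeholder.
let task = driver.spawn("nvme-manager", async move {
    if let Some(state) = saved_state.as_ref() {
        // Drive restore to completion, including its internal vfio
        // uevent waits, before the worker loop can see any request.
        NvmeManager::restore(&mut worker, state)
            .await
            .expect("failed to restore nvme manager state");
    }
    worker.run().await;
});
```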

```rust
        &mut worker,
        saved_state.as_ref().unwrap(),
    )];
    // TODO: Move join_all into NvmeManagerWorker::restore instead?
```
yupavlen-ms (Contributor, Author):

I think this TODO is not going to happen, but I put it here anyway in case someone can say whether that would be as safe as the currently proposed solution.

```diff
@@ -109,7 +109,12 @@ impl NvmeManager {
     let task = driver.spawn("nvme-manager", async move {
         // Restore saved data (if present) before async worker thread runs.
         if saved_state.is_some() {
-            let _ = NvmeManager::restore(&mut worker, saved_state.as_ref().unwrap())
+            let state_vec = vec![NvmeManager::restore(
```
Contributor:

Why make a vec and use join_all when we only have one function being called? Why not just join it directly?

yupavlen-ms (Contributor, Author):

That was my concern too, but I couldn't find a `join` variant that accepts a single future; they all require at least two. The vectorized `join_all` just works. Or was I looking in the wrong places? Please correct me.

Contributor:

Wouldn't that just be .await?

Contributor:

That's what we were doing before actually. So what's changing here?

yupavlen-ms (Contributor, Author):

Will it guarantee sequential execution? If you look inside `NvmeManager::restore`, it has its own `.await`, but the Kusto results suggest parallel execution of both `NvmeManager::restore` and `worker.run`.

Does bringing the `.await` one level up solve this problem?

Contributor:

`await` does guarantee sequential execution, yes. I don't think changing the level of the awaits would matter.

However, the task created by `driver.spawn` can run in parallel with other code. Is it possible that that's what you're observing?
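
As a standalone toy (not project code), the distinction being drawn here looks like this: `.await` fully sequences steps within one task, and only spawning a separate task introduces concurrency.

```rust
// Toy futures standing in for restore and the worker loop.
async fn restore() { /* may suspend internally on I/O */ }
async fn run() { /* request loop */ }

async fn manager_task() {
    restore().await; // completes entirely, suspensions included,
                     // before the next line executes
    run().await;     // cannot observe a half-restored state
}
// Concurrency only arises from spawning manager_task alongside other
// tasks via an executor; within the task the ordering above holds.
```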

yupavlen-ms (Contributor, Author):

Other code is fine, we should not block it; the problem I saw is that the run task can respond to `Request::GetNamespace` before the restore task finishes. So instead of restoring the driver state, we can initialize it again and then try to restore it.

This change prevents the run task from starting early (i.e. the spawned task should not reach the `worker.run()` line before everything is restored).

Adding `join_all` made some visible changes in the logging when tested in the lab; it now looks more serialized, although we don't have scaled test results yet due to recent circumstances.

Contributor:

Are you sure that's what you're seeing? Is it possible that restore is hitting an error and returning early? Currently you're ignoring that error; should you instead be unwrapping it or something?

smalis-msft (Contributor) · May 16, 2025:

The reason there's no `join` that takes a single future is that that's what `await` does. If you want to run multiple futures concurrently, you need something like `join_all`.
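
In other words, a generic sketch using the `futures` crate (not the project's exact code):

```rust
use futures::future::join_all;
use std::future::Future;

// One future: plain `.await` already is the "join" of one.
async fn one<F: Future<Output = ()>>(fut: F) {
    fut.await;
}

// Several futures: join_all drives them concurrently to completion.
async fn many<F: Future<Output = ()>>(futs: Vec<F>) {
    join_all(futs).await;
}
```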

yupavlen-ms (Contributor, Author):

Agreed, if there was an error in restore, that could be another explanation. Let me add better error handling then.
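
A hypothetical shape for that follow-up (exact logging and handling are up to the author; `tracing::error!` is an assumption about the project's logging): surface the restore error instead of discarding it with `let _ =`.

```rust
// Hypothetical: log (or propagate) a restore failure rather than
// silently dropping it, so an early error return becomes visible.
if let Some(state) = saved_state.as_ref() {
    if let Err(err) = NvmeManager::restore(&mut worker, state).await {
        tracing::error!(error = ?err, "nvme manager restore failed");
    }
}
```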
