vmm_test: test keepalive servicing with a namespace change after save #2184
…er restore with keepaliveg
5058934
to
08e73ea
Compare
Still grokking this. Will take another look tomorrow morning. Thanks for this work.
… be cloneable anymore
Pull Request Overview
This PR adds a test that verifies NVMe keepalive functionality during OpenHCL servicing when a namespace change occurs. The test ensures that the controller properly handles namespace changes during servicing by verifying that appropriate asynchronous event requests (AER) and GET_LOG_PAGE commands are observed after restoration.
Key changes:
- Adds a new test `servicing_keepalive_with_namespace_update` that triggers a namespace change during servicing and verifies the controller's response
- Introduces a `Verify` variant to `QueueFaultBehavior` that signals when a matching command is observed
- Removes `Clone` trait from `QueueFaultBehavior` and `AdminQueueFaultConfig` to support the new verification mechanism
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
vmm_tests/vmm_tests/tests/tests/multiarch/openhcl_servicing.rs | Adds test for namespace change during servicing with keepalive enabled, extracts vtl2_nsid constant |
vm/devices/storage/nvme_test/src/workers/admin.rs | Updates fault handling to use mutable references and adds Verify behavior support |
vm/devices/storage/nvme_resources/src/fault.rs | Adds Verify variant to QueueFaultBehavior and removes Clone trait |
Co-authored-by: Copilot <[email protected]>
vm.restore_openhcl().await?;

let _ = CancelContext::new()
    .with_timeout(Duration::from_secs(10))
nit: this is reasonable, but you may find that this is flakey if machines are under load. Let's see ...
Will keep an eye out for flakiness from this! New test viewer should help :)
This PR adds a new queue fault for the admin queue called `Verify`, which is modeled as a oneshot sender. When the fault controller observes a matching command, it completes the oneshot channel, which indicates to the test that a matching command was observed. The test controller disposes of the sender at this point, so the signal is sent only for the first matching command.
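The signal-once behavior described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `FaultBehavior`, `AdminFault`, `observe`, and `verify_fault` are hypothetical names, matching is reduced to a bare opcode comparison, and a bounded `std::sync::mpsc` channel stands in for the oneshot channel. The opcode values are the standard NVMe admin opcodes for Get Log Page and Asynchronous Event Request.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};

// NVMe admin opcodes (from the NVMe base specification).
pub const GET_LOG_PAGE: u8 = 0x02;
pub const ASYNC_EVENT_REQUEST: u8 = 0x0C;

/// Hypothetical stand-in for the fault behavior enum; only what the
/// illustration needs is modeled.
#[allow(dead_code)]
pub enum FaultBehavior {
    /// Hypothetical "do nothing" variant.
    Passthrough,
    /// Signal the test once when a matching command is observed.
    /// The `Option` lets the sender be taken and dropped after first use.
    Verify(Option<SyncSender<()>>),
}

/// Hypothetical fault entry: an opcode to match plus a behavior.
pub struct AdminFault {
    pub opcode: u8,
    pub behavior: FaultBehavior,
}

/// Builds a `Verify` fault and hands the receiving end to the test.
pub fn verify_fault(opcode: u8) -> (AdminFault, Receiver<()>) {
    let (tx, rx) = sync_channel(1);
    let fault = AdminFault {
        opcode,
        behavior: FaultBehavior::Verify(Some(tx)),
    };
    (fault, rx)
}

/// Called for every admin command. Takes `&mut` because firing the signal
/// consumes the sender. Returns true only when a signal was sent.
pub fn observe(fault: &mut AdminFault, opcode: u8) -> bool {
    if fault.opcode != opcode {
        return false;
    }
    match &mut fault.behavior {
        FaultBehavior::Verify(sender) => match sender.take() {
            Some(tx) => {
                // The test may already have dropped the receiver; ignore that.
                let _ = tx.send(());
                true
            }
            // Sender already used: later matches send nothing.
            None => false,
        },
        FaultBehavior::Passthrough => false,
    }
}
```

Because firing the signal consumes the sender, the handler needs mutable access to the fault configuration. A real oneshot sender is also typically not cloneable, which is consistent with this PR dropping `Clone` from the fault types.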
This also adds a new vmm test that leverages the `Verify` functionality to check for AER and GET_LOG_PAGE admin commands after restore is complete. The test waits 10s (max) for each command, failing the test if no command is seen in that time frame.
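The bounded wait can be sketched like this. It is a simplified, blocking illustration: the actual test is async and, per the diff above, uses `CancelContext::new().with_timeout(...)`. Here `wait_for_command` is a hypothetical helper and a `std::sync::mpsc` receiver stands in for the receiving end of the oneshot channel.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Hypothetical helper: waits up to `limit` for the verification signal
/// and turns a timeout or a dropped sender into a test failure message.
pub fn wait_for_command(
    rx: &Receiver<()>,
    name: &str,
    limit: Duration,
) -> Result<(), String> {
    match rx.recv_timeout(limit) {
        // The fault handler observed a matching command.
        Ok(()) => Ok(()),
        // Nothing matched within the allowed window.
        Err(RecvTimeoutError::Timeout) => {
            Err(format!("{} not observed within {:?}", name, limit))
        }
        // The sender was dropped without ever signaling.
        Err(RecvTimeoutError::Disconnected) => {
            Err(format!("{} sender dropped before signaling", name))
        }
    }
}
```

A test in this style would call the helper once per expected command, for example once for AER and once for GET_LOG_PAGE, each with a `Duration::from_secs(10)` limit, and fail on the first `Err`.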