Skip to content

Conversation

@iximeow
Copy link
Member

@iximeow iximeow commented Oct 27, 2025

STS_SUCCESS is represented by 0u16, but other command-specific error statuses may also be 0u16. SYS_CREATE_IO_Q_INVAL_CQ is also 0u16, returned when creating I/O submission queues. In NVMe 1.1 with multi-path support, path-specific error 0u16 indicates a nondescript "internal path error".

i'd set out to the circumstances i described here by writing a very rudimentary fuzzer driving admin commands while another thread repeatedly resets the controller. this didn't work! my reasoning was pretty wrong: NvmeCtrl is already under a mutex in PciNvme, and the functions i'd conjectured were running concurrently both take &mut self so they definitely are not. i did find this though.

STS_SUCCESS is represented by 0u16, but other command-specific error
statuses may also be 0u16. `SYS_CREATE_IO_Q_INVAL_CQ` is also 0u16,
returned when creating I/O submission queues. In NVMe 1.1 with
multi-path support, path-specific error 0u16 indicates a nondescript
"internal path error".
@iximeow iximeow requested a review from pfmooney October 27, 2025 17:03
@iximeow iximeow merged commit ddbbde4 into oxidecomputer:master Oct 27, 2025
11 checks passed
iximeow added a commit that referenced this pull request Oct 28, 2025
This tries doing a bunch of random operations against an NVMe device and
checks the operations against a limited model of what the results of
those operations should be.

The initial stab at this is what caught
#965, and it caught a bug
in an intermediate state of #953 (which other phd tests did notice
anyway). This fuzzing would probably be best with actual I/O operations
mixed in, and I think that *should* be relatively straightforward to add
from here., but as-is it's useful!

This would probably be best phrased as a `cargo-fuzz` test to at least
get coverage-guided fuzzing. Because of the statefulness of NVMe I
think either way we'd want the model of expected device state and a
pick-actions-then-run execution to further guide `cargo-fuzz` into
useful parts of the device state.

The initial approach at this allowed for device reset and migration at
arbitrary times via a separate thread. When that required synchronizing
the model of device state it was effectively interleaved with "guest"
operations on the device, and in practice admin commands are serialized
by the `NvmeCtrl` state lock anyway. It may be more interesting to
revisit with concurrent I/O operations on submission/completion queues.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants