Description
Repro steps:
- Start an ad hoc propolis-server instance
- Send it an ensure request
- Ask to stop the VM instance (you don't have to start it first, though you can)
Observed: The DESTROY_SELF vmm ioctl is issued (and a probe set on vmm_destroy_locked fires), but the kernel VMM persists until the process is killed. The stack on the resulting call to vmm_destroy_finish shows it originated from genunix!proc_exit. Writing a simple Drop impl for VmmHdl that just prints to stderr shows that this drop impl is apparently never reached.
Expected: there is at least some way to convince Propolis to fully close the kernel VMM fd on VM destruction.
We've discussed this in the past and concluded that in at least some cases it's useful for the kernel VMM to outlive the Propolis instance that owns it so that the VMM can be inspected with tools like mdb -b. I would at least like to consider avoiding this for production builds, though, for reasons related to this Omicron issue comment: it's useful for sled-agent to be able to say "Propolis reported that it's in the Destroyed state, so all its reservoir memory is released," because this helps give it the ability to tell Nexus that a VMM is gone and then do long-running zone cleanup operations afterward.
Even absent that motivation, I'd at least like to understand exactly which paths aren't fully dropping their VmmHdl references so that we can adjust the behavior if/when we need to.
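
As a rough illustration of how the Drop impl could go unreached, here is a minimal sketch. It assumes the handle is shared through Arc. The VmmHdl type below is a hypothetical stand-in rather than the real Propolis type, and teardown_with_leaks is a name invented for this example. It shows that a destructor only runs once the last strong reference is released, and that a Weak probe taken before teardown can report how many strong references are still outstanding.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Weak};

/// Hypothetical stand-in for the real VmmHdl; it only records whether Drop ran.
struct VmmHdl {
    dropped: Arc<AtomicBool>,
}

impl Drop for VmmHdl {
    fn drop(&mut self) {
        // In this sketch, Drop is where the kernel VMM fd would be closed.
        eprintln!("VmmHdl dropped");
        self.dropped.store(true, Ordering::SeqCst);
    }
}

/// Simulates teardown while `leaked` extra strong references are held elsewhere.
/// Returns (strong count observed after teardown, whether Drop had run by then).
fn teardown_with_leaks(leaked: usize) -> (usize, bool) {
    let dropped = Arc::new(AtomicBool::new(false));
    let hdl = Arc::new(VmmHdl { dropped: Arc::clone(&dropped) });

    // A Weak reference does not keep the handle alive, so it is a safe probe.
    let probe: Weak<VmmHdl> = Arc::downgrade(&hdl);

    // Stand-ins for other components (devices, tasks) that keep a clone.
    let holders: Vec<Arc<VmmHdl>> = (0..leaked).map(|_| Arc::clone(&hdl)).collect();

    // "VM destruction": the primary owner releases its reference.
    drop(hdl);

    let result = (probe.strong_count(), dropped.load(Ordering::SeqCst));
    drop(holders);
    result
}

fn main() {
    for leaked in [0, 2] {
        let (strong, drop_ran) = teardown_with_leaks(leaked);
        eprintln!("leaked={leaked} strong_count={strong} drop_ran={drop_ran}");
    }
}
```

If the real handle is reference-counted in a similar way, logging a Weak probe's strong_count at the point where the instance reports Destroyed might help narrow down which holders are keeping the fd open.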