Description
One thing I noticed in my latest round of debugging #336 was that I couldn't stop or reboot the guest in question. In the case of #336, the crucible upstairs and downstairs were incompatible, and the failure mode was that only a handful of I/Os were making it through, so the guest wasn't able to do much. While this obviously isn't the happy path, I think we need to be able to stop and reboot instances that get stuck in this way.
It's easy to reproduce some form of #336 by combining a downstairs at 894d44 and an upstairs at e7ce7a. For this case, reboot seems to work, but stop failed, first with a 400, then a 500:
jordan@maxwell ~/propolis $ ./cli.sh state run
Apr 23 20:26:23.437 INFO PUT request to http://172.20.3.73:8000/instance/state, propolis_client address: 172.20.3.73:8000
Error: failed to set instance state
Caused by:
Bad Status: 400
jordan@maxwell ~/propolis $
jordan@maxwell ~/propolis $ ./cli.sh state run
Apr 23 20:26:57.455 INFO PUT request to http://172.20.3.73:8000/instance/state, propolis_client address: 172.20.3.73:8000
Error: failed to set instance state
Caused by:
Bad Status: 500
server logs:
Apr 23 20:26:23.438 INFO Requested state Run via API, component: vm_controller
Apr 23 20:26:23.438 INFO Queuing external request, disposition: Deny(HaltPending), request: Start, component: external_request_queue
Apr 23 20:26:23.439 INFO request completed, error_message_external: Instance operation failed: Failed to queue requested state change: Instance is preparing to stop, error_message_internal: Instance operation failed: Failed to queue requested state change: Instance is preparing to stop, response_code: 400, uri: /instance/state, method: PUT, req_id: da972575-1e38-46c4-9b06-88fff4c227de, remote_addr: 172.20.3.73:54058, local_addr: 172.20.3.73:8000
Apr 23 20:26:57.455 INFO accepted connection, remote_addr: 172.20.3.73:61958, local_addr: 172.20.3.73:8000
Apr 23 20:26:57.456 INFO request completed, error_message_external: Internal Server Error, error_message_internal: Server not initialized (no instance), response_code: 500, uri: /instance/state, method: PUT, req_id: 40a97378-4272-4907-a44b-2320c0c7c1b2, remote_addr: 172.20.3.73:61958, local_addr: 172.20.3.73:8000
I can't recall if this is the exact failure mode I saw when debugging #336, but this particular instance of it feels in the same realm as #363 (maybe?).
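To make the 400-then-500 sequence in the logs easier to reason about, here is a rough sketch of how I read it. This is not the actual Propolis code. `Request`, `Disposition`, `DenyReason`, `RequestQueue`, and `handle_put_state` are hypothetical names loosely modeled on the `Deny(HaltPending)` and `Server not initialized (no instance)` log messages, and the 204 success code is an assumption. The sketch assumes that once a stop has been queued every later request is denied, and that the instance is gone by the time the second request arrives.

```rust
// Hypothetical model of the behavior seen in the logs; not Propolis's implementation.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Request {
    Start,
    Reboot,
    Stop,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DenyReason {
    HaltPending,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Disposition {
    Enqueue,
    Deny(DenyReason),
}

#[derive(Default)]
struct RequestQueue {
    // Set once a stop has been accepted (assumed behavior).
    halt_pending: bool,
    pending: Vec<Request>,
}

impl RequestQueue {
    fn try_queue(&mut self, req: Request) -> Disposition {
        // Assumption: after a stop is queued, all further requests are denied.
        if self.halt_pending {
            return Disposition::Deny(DenyReason::HaltPending);
        }
        if req == Request::Stop {
            self.halt_pending = true;
        }
        self.pending.push(req);
        Disposition::Enqueue
    }
}

// `None` models "no instance" on the server; status codes other than
// 400 and 500 are assumed for illustration.
fn handle_put_state(server: &mut Option<RequestQueue>, req: Request) -> u16 {
    match server {
        None => 500,
        Some(queue) => match queue.try_queue(req) {
            Disposition::Enqueue => 204,
            Disposition::Deny(_) => 400,
        },
    }
}
```

Under these assumptions, a `Start` sent after a queued `Stop` yields 400, and the same request sent once the instance has been torn down yields 500, which matches the order of the two responses above.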