Adjust instance RAM maximum value for 2 TiB gimlets #7918

Open
jmpesp opened this issue Apr 4, 2025 · 13 comments · Fixed by #8160
Labels
customer For any bug reports or feature requests tied to customer requests
Comments

@jmpesp
Contributor

jmpesp commented Apr 4, 2025

After #7837, 2 TiB gimlets will have roughly 1667 GiB allocated as VMM reservoir, but the maximum instance RAM is hard-coded as 256 GiB:

pub const MAX_MEMORY_BYTES_PER_INSTANCE: u64 = 256 * (1 << 30); // 256 GiB

Customers deploying these gimlets are requesting larger instances than this. In order to support this, Nexus has to be aware that these gimlets have more of a reservoir to allocate from, and offer larger instance sizes accordingly.
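
For illustration only, here is a minimal sketch of the arithmetic behind the limit. The constant is the one quoted above; `GIB` and `fits_under_cap` are hypothetical names that do not come from Omicron, and the reservoir figure assumes the 1667 quoted above is in GiB.

```rust
/// Bytes in one GiB.
pub const GIB: u64 = 1 << 30;

/// The limit quoted in this issue: 256 GiB.
pub const MAX_MEMORY_BYTES_PER_INSTANCE: u64 = 256 * GIB;

/// Assumption: the "roughly 1667" reservoir figure is in GiB.
pub const ASSUMED_RESERVOIR_BYTES: u64 = 1667 * GIB;

/// Hypothetical helper (not an Omicron API): true when a requested
/// instance memory size does not exceed the given cap.
pub fn fits_under_cap(requested_bytes: u64, cap_bytes: u64) -> bool {
    requested_bytes <= cap_bytes
}
```

Under these assumptions, a 512 GiB request is rejected by the current constant even though it would fit in the reservoir several times over, which is the gap this issue describes.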

@morlandi7 morlandi7 added the customer For any bug reports or feature requests tied to customer requests label Apr 10, 2025
@askfongjojo askfongjojo added this to the 15 milestone Apr 25, 2025
@askfongjojo

askfongjojo commented Apr 27, 2025

Turned out we can't just bump up the memory to something much higher than 256 GiB.

VMs with 512 GiB memory or more cause the sled to panic:

fffff08fad9c6680 vpanic()
fffff08fad9c6710 ~assfail3+0xd0()
fffff08fad9c6770 vmm_gpt_populate_region+0xd3(fffffb84773ba648, 100000000, 7f40000000)
fffff08fad9c67f0 vmspace_map+0xee(fffffb83a7697280, fffffb8455f7f4e8, 0, 100000000, 7f40000000, 7)
fffff08fad9c6880 vm_mmap_memseg+0xfe(fffffb840b65b000, 100000000, 2, 0, 7f40000000, 7)
fffff08fad9c6c40 vmmdev_do_ioctl+0x1c89(fffffb8477f2b3c0, 766c06, ffffedffec5fb450, 202003, fffffb83f7ec1e38, fffff08fad9c6e18)
fffff08fad9c6cc0 vmm_ioctl+0x12f(9900000004, 766c06, ffffedffec5fb450, 202003, fffffb83f7ec1e38, fffff08fad9c6e18)
fffff08fad9c6d00 cdev_ioctl+0x3f(9900000004, 766c06, ffffedffec5fb450, 202003, fffffb83f7ec1e38, fffff08fad9c6e18)
fffff08fad9c6d50 spec_ioctl+0x55(fffffb843d197a40, 766c06, ffffedffec5fb450, 202003, fffffb83f7ec1e38, fffff08fad9c6e18)
fffff08fad9c6de0 fop_ioctl+0x40(fffffb843d197a40, 766c06, ffffedffec5fb450, 202003, fffffb83f7ec1e38, fffff08fad9c6e18)
fffff08fad9c6f00 ioctl+0x144(10, 766c06, ffffedffec5fb450)
fffff08fad9c6f10 sys_syscall+0x17d()
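
A small sketch for reading the hexadecimal arguments in the trace above. It assumes, based on the function names only, that `100000000` is a guest-physical start address and `7f40000000` is a length in bytes; `parse_hex` and `whole_gib` are hypothetical helper names.

```rust
/// Bytes in one GiB.
const GIB: u64 = 1 << 30;

/// Parse a bare hexadecimal string (no `0x` prefix), as printed in the trace.
fn parse_hex(s: &str) -> Option<u64> {
    u64::from_str_radix(s, 16).ok()
}

/// Convert a byte count to whole GiB, discarding any remainder.
fn whole_gib(bytes: u64) -> u64 {
    bytes / GIB
}
```

Decoded this way, `100000000` is 4 GiB and `7f40000000` is exactly 509 GiB. That would be consistent with a 512 GiB guest if roughly 3 GiB is mapped below the 4 GiB boundary, though that split is an assumption and not something the trace states.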

I managed to get some 384 GiB VMs to run but they aren't all stable, e.g., this one went to the GRUB menu after I ran sysbench for a while. This is what I see in the serial console at the moment:

                             GNU GRUB  version 2.12

  *Ubuntu
   Advanced options for Ubuntu
   UEFI Firmware Settings


      Use the ^ and v keys to select which entry is highlighted.          
      Press enter to boot the selected OS, `e' to edit the commands       
      before booting or `c' for a command-line. ESC to return previous    
      menu.                                                               
                                                                               

The complete dump files for some of the panics can be found at /staff/core/omicron-7918.
cc @iximeow @pfmooney

@pfmooney
Contributor

> Turned out we can't just bump up the memory to something much higher than 256 GiB.
>
> VMs with 512 GiB memory or more cause the sled to panic:
>
> fffff08fad9c6680 vpanic()
> fffff08fad9c6710 ~assfail3+0xd0()
> fffff08fad9c6770 vmm_gpt_populate_region+0xd3(fffffb84773ba648, 100000000, 7f40000000)
> fffff08fad9c67f0 vmspace_map+0xee(fffffb83a7697280, fffffb8455f7f4e8, 0, 100000000, 7f40000000, 7)
> fffff08fad9c6880 vm_mmap_memseg+0xfe(fffffb840b65b000, 100000000, 2, 0, 7f40000000, 7)
> fffff08fad9c6c40 vmmdev_do_ioctl+0x1c89(fffffb8477f2b3c0, 766c06, ffffedffec5fb450, 202003, fffffb83f7ec1e38, fffff08fad9c6e18)
> fffff08fad9c6cc0 vmm_ioctl+0x12f(9900000004, 766c06, ffffedffec5fb450, 202003, fffffb83f7ec1e38, fffff08fad9c6e18)
> fffff08fad9c6d00 cdev_ioctl+0x3f(9900000004, 766c06, ffffedffec5fb450, 202003, fffffb83f7ec1e38, fffff08fad9c6e18)
> fffff08fad9c6d50 spec_ioctl+0x55(fffffb843d197a40, 766c06, ffffedffec5fb450, 202003, fffffb83f7ec1e38, fffff08fad9c6e18)
> fffff08fad9c6de0 fop_ioctl+0x40(fffffb843d197a40, 766c06, ffffedffec5fb450, 202003, fffffb83f7ec1e38, fffff08fad9c6e18)
> fffff08fad9c6f00 ioctl+0x144(10, 766c06, ffffedffec5fb450)
> fffff08fad9c6f10 sys_syscall+0x17d()

This has been filed as illumos#17403, with a pending CR to fix it here.

@pfmooney
Contributor

The illumos#17403 fix has been merged upstream. Once it's pulled into helios, we should be in a position to retry running large guests.

@pfmooney
Contributor

pfmooney commented May 2, 2025

As of oxidecomputer/illumos-gate@47b4405, that fix has been merged into stlouis.

@askfongjojo

askfongjojo commented May 3, 2025

The illumos fix allowed instances with >512 GiB memory to come up. I was able to start 756/896/960 GiB instances consistently. But with 1 TiB instances, there appears to be some sort of timeout or slowness in the control plane workflow. The log below shows such an occurrence after propolis HTTP came up (notably, at 06:34:10.494Z, there was a "request handling cancelled (client disconnected)" warning with request latency_us = 60000433).
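
As a reading aid for the log that follows, here is a minimal sketch that converts a `latency_us` value to a duration and measures the gap between two log timestamps. `latency` and `parse_ts_ms` are hypothetical helpers, and the parser assumes the `HH:MM:SS.mmmZ` form with exactly three fractional digits, as seen in this log.

```rust
use std::time::Duration;

/// Convert a `latency_us` log field (microseconds) into a Duration.
fn latency(latency_us: u64) -> Duration {
    Duration::from_micros(latency_us)
}

/// Parse an `HH:MM:SS.mmmZ` timestamp into milliseconds since midnight.
/// Returns None if the input does not match that shape.
fn parse_ts_ms(ts: &str) -> Option<u64> {
    let ts = ts.strip_suffix('Z')?;
    let (hms, frac) = ts.split_once('.')?;
    if frac.len() != 3 {
        return None; // only millisecond precision is handled in this sketch
    }
    let mut parts = hms.split(':');
    let h: u64 = parts.next()?.parse().ok()?;
    let m: u64 = parts.next()?.parse().ok()?;
    let s: u64 = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None; // more than three colon-separated fields
    }
    let ms: u64 = frac.parse().ok()?;
    Some(((h * 60 + m) * 60 + s) * 1000 + ms)
}
```

With these helpers, `latency_us = 60000433` comes out at just over 60 seconds, and the gap between "Propolis HTTP server online" (06:33:06.737Z) and the failed `instance_ensure` result (06:35:03.137Z) is 116.4 seconds. Whether the 60-second figure reflects a client-side timeout is not stated in the log and would need to be confirmed.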

06:32:57.742Z INFO SledAgent (InstanceManager): ensuring instance is registered
    file = sled-agent/src/instance_manager.rs:558
    instance_id = 38e05be0-4030-4bfe-b784-e2c67ab3dba4
    local_config = InstanceSledLocalConfig { hostname: Hostname("noble-1tb-regular-disks"), nics: [NetworkInterface { id: be619353-c3a6-4b8b-b6d8-a635a40cbb56, kind: Instance { id: 38e05be0-4030-4bfe-b784-e2c67ab3dba4 }, name: Name("net0"), ip: 172.30.0.8, mac: MacAddr(MacAddr6([168, 64, 37, 240, 0, 3])), subnet: V4(Ipv4Net { addr: 172.30.0.0, width: 22 }), vni: Vni(12378325), primary: true, slot: 0, transit_ips: [] }], source_nat: SourceNatConfig { ip: 172.20.29.11, first_port: 0, last_port: 16383 }, ephemeral_ip: Some(172.20.29.12), floating_ips: [], firewall_rules: [ResolvedVpcFirewallRule { status: Enabled, direction: Inbound, targets: [NetworkInterface { id: 25d2eff8-73a4-4e35-86d0-891dd8e807eb, kind: Instance { id: 837708db-f8ec-4331-8588-1f179bdf78c5 }, name: Name("net0"), ip: 172.30.0.7, mac: MacAddr(MacAddr6([168, 64, 37, 240, 0, 2])), subnet: V4(Ipv4Net { addr: 172.30.0.0, width: 22 }), vni: Vni(12378325), primary: true, slot: 0, transit_ips: [] }, NetworkInterface { id: 5775a340-be7d-4815-8aba-bec567a3bd25, kind: Instance { id: b4eea5c1-3c58-41f6-8ea5-d986ccec3bb4 }, name: Name("net0"), ip: 172.30.0.9, mac: MacAddr(MacAddr6([168, 64, 37, 240, 0, 4])), subnet: V4(Ipv4Net { addr: 172.30.0.0, width: 22 }), vni: Vni(12378325), primary: true, slot: 0, transit_ips: [] }, NetworkInterface { id: a958b841-5ace-4583-b511-cf0f47e7dde4, kind: Instance { id: 19a6dcbd-851f-4bf9-8ba7-f8e473b8393f }, name: Name("net0"), ip: 172.30.0.6, mac: MacAddr(MacAddr6([168, 64, 37, 240, 0, 1])), subnet: V4(Ipv4Net { addr: 172.30.0.0, width: 22 }), vni: Vni(12378325), primary: true, slot: 0, transit_ips: [] }, NetworkInterface { id: be619353-c3a6-4b8b-b6d8-a635a40cbb56, kind: Instance { id: 38e05be0-4030-4bfe-b784-e2c67ab3dba4 }, name: Name("net0"), ip: 172.30.0.8, mac: MacAddr(MacAddr6([168, 64, 37, 240, 0, 3])), subnet: V4(Ipv4Net { addr: 172.30.0.0, width: 22 }), vni: Vni(12378325), primary: true, slot: 0, transit_ips: [] }], filter_hosts: None, filter_ports: None, 
filter_protocols: Some([Icmp]), action: Allow, priority: VpcFirewallRulePriority(65534) }, ResolvedVpcFirewallRule { status: Enabled, direction: Inbound, targets: [NetworkInterface { id: 25d2eff8-73a4-4e35-86d0-891dd8e807eb, kind: Instance { id: 837708db-f8ec-4331-8588-1f179bdf78c5 }, name: Name("net0"), ip: 172.30.0.7, mac: MacAddr(MacAddr6([168, 64, 37, 240, 0, 2])), subnet: V4(Ipv4Net { addr: 172.30.0.0, width: 22 }), vni: Vni(12378325), primary: true, slot: 0, transit_ips: [] }, NetworkInterface { id: 5775a340-be7d-4815-8aba-bec567a3bd25, kind: Instance { id: b4eea5c1-3c58-41f6-8ea5-d986ccec3bb4 }, name: Name("net0"), ip: 172.30.0.9, mac: MacAddr(MacAddr6([168, 64, 37, 240, 0, 4])), subnet: V4(Ipv4Net { addr: 172.30.0.0, width: 22 }), vni: Vni(12378325), primary: true, slot: 0, transit_ips: [] }, NetworkInterface { id: a958b841-5ace-4583-b511-cf0f47e7dde4, kind: Instance { id: 19a6dcbd-851f-4bf9-8ba7-f8e473b8393f }, name: Name("net0"), ip: 172.30.0.6, mac: MacAddr(MacAddr6([168, 64, 37, 240, 0, 1])), subnet: V4(Ipv4Net { addr: 172.30.0.0, width: 22 }), vni: Vni(12378325), primary: true, slot: 0, transit_ips: [] }, NetworkInterface { id: be619353-c3a6-4b8b-b6d8-a635a40cbb56, kind: Instance { id: 38e05be0-4030-4bfe-b784-e2c67ab3dba4 }, name: Name("net0"), ip: 172.30.0.8, mac: MacAddr(MacAddr6([168, 64, 37, 240, 0, 3])), subnet: V4(Ipv4Net { addr: 172.30.0.0, width: 22 }), vni: Vni(12378325), primary: true, slot: 0, transit_ips: [] }], filter_hosts: Some([Vpc(Vni(12378325)), Vpc(Vni(12378325)), Vpc(Vni(12378325)), Vpc(Vni(12378325))]), filter_ports: None, filter_protocols: None, action: Allow, priority: VpcFirewallRulePriority(65534) }, ResolvedVpcFirewallRule { status: Enabled, direction: Inbound, targets: [NetworkInterface { id: 25d2eff8-73a4-4e35-86d0-891dd8e807eb, kind: Instance { id: 837708db-f8ec-4331-8588-1f179bdf78c5 }, name: Name("net0"), ip: 172.30.0.7, mac: MacAddr(MacAddr6([168, 64, 37, 240, 0, 2])), subnet: V4(Ipv4Net { addr: 172.30.0.0, width: 22 }), 
vni: Vni(12378325), primary: true, slot: 0, transit_ips: [] }, NetworkInterface { id: 5775a340-be7d-4815-8aba-bec567a3bd25, kind: Instance { id: b4eea5c1-3c58-41f6-8ea5-d986ccec3bb4 }, name: Name("net0"), ip: 172.30.0.9, mac: MacAddr(MacAddr6([168, 64, 37, 240, 0, 4])), subnet: V4(Ipv4Net { addr: 172.30.0.0, width: 22 }), vni: Vni(12378325), primary: true, slot: 0, transit_ips: [] }, NetworkInterface { id: a958b841-5ace-4583-b511-cf0f47e7dde4, kind: Instance { id: 19a6dcbd-851f-4bf9-8ba7-f8e473b8393f }, name: Name("net0"), ip: 172.30.0.6, mac: MacAddr(MacAddr6([168, 64, 37, 240, 0, 1])), subnet: V4(Ipv4Net { addr: 172.30.0.0, width: 22 }), vni: Vni(12378325), primary: true, slot: 0, transit_ips: [] }, NetworkInterface { id: be619353-c3a6-4b8b-b6d8-a635a40cbb56, kind: Instance { id: 38e05be0-4030-4bfe-b784-e2c67ab3dba4 }, name: Name("net0"), ip: 172.30.0.8, mac: MacAddr(MacAddr6([168, 64, 37, 240, 0, 3])), subnet: V4(Ipv4Net { addr: 172.30.0.0, width: 22 }), vni: Vni(12378325), primary: true, slot: 0, transit_ips: [] }], filter_hosts: None, filter_ports: Some([L4PortRange { first: L4Port(22), last: L4Port(22) }]), filter_protocols: Some([Tcp]), action: Allow, priority: VpcFirewallRulePriority(65534) }], dhcp_config: DhcpConfig { dns_servers: [1.1.1.1, 9.9.9.9], host_domain: None, search_domains: [] } }
    metadata = InstanceMetadata { silo_id: 7a7f0296-d9e5-443c-84b3-1bb92550a6c5, project_id: 092e24f8-9aff-4ad5-930e-bbf30e6462b2 }
    migration_id = None
    propolis_addr = [fd00:1122:3344:102::1:9]:12400
    propolis_id = a87fe17a-2e41-4d44-9a77-73f84ce6369f
    propolis_spec = VmmSpec(InstanceSpecV0 { board: Board { cpus: 64, memory_mb: 1048576, chipset: I440Fx(I440Fx { enable_pcie: false }), guest_hv_interface: Bhyve, cpuid: None }, components: {Uuid(6c6289d9-de36-4adf-bf73-cb5ac85ab38a): CrucibleStorageBackend(CrucibleStorageBackend { request_json: "<redacted>", readonly: false }), Uuid(be619353-c3a6-4b8b-b6d8-a635a40cbb56): VirtioNetworkBackend(VirtioNetworkBackend { vnic_name: "" }), Name("6c6289d9-de36-4adf-bf73-cb5ac85ab38a:device"): NvmeDisk(NvmeDisk { backend_id: Uuid(6c6289d9-de36-4adf-bf73-cb5ac85ab38a), pci_path: PciPath { bus: 0, device: 16, function: 0 }, serial_number: [110, 111, 98, 108, 101, 45, 49, 116, 98, 45, 114, 101, 103, 117, 108, 97, 114, 45, 100, 105] }), Name("be619353-c3a6-4b8b-b6d8-a635a40cbb56:device"): VirtioNic(VirtioNic { backend_id: Uuid(be619353-c3a6-4b8b-b6d8-a635a40cbb56), interface_id: be619353-c3a6-4b8b-b6d8-a635a40cbb56, pci_path: PciPath { bus: 0, device: 8, function: 0 } }), Name("boot-settings"): BootSettings(BootSettings { order: [BootOrderEntry { id: Name("6c6289d9-de36-4adf-bf73-cb5ac85ab38a:device") }] }), Name("cloud-init-backend"): BlobStorageBackend(BlobStorageBackend { base64: "<redacted>", readonly: true }), Name("cloud-init-dev"): VirtioDisk(VirtioDisk { backend_id: Name("cloud-init-backend"), pci_path: PciPath { bus: 0, device: 24, function: 0 } }), Name("com1"): SerialPort(SerialPort { num: Com1 }), Name("com2"): SerialPort(SerialPort { num: Com2 }), Name("com3"): SerialPort(SerialPort { num: Com3 }), Name("com4"): SerialPort(SerialPort { num: Com4 }), Name("pvpanic"): QemuPvpanic(QemuPvpanic { enable_isa: true })} })
    vmm_runtime = VmmRuntimeState { state: Starting, gen: Generation(2), time_updated: 2025-05-03T06:32:57.723252167Z }
06:32:57.742Z INFO SledAgent (InstanceManager): registering new instance
    file = sled-agent/src/instance_manager.rs:592
    instance_id = 38e05be0-4030-4bfe-b784-e2c67ab3dba4
    migration_id = None
    propolis_id = a87fe17a-2e41-4d44-9a77-73f84ce6369f
06:32:57.742Z INFO SledAgent (InstanceManager): initializing new Instance
    file = sled-agent/src/instance.rs:1541
    instance_id = 38e05be0-4030-4bfe-b784-e2c67ab3dba4
    migration_id = None
    propolis_id = a87fe17a-2e41-4d44-9a77-73f84ce6369f
    state = InstanceInitialState { vmm_spec: VmmSpec(InstanceSpecV0 { board: Board { cpus: 64, memory_mb: 1048576, chipset: I440Fx(I440Fx { enable_pcie: false }), guest_hv_interface: Bhyve, cpuid: None }, components: {Uuid(6c6289d9-de36-4adf-bf73-cb5ac85ab38a): CrucibleStorageBackend(CrucibleStorageBackend { request_json: "<redacted>", readonly: false }), Uuid(be619353-c3a6-4b8b-b6d8-a635a40cbb56): VirtioNetworkBackend(VirtioNetworkBackend { vnic_name: "" }), Name("6c6289d9-de36-4adf-bf73-cb5ac85ab38a:device"): NvmeDisk(NvmeDisk { backend_id: Uuid(6c6289d9-de36-4adf-bf73-cb5ac85ab38a), pci_path: PciPath { bus: 0, device: 16, function: 0 }, serial_number: [110, 111, 98, 108, 101, 45, 49, 116, 98, 45, 114, 101, 103, 117, 108, 97, 114, 45, 100, 105] }), Name("be619353-c3a6-4b8b-b6d8-a635a40cbb56:device"): VirtioNic(VirtioNic { backend_id: Uuid(be619353-c3a6-4b8b-b6d8-a635a40cbb56), interface_id: be619353-c3a6-4b8b-b6d8-a635a40cbb56, pci_path: PciPath { bus: 0, device: 8, function: 0 } }), Name("boot-settings"): BootSettings(BootSettings { order: [BootOrderEntry { id: Name("6c6289d9-de36-4adf-bf73-cb5ac85ab38a:device") }] }), Name("cloud-init-backend"): BlobStorageBackend(BlobStorageBackend { base64: "<redacted>", readonly: true }), Name("cloud-init-dev"): VirtioDisk(VirtioDisk { backend_id: Name("cloud-init-backend"), pci_path: PciPath { bus: 0, device: 24, function: 0 } }), Name("com1"): SerialPort(SerialPort { num: Com1 }), Name("com2"): SerialPort(SerialPort { num: Com2 }), Name("com3"): SerialPort(SerialPort { num: Com3 }), Name("com4"): SerialPort(SerialPort { num: Com4 }), Name("pvpanic"): QemuPvpanic(QemuPvpanic { enable_isa: true })} }), local_config: InstanceSledLocalConfig { hostname: Hostname("noble-1tb-regular-disks"), nics: [NetworkInterface { id: be619353-c3a6-4b8b-b6d8-a635a40cbb56, kind: Instance { id: 38e05be0-4030-4bfe-b784-e2c67ab3dba4 }, name: Name("net0"), ip: 172.30.0.8, mac: MacAddr(MacAddr6([168, 64, 37, 240, 0, 3])), subnet: V4(Ipv4Net { addr: 
172.30.0.0, width: 22 }), vni: Vni(12378325), primary: true, slot: 0, transit_ips: [] }], source_nat: SourceNatConfig { ip: 172.20.29.11, first_port: 0, last_port: 16383 }, ephemeral_ip: Some(172.20.29.12), floating_ips: [], firewall_rules: [ResolvedVpcFirewallRule { status: Enabled, direction: Inbound, targets: [NetworkInterface { id: 25d2eff8-73a4-4e35-86d0-891dd8e807eb, kind: Instance { id: 837708db-f8ec-4331-8588-1f179bdf78c5 }, name: Name("net0"), ip: 172.30.0.7, mac: MacAddr(MacAddr6([168, 64, 37, 240, 0, 2])), subnet: V4(Ipv4Net { addr: 172.30.0.0, width: 22 }), vni: Vni(12378325), primary: true, slot: 0, transit_ips: [] }, NetworkInterface { id: 5775a340-be7d-4815-8aba-bec567a3bd25, kind: Instance { id: b4eea5c1-3c58-41f6-8ea5-d986ccec3bb4 }, name: Name("net0"), ip: 172.30.0.9, mac: MacAddr(MacAddr6([168, 64, 37, 240, 0, 4])), subnet: V4(Ipv4Net { addr: 172.30.0.0, width: 22 }), vni: Vni(12378325), primary: true, slot: 0, transit_ips: [] }, NetworkInterface { id: a958b841-5ace-4583-b511-cf0f47e7dde4, kind: Instance { id: 19a6dcbd-851f-4bf9-8ba7-f8e473b8393f }, name: Name("net0"), ip: 172.30.0.6, mac: MacAddr(MacAddr6([168, 64, 37, 240, 0, 1])), subnet: V4(Ipv4Net { addr: 172.30.0.0, width: 22 }), vni: Vni(12378325), primary: true, slot: 0, transit_ips: [] }, NetworkInterface { id: be619353-c3a6-4b8b-b6d8-a635a40cbb56, kind: Instance { id: 38e05be0-4030-4bfe-b784-e2c67ab3dba4 }, name: Name("net0"), ip: 172.30.0.8, mac: MacAddr(MacAddr6([168, 64, 37, 240, 0, 3])), subnet: V4(Ipv4Net { addr: 172.30.0.0, width: 22 }), vni: Vni(12378325), primary: true, slot: 0, transit_ips: [] }], filter_hosts: None, filter_ports: None, filter_protocols: Some([Icmp]), action: Allow, priority: VpcFirewallRulePriority(65534) }, ResolvedVpcFirewallRule { status: Enabled, direction: Inbound, targets: [NetworkInterface { id: 25d2eff8-73a4-4e35-86d0-891dd8e807eb, kind: Instance { id: 837708db-f8ec-4331-8588-1f179bdf78c5 }, name: Name("net0"), ip: 172.30.0.7, mac: 
MacAddr(MacAddr6([168, 64, 37, 240, 0, 2])), subnet: V4(Ipv4Net { addr: 172.30.0.0, width: 22 }), vni: Vni(12378325), primary: true, slot: 0, transit_ips: [] }, NetworkInterface { id: 5775a340-be7d-4815-8aba-bec567a3bd25, kind: Instance { id: b4eea5c1-3c58-41f6-8ea5-d986ccec3bb4 }, name: Name("net0"), ip: 172.30.0.9, mac: MacAddr(MacAddr6([168, 64, 37, 240, 0, 4])), subnet: V4(Ipv4Net { addr: 172.30.0.0, width: 22 }), vni: Vni(12378325), primary: true, slot: 0, transit_ips: [] }, NetworkInterface { id: a958b841-5ace-4583-b511-cf0f47e7dde4, kind: Instance { id: 19a6dcbd-851f-4bf9-8ba7-f8e473b8393f }, name: Name("net0"), ip: 172.30.0.6, mac: MacAddr(MacAddr6([168, 64, 37, 240, 0, 1])), subnet: V4(Ipv4Net { addr: 172.30.0.0, width: 22 }), vni: Vni(12378325), primary: true, slot: 0, transit_ips: [] }, NetworkInterface { id: be619353-c3a6-4b8b-b6d8-a635a40cbb56, kind: Instance { id: 38e05be0-4030-4bfe-b784-e2c67ab3dba4 }, name: Name("net0"), ip: 172.30.0.8, mac: MacAddr(MacAddr6([168, 64, 37, 240, 0, 3])), subnet: V4(Ipv4Net { addr: 172.30.0.0, width: 22 }), vni: Vni(12378325), primary: true, slot: 0, transit_ips: [] }], filter_hosts: Some([Vpc(Vni(12378325)), Vpc(Vni(12378325)), Vpc(Vni(12378325)), Vpc(Vni(12378325))]), filter_ports: None, filter_protocols: None, action: Allow, priority: VpcFirewallRulePriority(65534) }, ResolvedVpcFirewallRule { status: Enabled, direction: Inbound, targets: [NetworkInterface { id: 25d2eff8-73a4-4e35-86d0-891dd8e807eb, kind: Instance { id: 837708db-f8ec-4331-8588-1f179bdf78c5 }, name: Name("net0"), ip: 172.30.0.7, mac: MacAddr(MacAddr6([168, 64, 37, 240, 0, 2])), subnet: V4(Ipv4Net { addr: 172.30.0.0, width: 22 }), vni: Vni(12378325), primary: true, slot: 0, transit_ips: [] }, NetworkInterface { id: 5775a340-be7d-4815-8aba-bec567a3bd25, kind: Instance { id: b4eea5c1-3c58-41f6-8ea5-d986ccec3bb4 }, name: Name("net0"), ip: 172.30.0.9, mac: MacAddr(MacAddr6([168, 64, 37, 240, 0, 4])), subnet: V4(Ipv4Net { addr: 172.30.0.0, width: 22 }), 
vni: Vni(12378325), primary: true, slot: 0, transit_ips: [] }, NetworkInterface { id: a958b841-5ace-4583-b511-cf0f47e7dde4, kind: Instance { id: 19a6dcbd-851f-4bf9-8ba7-f8e473b8393f }, name: Name("net0"), ip: 172.30.0.6, mac: MacAddr(MacAddr6([168, 64, 37, 240, 0, 1])), subnet: V4(Ipv4Net { addr: 172.30.0.0, width: 22 }), vni: Vni(12378325), primary: true, slot: 0, transit_ips: [] }, NetworkInterface { id: be619353-c3a6-4b8b-b6d8-a635a40cbb56, kind: Instance { id: 38e05be0-4030-4bfe-b784-e2c67ab3dba4 }, name: Name("net0"), ip: 172.30.0.8, mac: MacAddr(MacAddr6([168, 64, 37, 240, 0, 3])), subnet: V4(Ipv4Net { addr: 172.30.0.0, width: 22 }), vni: Vni(12378325), primary: true, slot: 0, transit_ips: [] }], filter_hosts: None, filter_ports: Some([L4PortRange { first: L4Port(22), last: L4Port(22) }]), filter_protocols: Some([Tcp]), action: Allow, priority: VpcFirewallRulePriority(65534) }], dhcp_config: DhcpConfig { dns_servers: [1.1.1.1, 9.9.9.9], host_domain: None, search_domains: [] } }, vmm_runtime: VmmRuntimeState { state: Starting, gen: Generation(2), time_updated: 2025-05-03T06:32:57.723252167Z }, propolis_addr: [fd00:1122:3344:102::1:9]:12400, migration_id: None }
06:32:57.742Z INFO SledAgent (dropshot (SledAgent)): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/dropshot-0.16.0/src/server.rs:867
    latency_us = 18588
    local_addr = [fd00:1122:3344:102::1]:12345
    method = PUT
    remote_addr = [fd00:1122:3344:103::4]:39442
    req_id = 1029f7aa-2720-4f67-ac2e-9b337764886d
    response_code = 200
    uri = /vmms/a87fe17a-2e41-4d44-9a77-73f84ce6369f
06:32:57.818Z INFO SledAgent (dropshot (SledAgent)): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/dropshot-0.16.0/src/server.rs:867
    latency_us = 189
    local_addr = [fd00:1122:3344:102::1]:12345
    method = PUT
    remote_addr = [fd00:1122:3344:103::4]:45489
    req_id = e25c7cc4-7e4d-4aeb-a40d-1ce88f322cb6
    response_code = 200
    uri = /vmms/a87fe17a-2e41-4d44-9a77-73f84ce6369f/state
06:32:57.876Z INFO SledAgent (InstanceManager): Configuring new Omicron zone: oxz_propolis-server_a87fe17a-2e41-4d44-9a77-73f84ce6369f
    file = illumos-utils/src/zone.rs:302
    instance_id = 38e05be0-4030-4bfe-b784-e2c67ab3dba4
    propolis_id = a87fe17a-2e41-4d44-9a77-73f84ce6369f
06:32:57.909Z INFO SledAgent (InstanceManager): Installing Omicron zone: oxz_propolis-server_a87fe17a-2e41-4d44-9a77-73f84ce6369f
    file = illumos-utils/src/zone.rs:340
    instance_id = 38e05be0-4030-4bfe-b784-e2c67ab3dba4
    propolis_id = a87fe17a-2e41-4d44-9a77-73f84ce6369f
06:32:59.706Z INFO SledAgent (InstanceManager): Profile for oxz_propolis-server_a87fe17a-2e41-4d44-9a77-73f84ce6369f:
    <!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
    <service_bundle type="profile" name="omicron">
      <service version="1" type="service" name="system/illumos/propolis-server">
        <instance enabled="true" name="default">
          <property_group type="application" name="config">
            <propval type="astring" name="datalink" value='oxControlInstance0'/>
            <propval type="astring" name="gateway" value='fd00:1122:3344:102::1'/>
            <propval type="astring" name="listen_addr" value='fd00:1122:3344:102::1:9'/>
            <propval type="astring" name="listen_port" value='12400'/>
            <propval type="astring" name="metric_addr" value='dns'/>
          </property_group>
        </instance>
      </service>
    </service_bundle>
    file = sled-agent/src/profile.rs:34
    instance_id = 38e05be0-4030-4bfe-b784-e2c67ab3dba4
    propolis_id = a87fe17a-2e41-4d44-9a77-73f84ce6369f
06:32:59.706Z INFO SledAgent (InstanceManager): Booting oxz_propolis-server_a87fe17a-2e41-4d44-9a77-73f84ce6369f zone
    file = illumos-utils/src/running_zone.rs:482
    instance_id = 38e05be0-4030-4bfe-b784-e2c67ab3dba4
    propolis_id = a87fe17a-2e41-4d44-9a77-73f84ce6369f
    zone = oxz_propolis-server_a87fe17a-2e41-4d44-9a77-73f84ce6369f
06:32:59.992Z WARN SledAgent (InstanceManager): wait for service svc:/milestone/single-user:default in zone Some("oxz_propolis-server_a87fe17a-2e41-4d44-9a77-73f84ce6369f") failed: Property not found. retry in 58.462762ms
    file = illumos-utils/src/svc.rs:36
    instance_id = 38e05be0-4030-4bfe-b784-e2c67ab3dba4
    propolis_id = a87fe17a-2e41-4d44-9a77-73f84ce6369f
    zone = oxz_propolis-server_a87fe17a-2e41-4d44-9a77-73f84ce6369f
06:33:00.068Z WARN SledAgent (InstanceManager): wait for service svc:/milestone/single-user:default in zone Some("oxz_propolis-server_a87fe17a-2e41-4d44-9a77-73f84ce6369f") failed: Property not found. retry in 127.629484ms
    file = illumos-utils/src/svc.rs:36
    instance_id = 38e05be0-4030-4bfe-b784-e2c67ab3dba4
    propolis_id = a87fe17a-2e41-4d44-9a77-73f84ce6369f
    zone = oxz_propolis-server_a87fe17a-2e41-4d44-9a77-73f84ce6369f
06:33:00.213Z WARN SledAgent (InstanceManager): wait for service svc:/milestone/single-user:default in zone Some("oxz_propolis-server_a87fe17a-2e41-4d44-9a77-73f84ce6369f") failed: Property not found. retry in 287.079267ms
    file = illumos-utils/src/svc.rs:36
    instance_id = 38e05be0-4030-4bfe-b784-e2c67ab3dba4
    propolis_id = a87fe17a-2e41-4d44-9a77-73f84ce6369f
    zone = oxz_propolis-server_a87fe17a-2e41-4d44-9a77-73f84ce6369f
06:33:00.518Z WARN SledAgent (InstanceManager): wait for service svc:/milestone/single-user:default in zone Some("oxz_propolis-server_a87fe17a-2e41-4d44-9a77-73f84ce6369f") failed: Property not found. retry in 462.268116ms
    file = illumos-utils/src/svc.rs:36
    instance_id = 38e05be0-4030-4bfe-b784-e2c67ab3dba4
    propolis_id = a87fe17a-2e41-4d44-9a77-73f84ce6369f
    zone = oxz_propolis-server_a87fe17a-2e41-4d44-9a77-73f84ce6369f
06:33:01.000Z WARN SledAgent (InstanceManager): wait for service svc:/milestone/single-user:default in zone Some("oxz_propolis-server_a87fe17a-2e41-4d44-9a77-73f84ce6369f") failed: Property not found. retry in 769.120227ms
    file = illumos-utils/src/svc.rs:36
    instance_id = 38e05be0-4030-4bfe-b784-e2c67ab3dba4
    propolis_id = a87fe17a-2e41-4d44-9a77-73f84ce6369f
    zone = oxz_propolis-server_a87fe17a-2e41-4d44-9a77-73f84ce6369f
06:33:01.788Z WARN SledAgent (InstanceManager): wait for service svc:/milestone/single-user:default in zone Some("oxz_propolis-server_a87fe17a-2e41-4d44-9a77-73f84ce6369f") failed: Property not found. retry in 611.978229ms
    file = illumos-utils/src/svc.rs:36
    instance_id = 38e05be0-4030-4bfe-b784-e2c67ab3dba4
    propolis_id = a87fe17a-2e41-4d44-9a77-73f84ce6369f
    zone = oxz_propolis-server_a87fe17a-2e41-4d44-9a77-73f84ce6369f
06:33:02.418Z WARN SledAgent (InstanceManager): wait for service svc:/milestone/single-user:default in zone Some("oxz_propolis-server_a87fe17a-2e41-4d44-9a77-73f84ce6369f") failed: Property not found. retry in 995.226399ms
    file = illumos-utils/src/svc.rs:36
    instance_id = 38e05be0-4030-4bfe-b784-e2c67ab3dba4
    propolis_id = a87fe17a-2e41-4d44-9a77-73f84ce6369f
    zone = oxz_propolis-server_a87fe17a-2e41-4d44-9a77-73f84ce6369f
06:33:03.432Z WARN SledAgent (InstanceManager): wait for service svc:/milestone/single-user:default in zone Some("oxz_propolis-server_a87fe17a-2e41-4d44-9a77-73f84ce6369f") failed: Property not found. retry in 645.264777ms
    file = illumos-utils/src/svc.rs:36
    instance_id = 38e05be0-4030-4bfe-b784-e2c67ab3dba4
    propolis_id = a87fe17a-2e41-4d44-9a77-73f84ce6369f
    zone = oxz_propolis-server_a87fe17a-2e41-4d44-9a77-73f84ce6369f
06:33:04.095Z WARN SledAgent (InstanceManager): wait for service svc:/milestone/single-user:default in zone Some("oxz_propolis-server_a87fe17a-2e41-4d44-9a77-73f84ce6369f") failed: Property not found. retry in 575.926817ms
    file = illumos-utils/src/svc.rs:36
    instance_id = 38e05be0-4030-4bfe-b784-e2c67ab3dba4
    propolis_id = a87fe17a-2e41-4d44-9a77-73f84ce6369f
    zone = oxz_propolis-server_a87fe17a-2e41-4d44-9a77-73f84ce6369f
06:33:04.692Z WARN SledAgent (InstanceManager): wait for service svc:/milestone/single-user:default in zone Some("oxz_propolis-server_a87fe17a-2e41-4d44-9a77-73f84ce6369f") failed: Property not found. retry in 755.405034ms
    file = illumos-utils/src/svc.rs:36
    instance_id = 38e05be0-4030-4bfe-b784-e2c67ab3dba4
    propolis_id = a87fe17a-2e41-4d44-9a77-73f84ce6369f
    zone = oxz_propolis-server_a87fe17a-2e41-4d44-9a77-73f84ce6369f
06:33:05.963Z INFO SledAgent (InstanceManager): Started propolis in zone: oxz_propolis-server_a87fe17a-2e41-4d44-9a77-73f84ce6369f
    file = sled-agent/src/instance.rs:2060
    instance_id = 38e05be0-4030-4bfe-b784-e2c67ab3dba4
    propolis_id = a87fe17a-2e41-4d44-9a77-73f84ce6369f
06:33:05.982Z WARN SledAgent (InstanceManager): wait for service svc:/system/illumos/propolis-server:default in zone Some("oxz_propolis-server_a87fe17a-2e41-4d44-9a77-73f84ce6369f") failed: Property not found. retry in 54.119825ms
    file = illumos-utils/src/svc.rs:36
    instance_id = 38e05be0-4030-4bfe-b784-e2c67ab3dba4
    propolis_id = a87fe17a-2e41-4d44-9a77-73f84ce6369f
06:33:06.053Z WARN SledAgent (InstanceManager): wait for service svc:/system/illumos/propolis-server:default in zone Some("oxz_propolis-server_a87fe17a-2e41-4d44-9a77-73f84ce6369f") failed: Property not found. retry in 62.035098ms
    file = illumos-utils/src/svc.rs:36
    instance_id = 38e05be0-4030-4bfe-b784-e2c67ab3dba4
    propolis_id = a87fe17a-2e41-4d44-9a77-73f84ce6369f
06:33:06.134Z WARN SledAgent (InstanceManager): wait for service svc:/system/illumos/propolis-server:default in zone Some("oxz_propolis-server_a87fe17a-2e41-4d44-9a77-73f84ce6369f") failed: Property not found. retry in 231.113306ms
    file = illumos-utils/src/svc.rs:36
    instance_id = 38e05be0-4030-4bfe-b784-e2c67ab3dba4
    propolis_id = a87fe17a-2e41-4d44-9a77-73f84ce6369f
06:33:06.384Z WARN SledAgent (InstanceManager): wait for service svc:/system/illumos/propolis-server:default in zone Some("oxz_propolis-server_a87fe17a-2e41-4d44-9a77-73f84ce6369f") failed: Property not found. retry in 289.777749ms
    file = illumos-utils/src/svc.rs:36
    instance_id = 38e05be0-4030-4bfe-b784-e2c67ab3dba4
    propolis_id = a87fe17a-2e41-4d44-9a77-73f84ce6369f
06:33:06.693Z INFO SledAgent (InstanceManager): Propolis SMF service is online
    file = sled-agent/src/instance.rs:2071
    instance_id = 38e05be0-4030-4bfe-b784-e2c67ab3dba4
    propolis_id = a87fe17a-2e41-4d44-9a77-73f84ce6369f
06:33:06.737Z INFO SledAgent (InstanceManager): Propolis HTTP server online
    file = sled-agent/src/instance.rs:2101
    instance_id = 38e05be0-4030-4bfe-b784-e2c67ab3dba4
    propolis_id = a87fe17a-2e41-4d44-9a77-73f84ce6369f
06:34:10.436Z WARN SledAgent (dropshot (SledAgent)): request handling cancelled (client disconnected)
    file = /home/build/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/dropshot-0.16.0/src/server.rs:801
    latency_us = 60001273
    local_addr = [fd00:1122:3344:102::1]:12345
    method = GET
    remote_addr = [fd00:1122:3344:103::4]:40280
    req_id = 383a9076-f54b-49c7-8373-8b569b05b036
    uri = /vmms/a87fe17a-2e41-4d44-9a77-73f84ce6369f/state
06:34:10.494Z WARN SledAgent (dropshot (SledAgent)): request handling cancelled (client disconnected)
    file = /home/build/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/dropshot-0.16.0/src/server.rs:801
    latency_us = 60000433
    local_addr = [fd00:1122:3344:102::1]:12345
    method = GET
    remote_addr = [fd00:1122:3344:104::5]:34422
    req_id = e677dcee-e6b4-4eac-a628-cbfea3a67c1d
    uri = /vmms/a87fe17a-2e41-4d44-9a77-73f84ce6369f/state
06:34:17.413Z WARN SledAgent (dropshot (SledAgent)): request handling cancelled (client disconnected)
    file = /home/build/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/dropshot-0.16.0/src/server.rs:801
    latency_us = 59088834
    local_addr = [fd00:1122:3344:102::1]:12345
    method = GET
    remote_addr = [fd00:1122:3344:102::4]:58896
    req_id = 20f9d811-ce36-4490-8bcf-7f953f40d8fe
    uri = /vmms/a87fe17a-2e41-4d44-9a77-73f84ce6369f/state
06:35:03.137Z INFO SledAgent (InstanceManager): result of instance_ensure call is Err(Error Response: status: 500 Internal Server Error; headers: {"content-type": "application/json", "x-request-id": "b97e48a0-3392-4750-bb98-d021661da848", "content-length": "124", "date": "Sat, 03 May 2025 06:35:03 GMT"}; value: Error { error_code: Some("Internal"), message: "Internal Server Error", request_id: "b97e48a0-3392-4750-bb98-d021661da848" })
    file = sled-agent/src/instance.rs:1128
    instance_id = 38e05be0-4030-4bfe-b784-e2c67ab3dba4
    propolis_id = a87fe17a-2e41-4d44-9a77-73f84ce6369f
06:35:03.138Z ERRO SledAgent (InstanceManager): failed to create Propolis VM
    error = Propolis(Error Response: status: 500 Internal Server Error; headers: {"content-type": "application/json", "x-request-id": "b97e48a0-3392-4750-bb98-d021661da848", "content-length": "124", "date": "Sat, 03 May 2025 06:35:03 GMT"}; value: Error { error_code: Some("Internal"), message: "Internal Server Error", request_id: "b97e48a0-3392-4750-bb98-d021661da848" })
    file = sled-agent/src/instance.rs:1837
    instance_id = 38e05be0-4030-4bfe-b784-e2c67ab3dba4
    propolis_id = a87fe17a-2e41-4d44-9a77-73f84ce6369f
06:35:03.152Z INFO SledAgent (InstanceManager): instance runner exited main loop
    file = sled-agent/src/instance.rs:763
    instance_id = 38e05be0-4030-4bfe-b784-e2c67ab3dba4
    propolis_id = a87fe17a-2e41-4d44-9a77-73f84ce6369f
06:35:03.153Z INFO SledAgent (InstanceManager): Publishing instance state update to Nexus
    file = sled-agent/src/instance.rs:836
    instance_id = 38e05be0-4030-4bfe-b784-e2c67ab3dba4
    propolis_id = a87fe17a-2e41-4d44-9a77-73f84ce6369f
    state = SledVmmState { vmm_state: VmmRuntimeState { state: Failed, gen: Generation(3), time_updated: 2025-05-03T06:35:03.152705342Z }, migration_in: None, migration_out: None }
06:35:03.180Z WARN SledAgent (dropshot (SledAgent)): request completed after handler was already cancelled
    file = /home/build/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/dropshot-0.16.0/src/server.rs:943
    local_addr = [fd00:1122:3344:102::1]:12345
    method = GET
    remote_addr = [fd00:1122:3344:103::4]:40280
    req_id = 383a9076-f54b-49c7-8373-8b569b05b036
    response_code = 200
    uri = /vmms/a87fe17a-2e41-4d44-9a77-73f84ce6369f/state
06:35:03.181Z INFO SledAgent (dropshot (SledAgent)): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/dropshot-0.16.0/src/server.rs:867
    latency_us = 45263165
    local_addr = [fd00:1122:3344:102::1]:12345
    method = GET
    remote_addr = [fd00:1122:3344:102::4]:48604
    req_id = 44b3b266-5c8f-451f-9d2d-7ab9bbb573f8
    response_code = 200
    uri = /vmms/a87fe17a-2e41-4d44-9a77-73f84ce6369f/state
06:35:03.181Z WARN SledAgent (dropshot (SledAgent)): request completed after handler was already cancelled
    file = /home/build/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/dropshot-0.16.0/src/server.rs:943
    local_addr = [fd00:1122:3344:102::1]:12345
    method = GET
    remote_addr = [fd00:1122:3344:104::5]:34422
    req_id = e677dcee-e6b4-4eac-a628-cbfea3a67c1d
    response_code = 200
    uri = /vmms/a87fe17a-2e41-4d44-9a77-73f84ce6369f/state
06:35:03.181Z INFO SledAgent (dropshot (SledAgent)): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/dropshot-0.16.0/src/server.rs:867
    latency_us = 52656964
    local_addr = [fd00:1122:3344:102::1]:12345
    method = GET
    remote_addr = [fd00:1122:3344:103::4]:44510
    req_id = 08410143-fef4-4a77-8c63-27acb26d1558
    response_code = 200
    uri = /vmms/a87fe17a-2e41-4d44-9a77-73f84ce6369f/state
06:35:03.181Z INFO SledAgent (dropshot (SledAgent)): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/dropshot-0.16.0/src/server.rs:867
    latency_us = 52568093
    local_addr = [fd00:1122:3344:102::1]:12345
    method = GET
    remote_addr = [fd00:1122:3344:104::5]:45221
    req_id = 6293960d-ab23-44f0-bc05-ad99daea4fbc
    response_code = 200
    uri = /vmms/a87fe17a-2e41-4d44-9a77-73f84ce6369f/state
06:35:03.181Z WARN SledAgent (dropshot (SledAgent)): request completed after handler was already cancelled
    file = /home/build/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/dropshot-0.16.0/src/server.rs:943
    local_addr = [fd00:1122:3344:102::1]:12345
    method = GET
    remote_addr = [fd00:1122:3344:102::4]:58896
    req_id = 20f9d811-ce36-4490-8bcf-7f953f40d8fe
    response_code = 200
    uri = /vmms/a87fe17a-2e41-4d44-9a77-73f84ce6369f/state
06:35:03.273Z INFO SledAgent (dropshot (SledAgent)): request completed
    error_message_external = Not Found
    error_message_internal = VMM with ID a87fe17a-2e41-4d44-9a77-73f84ce6369f not found
    file = /home/build/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/dropshot-0.16.0/src/server.rs:855
    latency_us = 1070
    local_addr = [fd00:1122:3344:102::1]:12345
    method = GET
    remote_addr = [fd00:1122:3344:103::4]:54440
    req_id = af4874bb-a36a-44c8-a1dd-abd6f3050201
    response_code = 404
    uri = /vmms/a87fe17a-2e41-4d44-9a77-73f84ce6369f/state
06:35:03.365Z INFO SledAgent (dropshot (SledAgent)): request completed
    error_message_external = Not Found
    error_message_internal = VMM with ID a87fe17a-2e41-4d44-9a77-73f84ce6369f not found
    file = /home/build/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/dropshot-0.16.0/src/server.rs:855
    latency_us = 77
    local_addr = [fd00:1122:3344:102::1]:12345
    method = GET
    remote_addr = [fd00:1122:3344:102::4]:42701
    req_id = 00eb5a0b-0731-4082-86ad-5fb2bc248a58
    response_code = 404
    uri = /vmms/a87fe17a-2e41-4d44-9a77-73f84ce6369f/state
06:35:03.366Z INFO SledAgent (dropshot (SledAgent)): request completed
    error_message_external = Not Found
    error_message_internal = VMM with ID a87fe17a-2e41-4d44-9a77-73f84ce6369f not found
    file = /home/build/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/dropshot-0.16.0/src/server.rs:855
    latency_us = 78
    local_addr = [fd00:1122:3344:102::1]:12345
    method = GET
    remote_addr = [fd00:1122:3344:104::5]:41173
    req_id = 13cb20a8-ee91-48bb-99d5-20497f05aec4
    response_code = 404
    uri = /vmms/a87fe17a-2e41-4d44-9a77-73f84ce6369f/state
06:35:05.087Z INFO SledAgent (InstanceManager): halt_and_remove_logged: Previous zone state: Running
    file = illumos-utils/src/zone.rs:460
    instance_id = 38e05be0-4030-4bfe-b784-e2c67ab3dba4
    propolis_id = a87fe17a-2e41-4d44-9a77-73f84ce6369f
    zone = oxz_propolis-server_a87fe17a-2e41-4d44-9a77-73f84ce6369f
06:35:05.087Z INFO SledAgent (InstanceManager): Stopped and uninstalled zone
    file = illumos-utils/src/running_zone.rs:838
    instance_id = 38e05be0-4030-4bfe-b784-e2c67ab3dba4
    propolis_id = a87fe17a-2e41-4d44-9a77-73f84ce6369f
    zone = oxz_propolis-server_a87fe17a-2e41-4d44-9a77-73f84ce6369f

@askfongjojo

Here are the corresponding log lines from Nexus:

06:33:10.431Z DEBG 1d0dd301-46f0-4981-a48e-9dd53a647fe3 (ServerContext): client request
    SledAgent = b64b8ba1-15ac-4382-8518-6b6b3efeea49
    background_task = instance_watcher
    body = None
    method = GET
    uri = http://[fd00:1122:3344:102::1]:12345/vmms/a87fe17a-2e41-4d44-9a77-73f84ce6369f/state
06:34:10.436Z DEBG 1d0dd301-46f0-4981-a48e-9dd53a647fe3 (ServerContext): client response
    SledAgent = b64b8ba1-15ac-4382-8518-6b6b3efeea49
    background_task = instance_watcher
    result = Err(reqwest::Error { kind: Request, url: "http://[fd00:1122:3344:102::1]:12345/vmms/a87fe17a-2e41-4d44-9a77-73f84ce6369f/state", source: TimedOut })
06:34:10.436Z INFO 1d0dd301-46f0-4981-a48e-9dd53a647fe3 (ServerContext): sled agent is unreachable
    background_task = instance_watcher
    error = reqwest::Error { kind: Request, url: "http://[fd00:1122:3344:102::1]:12345/vmms/a87fe17a-2e41-4d44-9a77-73f84ce6369f/state", source: TimedOut }
    file = nexus/src/app/background/tasks/instance_watcher.rs:211
    instance_id = 38e05be0-4030-4bfe-b784-e2c67ab3dba4
    sled_id = b64b8ba1-15ac-4382-8518-6b6b3efeea49
    vmm_id = a87fe17a-2e41-4d44-9a77-73f84ce6369f
06:34:10.521Z DEBG 1d0dd301-46f0-4981-a48e-9dd53a647fe3 (ServerContext): client request
    SledAgent = b64b8ba1-15ac-4382-8518-6b6b3efeea49
    background_task = instance_watcher
    body = None
    method = GET
    uri = http://[fd00:1122:3344:102::1]:12345/vmms/a87fe17a-2e41-4d44-9a77-73f84ce6369f/state
06:35:03.153Z INFO 1d0dd301-46f0-4981-a48e-9dd53a647fe3 (dropshot_internal): received new VMM runtime state from sled agent
    actor_id = 001de000-05e4-4000-8000-000000000002
    authenticated = true
    file = nexus/src/app/instance.rs:2107
    local_addr = [fd00:1122:3344:103::4]:12221
    method = PUT
    migration_state = Migrations { migration_in: None, migration_out: None }
    propolis_id = a87fe17a-2e41-4d44-9a77-73f84ce6369f
    remote_addr = [fd00:1122:3344:102::1]:61090
    req_id = 6d9bdaaf-8066-437b-97f5-4887d5dd5cdd
    uri = /vmms/a87fe17a-2e41-4d44-9a77-73f84ce6369f
    vmm_state = VmmRuntimeState { state: Failed, gen: Generation(3), time_updated: 2025-05-03T06:35:03.152705342Z }

@askfongjojo

askfongjojo commented May 3, 2025

Looking at the 960 GiB sled-agent log lines,

23:03:22.237Z INFO SledAgent (InstanceManager): Propolis HTTP server online
    file = sled-agent/src/instance.rs:2101
    instance_id = 38e05be0-4030-4bfe-b784-e2c67ab3dba4
    propolis_id = 2e841ee3-736b-4c36-88c8-cc9d6c752e79
23:03:22.558Z INFO SledAgent (InstanceManager): result of instance_ensure call is Ok(InstanceEnsureResponse { migrate: None })
    file = sled-agent/src/instance.rs:1128
    instance_id = 38e05be0-4030-4bfe-b784-e2c67ab3dba4
    propolis_id = 2e841ee3-736b-4c36-88c8-cc9d6c752e79
23:03:22.559Z INFO SledAgent (InstanceManager): observed new Propolis state
    file = sled-agent/src/instance.rs:986
    instance_id = 38e05be0-4030-4bfe-b784-e2c67ab3dba4
    propolis_id = 2e841ee3-736b-4c36-88c8-cc9d6c752e79
    state = ObservedPropolisState { vmm_state: PropolisInstanceState(Starting), migration_in: None, migration_out: None, time: 2025-05-03T23:03:22.559412449Z }

the Propolis state change should be quick (<0.5s). The 1 TiB Propolis zone simply failed to come up all the way (i.e., instance_watcher wasn't giving up prematurely). Unfortunately, I can't find the corresponding Propolis log to see what was actually going on: the zone was uninstalled without invoking zone-bundle collection.
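
To put numbers on the timing comparison, here is a minimal sketch that computes the gap between "Propolis HTTP server online" and the result of the instance_ensure call, using the timestamps quoted in the sled-agent logs in this thread. The helper names `parse_ms` and `elapsed_ms` are made up for illustration and are not part of sled-agent; the parser assumes both timestamps fall on the same day and use the `HH:MM:SS.mmmZ` form seen in the logs.

```rust
/// Parse a log timestamp like "06:33:06.737Z" into milliseconds since midnight.
/// Hypothetical helper: assumes exactly three fractional digits, as in the logs above.
fn parse_ms(ts: &str) -> Option<u64> {
    let ts = ts.strip_suffix('Z')?;
    let (hms, millis) = ts.split_once('.')?;
    let mut parts = hms.split(':');
    let h: u64 = parts.next()?.parse().ok()?;
    let m: u64 = parts.next()?.parse().ok()?;
    let s: u64 = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None;
    }
    let ms: u64 = millis.parse().ok()?;
    Some(((h * 60 + m) * 60 + s) * 1000 + ms)
}

/// Milliseconds elapsed between two same-day timestamps (None if malformed or reversed).
fn elapsed_ms(start: &str, end: &str) -> Option<u64> {
    parse_ms(end)?.checked_sub(parse_ms(start)?)
}

fn main() {
    // 960 GiB run: HTTP server online -> instance_ensure returned Ok.
    let ok_run = elapsed_ms("23:03:22.237Z", "23:03:22.558Z");
    // Failing run: HTTP server online -> instance_ensure returned a 500.
    let failed_run = elapsed_ms("06:33:06.737Z", "06:35:03.137Z");
    println!("960 GiB run: {:?} ms", ok_run); // Some(321)
    println!("failing run: {:?} ms", failed_run); // Some(116400)
}
```

The successful run returns in roughly a third of a second, while the failing one takes almost two minutes before Propolis reports the error.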

@askfongjojo

Ok, I got the propolis log this time with another 1 TiB instance by tailing the propolis log directly from outside the zone:

BRM27230037 # tail -f /pool/ext/*/crypt/zone/oxz_propolis-server*/root/var/svc/log/system-illu* | looker
[ May  3 23:31:42 Enabled. ]
[ May  3 23:31:42 Rereading configuration. ]
[ May  3 23:31:43 Rereading configuration. ]
[ May  3 23:31:43 Executing start method ("/opt/oxide/lib/svc/manifest/propolis/propolis.sh"). ]
+ . /lib/svc/share/smf_include.sh
++ SMF_EXIT_OK=0
++ SMF_EXIT_NODAEMON=94
++ SMF_EXIT_ERR_FATAL=95
++ SMF_EXIT_ERR_CONFIG=96
++ SMF_EXIT_MON_DEGRADE=97
++ SMF_EXIT_MON_OFFLINE=98
++ SMF_EXIT_ERR_NOSMF=99
++ SMF_EXIT_ERR_PERM=100
++ svcprop -c -p config/datalink svc:/system/illumos/propolis-server:default
+ DATALINK=oxControlInstance4
++ svcprop -c -p config/gateway svc:/system/illumos/propolis-server:default
+ GATEWAY=fd00:1122:3344:102::1
++ svcprop -c -p config/listen_addr svc:/system/illumos/propolis-server:default
+ LISTEN_ADDR=fd00:1122:3344:102::1:1f
++ svcprop -c -p config/listen_port svc:/system/illumos/propolis-server:default
+ LISTEN_PORT=12400
++ svcprop -c -p config/metric_addr svc:/system/illumos/propolis-server:default
+ METRIC_ADDR=dns
+ [[ oxControlInstance4 == unknown ]]
+ [[ fd00:1122:3344:102::1 == unknown ]]
+ ipadm delete-if oxControlInstance4
ipadm: Could not delete oxControlInstance4: Interface does not exist
+ true
+ ipadm create-if -t oxControlInstance4
+ ipadm set-ifprop -t -p mtu=9000 -m ipv4 oxControlInstance4
+ ipadm set-ifprop -t -p mtu=9000 -m ipv6 oxControlInstance4
+ ipadm show-addr oxControlInstance4/ll
ipadm: Could not get address: Object not found
+ ipadm create-addr -t -T addrconf oxControlInstance4/ll
+ ipadm show-addr oxControlInstance4/omicron6
ipadm: Could not get address: Object not found
+ ipadm create-addr -t -T static -a fd00:1122:3344:102::1:1f oxControlInstance4/omicron6
+ route get -inet6 default -inet6 fd00:1122:3344:102::1
default: not in table
+ route add -inet6 default -inet6 fd00:1122:3344:102::1
add net default: gateway fd00:1122:3344:102::1
+ args=('run' '/opt/oxide/propolis-server/blob/OVMF_CODE.fd' "[$LISTEN_ADDR]:$LISTEN_PORT" '--metric-addr' "$METRIC_ADDR")
+ ctrun -l child -o noorphan,regent /opt/oxide/propolis-server/bin/propolis-server run /opt/oxide/propolis-server/blob/OVMF_CODE.fd '[fd00:1122:3344:102::1:1f]:12400' --metric-addr dns
[ May  3 23:31:44 Method "start" exited with status 0. ]
23:31:44.538Z INFO propolis-server: Starting server...
23:31:44.539Z INFO propolis-server: listening
    local_addr = [fd00:1122:3344:102::1:1f]:12400
23:31:45.198Z INFO propolis-server: accepted connection
    local_addr = [fd00:1122:3344:102::1:1f]:12400
    remote_addr = [fd00:1122:3344:102::1]:48551
23:31:45.201Z INFO propolis-server: request completed
    error_message_external = Server not initialized (no instance)
    error_message_internal = Server not initialized (no instance)
    latency_us = 380
    local_addr = [fd00:1122:3344:102::1:1f]:12400
    method = GET
    remote_addr = [fd00:1122:3344:102::1]:48551
    req_id = 2ce33c20-846e-4a9c-a001-5e5459a65665
    response_code = 424
    uri = /instance
23:31:45.202Z INFO propolis-server: new DNS resolver
    addresses = [[fd00:1122:3344:1::1]:53, [fd00:1122:3344:2::1]:53, [fd00:1122:3344:3::1]:53]
    local_addr = [fd00:1122:3344:102::1:1f]:12400
    method = PUT
    remote_addr = [fd00:1122:3344:102::1]:48551
    req_id = 64903dce-ec27-4127-8202-47253a286056
    uri = /instance
23:31:45.251Z INFO propolis-server (vm_state_driver): initializing new VM
    bootrom = /opt/oxide/propolis-server/blob/OVMF_CODE.fd
    properties = InstanceProperties {\n    id: 38e05be0-4030-4bfe-b784-e2c67ab3dba4,\n    name: "noble-1tb-regular-disks",\n    description: "Omicron-managed VM",\n    metadata: InstanceMetadata {\n        silo_id: 7a7f0296-d9e5-443c-84b3-1bb92550a6c5,\n        project_id: 092e24f8-9aff-4ad5-930e-bbf30e6462b2,\n        sled_id: b64b8ba1-15ac-4382-8518-6b6b3efeea49,\n        sled_serial: "BRM27230037",\n        sled_revision: 13,\n        sled_model: "913-0000019",\n    },\n}
    spec = Spec {\n    board: Board {\n        cpus: 64,\n        memory_mb: 1048576,\n        chipset: I440Fx(\n            I440Fx {\n                enable_pcie: false,\n            },\n        ),\n        guest_hv_interface: Bhyve,\n    },\n    cpuid: CpuidSet {\n        map: CpuidMap(\n            {\n                0: Absent(\n                    CpuidValues {\n                        eax: 16,\n                        ebx: 1752462657,\n                        ecx: 1145913699,\n                        edx: 1769238117,\n                    },\n                ),\n                1: Absent(\n                    CpuidValues {\n                        eax: 10489617,\n                        ebx: 67584,\n                        ecx: 4275712515,\n                        edx: 395049983,\n                    },\n                ),\n                2: Absent(\n                    CpuidValues {\n                        eax: 0,\n                        ebx: 0,\n                        ecx: 0,\n                        edx: 0,\n                    },\n                ),\n                3: Absent(\n                    CpuidValues {\n                        eax: 0,\n                        ebx: 0,\n                        ecx: 0,\n                        edx: 0,\n                    },\n                ),\n                4: Absent(\n                    CpuidValues {\n                        eax: 0,\n                        ebx: 0,\n                        ecx: 0,\n                        edx: 0,\n                    },\n                ),\n                5: Absent(\n                    CpuidValues {\n                        eax: 64,\n                        ebx: 64,\n                        ecx: 3,\n                        edx: 17,\n                    },\n                ),\n                6: Absent(\n                    CpuidValues {\n                        eax: 4,\n                        ebx: 0,\n                        ecx: 0,\n                        edx: 0,\n      
              },\n                ),\n                7: Present(\n                    {\n                        0: CpuidValues {\n                            eax: 0,\n                            ebx: 537920425,\n                            ecx: 1536,\n                            edx: 0,\n                        },\n                    },\n                ),\n                8: Absent(\n                    CpuidValues {\n                        eax: 0,\n                        ebx: 0,\n                        ecx: 0,\n                        edx: 0,\n                    },\n                ),\n                9: Absent(\n                    CpuidValues {\n                        eax: 0,\n                        ebx: 0,\n                        ecx: 0,\n                        edx: 0,\n                    },\n                ),\n                10: Absent(\n                    CpuidValues {\n                        eax: 0,\n                        ebx: 0,\n                        ecx: 0,\n                        edx: 0,\n                    },\n                ),\n                11: Present(\n                    {\n                        0: CpuidValues {\n                            eax: 0,\n                            ebx: 0,\n                            ecx: 0,\n                            edx: 0,\n                        },\n                        1: CpuidValues {\n                            eax: 0,\n                            ebx: 0,\n                            ecx: 0,\n                            edx: 0,\n                        },\n                    },\n                ),\n                12: Absent(\n                    CpuidValues {\n                        eax: 0,\n                        ebx: 0,\n                        ecx: 0,\n                        edx: 0,\n                    },\n                ),\n                13: Present(\n                    {\n                        0: CpuidValues {\n                            eax: 7,\n              
              ebx: 832,\n                            ecx: 832,\n                            edx: 0,\n                        },\n                        1: CpuidValues {\n                            eax: 1,\n                            ebx: 0,\n                            ecx: 0,\n                            edx: 0,\n                        },\n                        2: CpuidValues {\n                            eax: 256,\n                            ebx: 576,\n                            ecx: 0,\n                            edx: 0,\n                        },\n                    },\n                ),\n                14: Absent(\n                    CpuidValues {\n                        eax: 0,\n                        ebx: 0,\n                        ecx: 0,\n                        edx: 0,\n                    },\n                ),\n                15: Absent(\n                    CpuidValues {\n                        eax: 0,\n                        ebx: 0,\n                        ecx: 0,\n                        edx: 0,\n                    },\n                ),\n                16: Absent(\n                    CpuidValues {\n                        eax: 0,\n                        ebx: 0,\n                        ecx: 0,\n                        edx: 0,\n                    },\n                ),\n                2147483648: Absent(\n                    CpuidValues {\n                        eax: 2147483683,\n                        ebx: 1752462657,\n                        ecx: 1145913699,\n                        edx: 1769238117,\n                    },\n                ),\n                2147483649: Absent(\n                    CpuidValues {\n                        eax: 10489617,\n                        ebx: 1073741824,\n                        ecx: 1145057787,\n                        edx: 634649599,\n                    },\n                ),\n                2147483650: Absent(\n                    CpuidValues {\n                        eax: 
541347137,\n                        ebx: 1129926725,\n                        ecx: 825702176,\n                        edx: 908087347,\n                    },\n                ),\n                2147483651: Absent(\n                    CpuidValues {\n                        eax: 1866673460,\n                        ebx: 1344300402,\n                        ecx: 1701015410,\n                        edx: 1919906675,\n                    },\n                ),\n                2147483652: Absent(\n                    CpuidValues {\n                        eax: 538976288,\n                        ebx: 538976288,\n                        ecx: 538976288,\n                        edx: 2105376,\n                    },\n                ),\n                2147483653: Absent(\n                    CpuidValues {\n                        eax: 4282449728,\n                        ebx: 4282449728,\n                        ecx: 537395520,\n                        edx: 537395520,\n                    },\n                ),\n                2147483654: Absent(\n                    CpuidValues {\n                        eax: 1207968256,\n                        ebx: 1744847360,\n                        ecx: 33579328,\n                        edx: 134254912,\n                    },\n                ),\n                2147483655: Absent(\n                    CpuidValues {\n                        eax: 0,\n                        ebx: 0,\n                        ecx: 0,\n                        edx: 256,\n                    },\n                ),\n                2147483656: Absent(\n                    CpuidValues {\n                        eax: 12336,\n                        ebx: 7,\n                        ecx: 0,\n                        edx: 65543,\n                    },\n                ),\n                2147483657: Absent(\n                    CpuidValues {\n                        eax: 0,\n                        ebx: 0,\n                        ecx: 0,\n                  
      edx: 0,\n                    },\n                ),\n                2147483658: Absent(\n                    CpuidValues {\n                        eax: 1,\n                        ebx: 32768,\n                        ecx: 0,\n                        edx: 295419135,\n                    },\n                ),\n                2147483659: Absent(\n                    CpuidValues {\n                        eax: 0,\n                        ebx: 0,\n                        ecx: 0,\n                        edx: 0,\n                    },\n                ),\n                2147483660: Absent(\n                    CpuidValues {\n                        eax: 0,\n                        ebx: 0,\n                        ecx: 0,\n                        edx: 0,\n                    },\n                ),\n                2147483661: Absent(\n                    CpuidValues {\n                        eax: 0,\n                        ebx: 0,\n                        ecx: 0,\n                        edx: 0,\n                    },\n                ),\n                2147483662: Absent(\n                    CpuidValues {\n                        eax: 0,\n                        ebx: 0,\n                        ecx: 0,\n                        edx: 0,\n                    },\n                ),\n                2147483663: Absent(\n                    CpuidValues {\n                        eax: 0,\n                        ebx: 0,\n                        ecx: 0,\n                        edx: 0,\n                    },\n                ),\n                2147483664: Absent(\n                    CpuidValues {\n                        eax: 0,\n                        ebx: 0,\n                        ecx: 0,\n                        edx: 0,\n                    },\n                ),\n                2147483665: Absent(\n                    CpuidValues {\n                        eax: 0,\n                        ebx: 0,\n                        ecx: 0,\n                      
  edx: 0,\n                    },\n                ),\n                2147483666: Absent(\n                    CpuidValues {\n                        eax: 0,\n                        ebx: 0,\n                        ecx: 0,\n                        edx: 0,\n                    },\n                ),\n                2147483667: Absent(\n                    CpuidValues {\n                        eax: 0,\n                        ebx: 0,\n                        ecx: 0,\n                        edx: 0,\n                    },\n                ),\n                2147483668: Absent(\n                    CpuidValues {\n                        eax: 0,\n                        ebx: 0,\n                        ecx: 0,\n                        edx: 0,\n                    },\n                ),\n                2147483669: Absent(\n                    CpuidValues {\n                        eax: 0,\n                        ebx: 0,\n                        ecx: 0,\n                        edx: 0,\n                    },\n                ),\n                2147483670: Absent(\n                    CpuidValues {\n                        eax: 0,\n                        ebx: 0,\n                        ecx: 0,\n                        edx: 0,\n                    },\n                ),\n                2147483671: Absent(\n                    CpuidValues {\n                        eax: 0,\n                        ebx: 0,\n                        ecx: 0,\n                        edx: 0,\n                    },\n                ),\n                2147483672: Absent(\n                    CpuidValues {\n                        eax: 0,\n                        ebx: 0,\n                        ecx: 0,\n                        edx: 0,\n                    },\n                ),\n                2147483673: Absent(\n                    CpuidValues {\n                        eax: 4030787648,\n                        ebx: 4030726144,\n                        ecx: 0,\n                    
    edx: 0,\n                    },\n                ),\n                2147483674: Absent(\n                    CpuidValues {\n                        eax: 6,\n                        ebx: 0,\n                        ecx: 0,\n                        edx: 0,\n                    },\n                ),\n                2147483675: Absent(\n                    CpuidValues {\n                        eax: 1023,\n                        ebx: 0,\n                        ecx: 0,\n                        edx: 0,\n                    },\n                ),\n                2147483676: Absent(\n                    CpuidValues {\n                        eax: 0,\n                        ebx: 0,\n                        ecx: 0,\n                        edx: 0,\n                    },\n                ),\n                2147483677: Present(\n                    {\n                        0: CpuidValues {\n                            eax: 289,\n                            ebx: 63,\n                            ecx: 0,\n                            edx: 0,\n                        },\n                        1: CpuidValues {\n                            eax: 323,\n                            ebx: 63,\n                            ecx: 0,\n                            edx: 0,\n                        },\n                        2: CpuidValues {\n                            eax: 355,\n                            ebx: 63,\n                            ecx: 0,\n                            edx: 0,\n                        },\n                    },\n                ),\n                2147483678: Absent(\n                    CpuidValues {\n                        eax: 0,\n                        ebx: 0,\n                        ecx: 0,\n                        edx: 0,\n                    },\n                ),\n                2147483679: Absent(\n                    CpuidValues {\n                        eax: 16907583,\n                        ebx: 16755,\n                        ecx: 
509,\n                        edx: 1,\n                    },\n                ),\n                2147483680: Absent(\n                    CpuidValues {\n                        eax: 0,\n                        ebx: 2,\n                        ecx: 0,\n                        edx: 0,\n                    },\n                ),\n                2147483681: Absent(\n                    CpuidValues {\n                        eax: 8269,\n                        ebx: 0,\n                        ecx: 0,\n                        edx: 0,\n                    },\n                ),\n                2147483682: Absent(\n                    CpuidValues {\n                        eax: 0,\n                        ebx: 0,\n                        ecx: 0,\n                        edx: 0,\n                    },\n                ),\n                2147483683: Absent(\n                    CpuidValues {\n                        eax: 0,\n                        ebx: 0,\n                        ecx: 0,\n                        edx: 0,\n                    },\n                ),\n            },\n        ),\n        vendor: Amd,\n    },\n    disks: {\n        Name(\n            "6c6289d9-de36-4adf-bf73-cb5ac85ab38a:device",\n        ): Disk {\n            device_spec: Nvme(\n                NvmeDisk {\n                    backend_id: Uuid(\n                        6c6289d9-de36-4adf-bf73-cb5ac85ab38a,\n                    ),\n                    pci_path: PciPath {\n                        bus: 0,\n                        device: 16,\n                        function: 0,\n                    },\n                    serial_number: [\n                        110,\n                        111,\n                        98,\n                        108,\n                        101,\n                        45,\n                        49,\n                        116,\n                        98,\n                        45,\n                        114,\n                        101,\n    
                    103,\n                        117,\n                        108,\n                        97,\n                        114,\n                        45,\n                        100,\n                        105,\n                    ],\n                },\n            ),\n            backend_spec: Crucible(\n                CrucibleStorageBackend {\n                    request_json: "<redacted>",\n                    readonly: false,\n                },\n            ),\n        },\n        Name(\n            "cloud-init-dev",\n        ): Disk {\n            device_spec: Virtio(\n                VirtioDisk {\n                    backend_id: Name(\n                        "cloud-init-backend",\n                    ),\n                    pci_path: PciPath {\n                        bus: 0,\n                        device: 24,\n                        function: 0,\n                    },\n                },\n            ),\n            backend_spec: Blob(\n                BlobStorageBackend {\n                    base64: "<redacted>",\n                    readonly: true,\n                },\n            ),\n        },\n    },\n    nics: {\n        Name(\n            "be619353-c3a6-4b8b-b6d8-a635a40cbb56:device",\n        ): Nic {\n            device_spec: VirtioNic {\n                backend_id: Uuid(\n                    be619353-c3a6-4b8b-b6d8-a635a40cbb56,\n                ),\n                interface_id: be619353-c3a6-4b8b-b6d8-a635a40cbb56,\n                pci_path: PciPath {\n                    bus: 0,\n                    device: 8,\n                    function: 0,\n                },\n            },\n            backend_spec: VirtioNetworkBackend {\n                vnic_name: "opte24",\n            },\n        },\n    },\n    boot_settings: Some(\n        BootSettings {\n            name: Name(\n                "boot-settings",\n            ),\n            order: [\n                BootOrderEntry {\n                    device_id: 
Name(\n                        "6c6289d9-de36-4adf-bf73-cb5ac85ab38a:device",\n                    ),\n                },\n            ],\n        },\n    ),\n    serial: {\n        Name(\n            "com1",\n        ): SerialPort {\n            num: Com1,\n            device: Uart,\n        },\n        Name(\n            "com2",\n        ): SerialPort {\n            num: Com2,\n            device: Uart,\n        },\n        Name(\n            "com3",\n        ): SerialPort {\n            num: Com3,\n            device: Uart,\n        },\n        Name(\n            "com4",\n        ): SerialPort {\n            num: Com4,\n            device: Uart,\n        },\n    },\n    pci_pci_bridges: {},\n    pvpanic: Some(\n        QemuPvpanic {\n            id: Name(\n                "pvpanic",\n            ),\n            spec: QemuPvpanic {\n                enable_isa: true,\n            },\n        },\n    ),\n}
    use_reservoir = true
23:33:42.279Z INFO propolis-server (vm_state_driver): publishing new instance state
    gen = 2
    migration = InstanceMigrateStatusResponse { migration_in: None, migration_out: None }
    state = Failed
23:33:42.280Z ERRO propolis-server (vm_state_driver): failed to activate new VM
    error = creating VM objects for new instance: failed to join VM object creation task: failed to add high memory region
23:33:42.280Z INFO propolis-server: request completed
    error_message_external = Internal Server Error
    error_message_internal = VM initialization failed: failed to join VM object creation task: failed to add high memory region
    latency_us = 117072604
    local_addr = [fd00:1122:3344:102::1:1f]:12400
    method = PUT
    remote_addr = [fd00:1122:3344:102::1]:48551
    req_id = 64903dce-ec27-4127-8202-47253a286056
    response_code = 500
    uri = /instance

The error it hit was VM initialization failed: failed to join VM object creation task: failed to add high memory region.

@askfongjojo

Filed oxidecomputer/propolis#903 for follow-up.

@pfmooney
Contributor

pfmooney commented May 5, 2025

I found some constants which need to be adjusted up in order to allow this to work.

@askfongjojo

@pfmooney / @iximeow - I am able to run a 1 TiB memory VM (with 64 vcpus) without problem on dublin using this commit. I ran some sysbench tests (cpu/memory benchmarks) as well as a MySQL benchmark for 12 hours. Things are looking stable to me so I think the omicron side of changes can be merged to main.

@askfongjojo

On Dublin, I made a VM with 1100585369600 bytes (1025 GiB) memory and it booted up successfully.

However, resizing the VM to 1105954078720 bytes (1030 GiB) or above resulted in the following error, as seen on the serial console:

BdsDxe: failed to load Boot0001 "UEFI Non-Block Boot Device" from PciRoot(0x0)/Pci(0x18,0x0): Not Found
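
For reference, the byte counts quoted above can be checked against their GiB values with a few lines of Rust. This is only a sketch for sanity-checking the arithmetic: the helper names `gib_to_bytes` and `bytes_to_whole_gib` are hypothetical and do not come from omicron or Propolis.

```rust
/// Number of bytes in one GiB (2^30).
const GIB: u64 = 1 << 30;

/// Hypothetical helper: convert a whole number of GiB to bytes.
const fn gib_to_bytes(gib: u64) -> u64 {
    gib * GIB
}

/// Hypothetical helper: return the size in GiB if `bytes` is an exact
/// multiple of one GiB, otherwise `None`.
fn bytes_to_whole_gib(bytes: u64) -> Option<u64> {
    if bytes % GIB == 0 {
        Some(bytes / GIB)
    } else {
        None
    }
}

fn main() {
    // The size that booted and the size that failed, as quoted in the comment.
    let booted: u64 = 1_100_585_369_600;
    let failed: u64 = 1_105_954_078_720;
    assert_eq!(bytes_to_whole_gib(booted), Some(1025));
    assert_eq!(bytes_to_whole_gib(failed), Some(1030));

    // The original 256 GiB cap and the 1 TiB size tested earlier in the thread.
    assert_eq!(gib_to_bytes(256), 274_877_906_944);
    assert_eq!(gib_to_bytes(1024), 1_099_511_627_776);

    println!("booted at {} GiB, failed at {} GiB", booted / GIB, failed / GIB);
}
```

Both quoted sizes are exact multiples of 1 GiB, so the boot failure appears somewhere between 1025 GiB and 1030 GiB rather than being a rounding artifact of the requested size.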

@iximeow @david-crespo

@iximeow
Member

iximeow commented May 15, 2025

well. i didn't totally mean to close this since 2 TiB gimlets could/should allow VM sizes upwards of 1.5 TiB of memory. but as we discovered above and in the notes on #8160, oxidecomputer/propolis#907 needs to get resolved and that updated Propolis needs to get pulled in before we do that. otherwise OVMF won't know how to do PCI and instances won't have disks.

but, 1024 GiB is much more than 256 GiB so we're 4x better than before!!
