Flaky test: TestAccCloudResourceImage_full #655

@lgfa29

TestAccCloudResourceImage_full has been observed to fail intermittently with the following error:

=== FAIL: internal/provider TestAccCloudResourceImage_full (181.81s)
    resource_image_test.go:94: Step 1/2 error: Error running apply: exit status 1
        
        Error: Error creating snapshot
        
          with oxide_snapshot.acc-support-773032af-aac2-4ec4-b6ba-25c03701bf9c,
          on terraform_plugin_test.tf line 29, in resource "oxide_snapshot" "acc-support-773032af-aac2-4ec4-b6ba-25c03701bf9c":
          29:  resource "oxide_snapshot" "acc-support-773032af-aac2-4ec4-b6ba-25c03701bf9c" {
        
        API error: error sending request: Post
        "http://localhost:8080/v1/snapshots?project=1befb81b-31ba-48e6-bd64-4ffdd5812afd":
        context deadline exceeded
    panic.go:615: Error running post-test destroy, there may be dangling resources: exit status 1
        
        Error: Unable to delete disk:
        
        API error: DELETE
        http://localhost:8080/v1/disks/0e834800-ee6d-4a55-85a2-77dc20f45b65
        ----------- RESPONSE -----------
        Status: 400 InvalidRequest
        Message: disk cannot be deleted in state "maintenance"
        RequestID: 60a14a52-6b60-4461-8e9f-bd05d066ae4a
        ------- RESPONSE HEADERS -------
        X-Request-Id: [60a14a52-6b60-4461-8e9f-bd05d066ae4a]
        Content-Length: [156]
        Date: [Wed, 25 Feb 2026 08:03:28 GMT]
        Content-Type: [application/json]

Sample run: https://github.com/oxidecomputer/terraform-provider-oxide/actions/runs/22387128073/job/64801391010#step:15:247
Logs: logs-terraform-64801391010.zip

The main culprit is an error in the simulated crucible that prevents the snapshot from completing. Nexus then retries the pantry request with exponential backoff, but the provider's snapshot request eventually times out (context deadline exceeded).

08:00:30.198Z INFO e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c (ServerContext): sending attach request for 0e834800-ee6d-4a55-85a2-77dc20f45b65 to [::1]:36361
    saga_id = 74bc4d3c-bf12-4a6f-8253-1c60e8826438
    saga_name = snapshot-create
08:00:30.266Z DEBG e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c (ServerContext): recording saga event
    event_type = Succeeded(Null)
    node_id = 13
    saga_id = 74bc4d3c-bf12-4a6f-8253-1c60e8826438
08:00:30.274Z DEBG e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c (ServerContext): recording saga event
    event_type = Started
    node_id = 14
    saga_id = 74bc4d3c-bf12-4a6f-8253-1c60e8826438
08:00:30.280Z INFO e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c (ServerContext): sending snapshot request with id dc83911d-d59f-418b-aa86-b96a5796d08f for disk 0e834800-ee6d-4a55-85a2-77dc20f45b65 to pantry endpoint http://[::1]:36361
    saga_id = 74bc4d3c-bf12-4a6f-8253-1c60e8826438
    saga_name = snapshot-create
08:00:30.339Z WARN e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c (ServerContext): saw transient communication error error sending request for url (http://[::1]:36361/crucible/pantry/0/volume/0e834800-ee6d-4a55-85a2-77dc20f45b65/snapshot), retrying...
    saga_id = 74bc4d3c-bf12-4a6f-8253-1c60e8826438
    saga_name = snapshot-create
08:00:30.340Z WARN e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c (ServerContext): failed external call (ProgenitorError(Communication Error: error sending request for url (http://[::1]:36361/crucible/pantry/0/volume/0e834800-ee6d-4a55-85a2-77dc20f45b65/snapshot))), will retry in 224.637991ms
    saga_id = 74bc4d3c-bf12-4a6f-8253-1c60e8826438
    saga_name = snapshot-create

The second error is a consequence of the first one. The Terraform test suite tries to delete the disk during the post-test destroy, but since the snapshot-create saga is still active, the disk is still in the maintenance state.

Disk deletion saga

08:03:28.064Z INFO e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c (ServerContext): tracking newly created saga
    saga_id = 34763a14-11f3-40ea-adf1-5277faf61031
08:03:28.064Z INFO e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c (ServerContext): preparing saga
    saga_id = 34763a14-11f3-40ea-adf1-5277faf61031
    saga_name = disk-delete
08:03:28.064Z INFO e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c (ServerContext): saga create
    dag = {"end_node":14,"graph":{"edge_property":"directed","edges":[[0,1,null],[1,2,null],[2,3,null],[4,5,null],[4,6,null],[6,7,null],[5,7,null],[7,8,null],[8,9,null],[9,10,null],[10,11,null],[3,4,null],[11,12,null],[13,0,null],[12,14,null]],"node_holes":[],"nodes":[{"Action":{"action_name":"disk_delete.delete_disk_record","label":"DeleteDiskRecord","name":"deleted_disk"}},{"Action":{"action_name":"disk_delete.space_account","label":"SpaceAccount","name":"no_result1"}},{"Constant":{"name":"params_for_volume_delete_subsaga","value":{"serialized_authn":{"kind":{"Authenticated":[{"actor":{"SiloUser":{"silo_id":"272b2939-5b35-4c46-8186-88424c3180d4","silo_user_id":"e87a45a2-d57b-40cb-8403-2e371c159e9b"}},"credential_id":"5f270637-17f0-47bf-8e27-9722c401485f","device_token_expiration":null},{"mapped_fleet_roles":{"admin":["admin"]}}]}},"volume_id":"72cdb49f-7794-451f-b0c5-655e7352ea88"}}},{"SubsagaStart":{"params_node_name":"params_for_volume_delete_subsaga","saga_name":"volume-delete"}},{"Action":{"action_name":"volume_delete.decrease_crucible_resource_count","label":"DecreaseCrucibleResourceCount","name":"crucible_resources_to_delete"}},{"Action":{"action_name":"volume_delete.delete_crucible_regions","label":"DeleteCrucibleRegions","name":"no_result_1"}},{"Action":{"action_name":"volume_delete.delete_crucible_running_snapshots","label":"DeleteCrucibleRunningSnapshots","name":"no_result_2"}},{"Action":{"action_name":"volume_delete.delete_crucible_snapshots","label":"DeleteCrucibleSnapshots","name":"no_result_3"}},{"Action":{"action_name":"volume_delete.delete_crucible_snapshot_records","label":"DeleteCrucibleSnapshotRecords","name":"no_result_4"}},{"Action":{"action_name":"volume_delete.find_freed_crucible_regions","label":"FindFreedCrucibleRegions","name":"freed_crucible_regions"}},{"Action":{"action_name":"volume_delete.delete_freed_crucible_regions","label":"DeleteFreedCrucibleRegions","name":"no_result_5"}},{"Action":{"action_name":"volume_delete.hard_delete_volum
e_record","label":"HardDeleteVolumeRecord","name":"volume_hard_deleted"}},{"SubsagaEnd":{"name":"volume_delete_subsaga_no_result"}},{"Start":{"params":{"disk":{"Crucible":{"disk":{"block_size":"Traditional","disk_type":"Crucible","identity":{"description":"a test disk","id":"0e834800-ee6d-4a55-85a2-77dc20f45b65","name":"acc-terraform-6415b431-8c9d-4a57-b5a0-fd0077df1f50","time_created":"2026-02-25T08:00:27.136307Z","time_deleted":null,"time_modified":"2026-02-25T08:00:27.136307Z"},"project_id":"1befb81b-31ba-48e6-bd64-4ffdd5812afd","rcgen":1,"runtime_state":{"attach_instance_id":null,"disk_state":"maintenance","gen":3,"time_updated":"2026-02-25T08:00:29.842787Z"},"size":1073741824,"slot":null},"disk_type_crucible":{"create_image_id":null,"create_snapshot_id":null,"disk_id":"0e834800-ee6d-4a55-85a2-77dc20f45b65","pantry_address":null,"read_only":false,"volume_id":"72cdb49f-7794-451f-b0c5-655e7352ea88"}}},"project_id":"1befb81b-31ba-48e6-bd64-4ffdd5812afd","serialized_authn":{"kind":{"Authenticated":[{"actor":{"SiloUser":{"silo_id":"272b2939-5b35-4c46-8186-88424c3180d4","silo_user_id":"e87a45a2-d57b-40cb-8403-2e371c159e9b"}},"credential_id":"5f270637-17f0-47bf-8e27-9722c401485f","device_token_expiration":null},{"mapped_fleet_roles":{"admin":["admin"]}}]}}}}},"End"]},"saga_name":"disk-delete","start_node":13}
    saga_id = 34763a14-11f3-40ea-adf1-5277faf61031
    saga_name = disk-delete
    sec_id = e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c
08:03:28.064Z INFO e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c (ServerContext): creating saga
    saga_id = 34763a14-11f3-40ea-adf1-5277faf61031
    saga_name = disk-delete
08:03:28.120Z INFO e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c (ServerContext): starting saga
    saga_id = 34763a14-11f3-40ea-adf1-5277faf61031
    saga_name = disk-delete
08:03:28.120Z INFO e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c (ServerContext): saga start
    saga_id = 34763a14-11f3-40ea-adf1-5277faf61031
    saga_name = disk-delete
    sec_id = e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c
08:03:28.120Z DEBG e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c (ServerContext): recording saga event
    event_type = Started
    node_id = 13
    saga_id = 34763a14-11f3-40ea-adf1-5277faf61031
08:03:28.122Z DEBG e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c (ServerContext): recording saga event
    event_type = Succeeded(Null)
    node_id = 13
    saga_id = 34763a14-11f3-40ea-adf1-5277faf61031
08:03:28.125Z DEBG e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c (ServerContext): recording saga event
    event_type = Started
    node_id = 0
    saga_id = 34763a14-11f3-40ea-adf1-5277faf61031
08:03:28.130Z DEBG e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c (ServerContext): recording saga event
    event_type = Failed(ActionFailed { source_error: Object {"InvalidRequest": Object {"message": Object {"external_message": String("disk cannot be deleted in state \\"maintenance\\""), "internal_context": String("")}}} })
    node_id = 0
    saga_id = 34763a14-11f3-40ea-adf1-5277faf61031
08:03:28.132Z INFO e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c (ServerContext): update for saga cached state
    new_state = Unwinding
    saga_id = 34763a14-11f3-40ea-adf1-5277faf61031
    sec_id = e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c
08:03:28.132Z INFO e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c (ServerContext): updating state
    new_state = unwinding
    saga_id = 34763a14-11f3-40ea-adf1-5277faf61031
08:03:28.135Z DEBG e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c (ServerContext): recording saga event
    event_type = UndoStarted
    node_id = 13
    saga_id = 34763a14-11f3-40ea-adf1-5277faf61031
08:03:28.137Z DEBG e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c (ServerContext): recording saga event
    event_type = UndoFinished
    node_id = 13
    saga_id = 34763a14-11f3-40ea-adf1-5277faf61031
08:03:28.139Z INFO e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c (ServerContext): update for saga cached state
    new_state = Done
    saga_id = 34763a14-11f3-40ea-adf1-5277faf61031
    sec_id = e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c
08:03:28.139Z INFO e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c (ServerContext): updating state
    new_state = done
    saga_id = 34763a14-11f3-40ea-adf1-5277faf61031
08:03:28.142Z WARN e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c (ServerContext): saga finished
    action_error_node_name = "deleted_disk"
    action_error_source = ActionFailed { source_error: Object {"InvalidRequest": Object {"message": Object {"external_message": String("disk cannot be deleted in state \\"maintenance\\""), "internal_context": String("")}}} }
    result = failure
    saga_id = 34763a14-11f3-40ea-adf1-5277faf61031
    saga_name = disk-delete
    sec_id = e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c
    undo_result = success
08:03:28.142Z INFO e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c (ServerContext): tracked saga has finished
    saga_id = 34763a14-11f3-40ea-adf1-5277faf61031
08:03:28.143Z INFO e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c (ServerContext): saga finished
    saga_id = 34763a14-11f3-40ea-adf1-5277faf61031
    saga_name = disk-delete
    saga_result = SagaResult { saga_id: 34763a14-11f3-40ea-adf1-5277faf61031, saga_log: SagaLog { saga_id: 34763a14-11f3-40ea-adf1-5277faf61031, unwinding: true, events: [N013 started, N013 succeeded, N000 started, N000 failed, N013 undo_started, N013 undo_finished], node_status: {0: Failed(ActionFailed { source_error: Object {"InvalidRequest": Object {"message": Object {"external_message": String("disk cannot be deleted in state \\"maintenance\\""), "internal_context": String("")}}} }), 13: UndoFinished} }, kind: Err(SagaResultErr { error_node_name: "deleted_disk", error_source: ActionFailed { source_error: Object {"InvalidRequest": Object {"message": Object {"external_message": String("disk cannot be deleted in state \\"maintenance\\""), "internal_context": String("")}}} }, undo_failure: None }) }

This error is unrecoverable: re-running the test does not make it pass, because no crucible pantry backend is online. A later snapshot request fails with a 500:

08:05:07.411Z INFO e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c (dropshot_external): request completed
    error_message_external = Internal Server Error
    error_message_internal = saga ACTION error at node "pantry_address": deserialize failed: unknown variant `failed to claim pantry client from pool: Backends exist, but none are online`, expected one of `ObjectNotFound`, `ObjectAlreadyExists`, `InvalidRequest`, `Unauthenticated`, `InvalidValue`, `Forbidden`, `InternalError`, `ServiceUnavailable`, `InsufficientCapacity`, `TypeVersionMismatch`, `Conflict`, `NotFound`, `Gone`
    latency_us = 30594998
    local_addr = 0.0.0.0:12220
    method = POST
    remote_addr = 172.18.0.3:41762
    req_id = 37502dbf-9a25-46ff-adf2-1740d026b086
    response_code = 500
    uri = /v1/snapshots?project=1befb81b-31ba-48e6-bd64-4ffdd5812afd
