Linux guest marked storage device read-only after some Crucible failures #9189

@davepacheco

I'm unclear on most of the details here, but I gather from today's dogfood update that under some conditions, Linux guests on the Oxide system can mark the Crucible-backed storage device read-only, and that this condition is permanent. Self-service update can trigger this, but I'd be surprised if it were specific to self-service update. The update process can cause multiple Crucible downstairs instances for the same disk to fail transiently for some time. We assumed (apparently incorrectly) that upstack software would always recover from transient failures, but the read-only state appears to be permanent. If so, the same problem can probably also occur with no self-service update involved, given just a set of sled failures.

@askfongjojo has more details about a specific instance affected by this.
@iliana mentioned in chat that "this might be why AWS tells customers to set the nvme timeout to INT_MAX".
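
For anyone trying to confirm this from inside an affected guest, here is a minimal sketch of a check. It makes several assumptions not confirmed in this issue: that the disk shows up in the guest as an NVMe block device (the name `nvme0n1` is hypothetical), that the guest uses the `nvme_core` kernel module, and that the read-only state is visible as the block-layer flag in `/sys/block/<device>/ro`. A filesystem that was remounted read-only would show up in `/proc/mounts` instead and is not covered here. The `storage_status` helper and its `sysfs_root` parameter are illustrative names only.

```python
from pathlib import Path


def read_sysfs_int(path):
    """Return the integer stored in a sysfs-style file, or None if unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def storage_status(device, sysfs_root="/sys"):
    """Report the block-layer read-only flag and the nvme_core I/O timeout.

    `device` is a block device name such as "nvme0n1" (hypothetical here).
    `sysfs_root` is overridable so the function can be run against a fake tree.
    """
    root = Path(sysfs_root)
    # /sys/block/<device>/ro holds 1 when the block device is read-only, else 0.
    ro = read_sysfs_int(root / "block" / device / "ro")
    # nvme_core.io_timeout is the NVMe I/O timeout in seconds.
    timeout = read_sysfs_int(
        root / "module" / "nvme_core" / "parameters" / "io_timeout"
    )
    return {
        "device": device,
        "read_only": None if ro is None else bool(ro),
        "nvme_io_timeout_secs": timeout,
    }
```

Calling `storage_status("nvme0n1")` in a guest returns a dictionary with the two values, or `None` for any value that could not be read (for example, if the device name is wrong or the module is not loaded).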

See also oxidecomputer/crucible#1555.
