@jmpesp jmpesp commented Oct 15, 2025

While discussing possible causes for the zeroes appearing at the beginning of an extent file (#1788), one theory was that the Extent was being repaired and the zeroes came from reading the Extent through the repair API, not from the disk itself: only _one_ Region's Extents were bad, not all of them.

To test this theory, add two new tests. Both contact an Agent to get a Region's details, then use the repair API to read Extent data files. One looks for a block of zeroes at the beginning of each file; the other writes the data to a temporary file and checks it with the new `Extent::validate` routine.
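
For illustration, the "block of zeroes at the beginning" check might look roughly like the sketch below. This is not the code from this PR: the function name `starts_with_zero_block` and the idea of passing the block size as a parameter are assumptions made for the example, and the bytes are taken to be already read from the repair API into memory.

```rust
/// Hypothetical helper (not from this PR): returns true when `data` holds at
/// least `block_size` bytes and the first `block_size` bytes are all zero.
/// An empty block size or a short buffer is treated as "no zero block".
fn starts_with_zero_block(data: &[u8], block_size: usize) -> bool {
    block_size > 0
        && data.len() >= block_size
        && data[..block_size].iter().all(|&b| b == 0)
}
```

A caller would pass the bytes returned for one Extent data file together with the Region's block size, and flag the Extent if the function returns true.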

@jmpesp jmpesp requested review from leftwo and mkeeter October 15, 2025 19:15

@leftwo leftwo left a comment

These changes are really just about looking for and checking the #1788 all-zeroes situation, right? We might want to document somewhere how to do a setup where someone could run this, so that information is not lost to time.

#[clap(short, long)]
region_id: Uuid,

/// Allow reading extents from a read-write downstairs that may not be

This seems fraught with peril. Can we expect this data to be good?

Or is this more about allowing us to connect to a downstairs that is serving a RW region, but where that downstairs is not actually taking live IO from an upstairs?

Contributor Author

Right, no, we can't expect it to be good if it's accepting IO. But in cases where it's a RW downstairs that's not currently doing anything, we should expect the data to be good.

.build()
.unwrap();

loop {

This will just run forever, or until you control-C it?

Contributor Author

yep!

@leftwo leftwo self-requested a review October 16, 2025 00:45

@leftwo leftwo left a comment

If you add some notes or documentation (somewhere??) that can be used to run these, then I'm fine with the changes going back.
