Add two repair tests to agent-antagonist #1793
base: main
While discussing possible causes for the zeroes appearing at the beginning of an extent file (oxidecomputer#1788), one theory that came up was that the Extent was being repaired, and that the zeroes came from reading the Extent through the repair API rather than from the disk itself: only _one_ Region's Extents were bad, not all of them. To test this theory, add two new tests. Both contact an Agent to get a Region's details, then use the repair API to read Extent data files. One looks for a block of zeroes at the beginning of the file; the other writes the data to a temporary file and checks it with the new `Extent::validate` routine.
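For readers unfamiliar with the first check, here is a minimal sketch of what "look for a block of zeroes at the beginning" could mean once the Extent bytes are in memory. It is not the code from this PR: the helper name `starts_with_zero_block` and the 512-byte block size are assumptions for illustration, and the bytes are built in place of a real repair API read.

```rust
/// Hypothetical helper (not from this PR): returns true when the first
/// `block_size` bytes of `data` are all zero. Returns false if `data` is
/// shorter than one block or if `block_size` is zero.
fn starts_with_zero_block(data: &[u8], block_size: usize) -> bool {
    match data.get(..block_size) {
        Some(block) => block_size > 0 && block.iter().all(|&b| b == 0),
        None => false,
    }
}

fn main() {
    // Assumed block size, for illustration only.
    let block_size = 512;

    // Stand-ins for bytes that would come from reading an Extent data file.
    let good = vec![0xAAu8; 4 * block_size];
    let mut bad = vec![0xAAu8; 4 * block_size];
    bad[..block_size].fill(0);

    println!("good extent flagged: {}", starts_with_zero_block(&good, block_size));
    println!("bad extent flagged: {}", starts_with_zero_block(&bad, block_size));
}
```

A leading zero block is not proof of corruption on its own, since a block that was never written may legitimately read as zeroes, which is presumably why the second test relies on `Extent::validate` instead.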
These changes are really just about looking for and checking the #1788 all-zeros situation, right? We might want to document somewhere how to do a setup where someone could run this, so that information is not lost to time.
#[clap(short, long)]
region_id: Uuid,

/// Allow reading extents from a read-write downstairs that may not be
This seems fraught with peril. Can we expect this data to be good?
Or is this more about allowing us to connect to a downstairs that is serving a RW region but is not actually taking live IO from an upstairs?
Right, we can't expect it to be good if it's accepting IO, but in cases where it's a RW downstairs that's not currently doing anything, we should expect the data to be good.
.build()
.unwrap();

loop {
This will just run forever, or until you control-C it?
yep!
If you add some notes or documentation (somewhere??) that can be used to run these, then I'm fine with the changes going back.