kvserver: clear RHS when learning about a split through a snapshot #148781
Conversation
Previously, it was possible for us to leak the (post-split) RHS keyspans if we learned about a split through a snapshot. This leak could be long-lived if the RHS replica was moved away from the store, as the store would then never receive a snapshot for the RHS that would force the keyspans to be cleared out. This hazard was shown in #148707.

We know there's been a split if the store receives a snapshot that's narrower than the replica it was previously tracking. In the general case, when merges are involved, a store knows that there have been N merges followed by a split if there are N (necessarily contiguous) replicas overlapping with the snapshot whose keyspan is narrower than that of the snapshot.

Interestingly, we were handling the general case correctly, but it required there to be at least one merge before the split. This patch lifts that logic out of clearSubsumedReplicaDiskData and moves it to a similarly named clearSplitReplicaDiskData, tying the narrowing logic more directly to splits and not conflating it with merges.

Closes #73462

Release note: None
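For intuition, here is a minimal Go sketch of that narrowing check. The helper, its name, and its placement are illustrative, not the patch's actual code; snapDesc and rightMostDesc mirror parameter names that appear in the diff below.

// splitDelta reports whether the incoming snapshot is narrower than the
// rightmost replica the store was tracking, and if so, returns the leaked
// post-split RHS keyspan that must be cleared.
func splitDelta(snapDesc, rightMostDesc *roachpb.RangeDescriptor) (roachpb.RSpan, bool) {
	if snapDesc.EndKey.Compare(rightMostDesc.EndKey) >= 0 {
		// The snapshot is at least as wide as what we were tracking, so
		// there is no split to clean up after.
		return roachpb.RSpan{}, false
	}
	// The snapshot ends before the tracked keyspace does: everything from
	// the snapshot's end key to the old end key is leaked RHS state.
	return roachpb.RSpan{Key: snapDesc.EndKey, EndKey: rightMostDesc.EndKey}, true
}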
// NB: subsumedDescs in prepareSnapApplyInput must be sorted by start key.
sort.Slice(subsumedDescs, func(i, j int) bool {
	return subsumedDescs[i].StartKey.Compare(subsumedDescs[j].StartKey) < 0
})
Looks unnecessary. This slice is already sorted, by construction here. If you like, assert slices.IsSorted here, or in prepareSnapApply?

The code in clearSubsumedReplicaDiskData might have been misleading before, giving the impression that this slice can be unsorted.
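A minimal sketch of the suggested assertion, using the standard library's slices.IsSortedFunc; the placement and the assumption that subsumedDescs is a []*roachpb.RangeDescriptor are illustrative:

// Assert the invariant instead of re-sorting; hypothetical placement in
// prepareSnapApply.
if !slices.IsSortedFunc(subsumedDescs, func(a, b *roachpb.RangeDescriptor) int {
	return a.StartKey.Compare(b.StartKey)
}) {
	log.Fatalf(ctx, "subsumedDescs not sorted by start key")
}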
// that either). Something is severely wrong for this to happen, so perform
// a sanity check.
if subsumedDescs[0].StartKey.Compare(desc.StartKey) < 0 { // subsumedDescs are sorted by StartKey
	log.Fatalf(ctx, "subsuming replica to our left; key span: %v; total key span %v",
The log line no longer prints key spans. Update the wording?
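One possible rewording, assuming the arguments are now the two start keys rather than full spans (illustrative only; the actual arguments are cut off in the quote above):

log.Fatalf(ctx, "subsuming replica to our left; subsumed start key: %v; snapshot start key: %v",
	subsumedDescs[0].StartKey, desc.StartKey)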
snapDesc *roachpb.RangeDescriptor,
rightMostDesc *roachpb.RangeDescriptor,
Out of these 2 descriptors, the only information we need/use is the two EndKeys that form the key span we want to clear. So we can instead just take this span, and make this func straightforward:

for _, span := range rditer.Select(... {
	RSpan: <this "delta" RSpan>
}) {
	_ = span // generate SST
}

I started doing this in https://github.com/cockroachdb/cockroach/pull/147378/files#diff-dd96d448893ec224dc7a1ca0e9056a6ec9d8aa830b80e1b6a44e2e3a8eab29abR262-R266, but it fell through the cracks.
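Fleshing that sketch out, a hypothetical version of the lower-level primitive could look as follows. The rditer.SelectOpts field layout is carried over from the sketch above, and the SST-emitting callback is an assumption standing in for the real SST generation:

// clearSplitReplicaDiskData clears all replicated keyspans falling within
// the given "delta" span: the keyspace the snapshot no longer covers.
func clearSplitReplicaDiskData(delta roachpb.RSpan, emitClearSST func(roachpb.Span) error) error {
	for _, span := range rditer.Select(0 /* rangeID */, rditer.SelectOpts{
		RSpan: delta, // field name assumed, as in the sketch above
	}) {
		// Generate an SST that deletes everything in span.
		if err := emitClearSST(span); err != nil {
			return err
		}
	}
	return nil
}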
// will also clear all subsumed replicas.
continue
}
// clearSplitReplicaDiskData clears the on disk data of any replica that has |
Suggested below how to make this func a lower-level, straightforward primitive. Correspondingly, these comments might need to move one level up, to the caller.
// current keySpan).
rightMostDesc = input.subsumedDescs[len(input.subsumedDescs)-1]
// In the other case, where the subsumed replicas are fully contained in the |
Likely makes sense to move the shorter/longer logic up the stack from clearSplitReplicaDiskData, so that these comments are interleaved with the checks as they happen. I made some suggestions in clearSplitReplicaDiskData on how to achieve it.
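A hedged sketch of what those caller-side checks could look like once the logic moves up the stack; the input.* field names come from the quoted diff, the rest is illustrative:

// Determine the rightmost extent of what this store was tracking. With no
// subsumed replicas, that is the replica's own pre-snapshot descriptor.
rightMostDesc := input.origDesc
if n := len(input.subsumedDescs); n > 0 {
	// subsumedDescs are sorted by start key, so the last one is rightmost.
	rightMostDesc = input.subsumedDescs[n-1]
}
if input.desc.EndKey.Compare(rightMostDesc.EndKey) < 0 {
	// The snapshot is narrower than what we were tracking: we missed a
	// split, so clear the leaked RHS keyspace.
	delta := roachpb.RSpan{Key: input.desc.EndKey, EndKey: rightMostDesc.EndKey}
	_ = delta // hand off to the clearing primitive sketched earlier
}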
@@ -595,16 +600,20 @@ func (r *Replica) applySnapshotRaftMuLocked(
	truncState: truncState,
	hardState:  hs,
	desc:       desc,
	origDesc:   r.descRLocked(),
Tripped the mutex assertion here, since mu is not locked.

Consider r.Desc(). I can see at least one call to it above, so maybe reuse it to avoid extra mutex locks?

Also, how does this work when origDesc is empty? I.e., when this snapshot just initializes our replica.
We probably don't need to lock in the first place. Desc is in "shMu", i.e. it is always changed with both mu and raftMu locked, and is readable when holding either of the mutexes.
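As a generic illustration of that invariant (a sketch, not CockroachDB's actual field layout):

// "shMu" fields are written only while BOTH mu and raftMu are held, so a
// reader holding either mutex sees a stable value.
type replica struct {
	mu     syncutil.RWMutex
	raftMu syncutil.Mutex

	shMu struct {
		desc *roachpb.RangeDescriptor
	}
}

// applySnapshotRaftMuLocked already holds raftMu, so it can read shMu.desc
// without also taking mu.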