-
Notifications
You must be signed in to change notification settings - Fork 151
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(expansion): add an error message to guide users in resolving the stuck volume expansion cancellation #3168
base: master
Are you sure you want to change the base?
Conversation
8173d64
to
d1f2e42
Compare
…volume expansion cancellation Longhorn 9466 Signed-off-by: Derek Su <[email protected]>
d1f2e42
to
0c05794
Compare
@@ -574,6 +568,17 @@ func (m *VolumeManager) CancelExpansion(volumeName string) (v *longhorn.Volume, | |||
engine = e | |||
} | |||
|
|||
if !v.Status.ExpansionRequired { | |||
if v.Spec.Size == engine.Spec.VolumeSize && v.Spec.Size != engine.Status.CurrentSize { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this possible?
The intermediate state that causes the bug is:
- The expansion is already started and finished in the engine process level, which means
v.Spec.Size == engine.Spec.VolumeSize
. - Longhorn has not updated engine/volume CR status. In other words,
v.Spec.Size != engine.Status.CurrentSize
andv.Status.ExpansionRequired == true
. (I am not sure ifengine.Status.IsExpanding
can be false here)
To be honest, it is hard to capture this state then reject the requests in API layer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right.
It is caused by it as I mentioned in longhorn/longhorn#9466 (comment).
Do you think the error message (hint how to resolve the issue) is reasonable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message itself is reasonable. But my question is, how can v.Status.ExpansionRequired
become false (or can the new condition be satisfied) before VolumeManager.CancelExpansion
execution complete?
The only place of adding the hint is probably here. And currently, Longhorn has no way to figure out if there is an cancellation before this error is trigger, the error hint can be only like:
the expected size %v of engine %v should not be smaller than the current size %v. It can be caused by a cancellation request after the expansion is complete, please re-do the expansion to resolve this error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a general problem for canceling any async operations relying on the resource status which is regularly updated but not in real time.
Is it possible to get the runtime info from the engine directly when calling the cancellation to decide whether the volume is in expansion or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get the runtime info from the engine directly when calling the cancellation to decide whether the volume is in expansion
It is not an atomic operation hence there can be a race anyway. For now, I think adding warnings/notifications is enough.
WalkthroughThe pull request modifies the Changes
Assessment against linked issues
Possibly related PRs
Suggested reviewers
Poem
Tip CodeRabbit's docstrings feature is now available as part of our Early Access Program! Simply use the command Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
manager/volume.go (1)
571-581
: LGTM! Consider adding debug logs for troubleshooting.The error handling logic effectively addresses the race condition where volume expansion completes at the engine level but the status hasn't been updated. The error message provides clear guidance to users on how to resolve the stuck cancellation.
Consider adding debug logs to help with troubleshooting:
if !v.Status.ExpansionRequired { + logrus.Debugf("Volume %v expansion cancellation check: spec.Size=%v, engine.Spec.VolumeSize=%v, engine.Status.CurrentSize=%v", + v.Name, v.Spec.Size, engine.Spec.VolumeSize, engine.Status.CurrentSize) if v.Spec.Size == engine.Spec.VolumeSize && v.Spec.Size != engine.Status.CurrentSize { return nil, fmt.Errorf("volume has already been expanded to %v and cannot revert to size %v. Please manually expand it to %v to resolve the stuck cancellation", engine.Status.CurrentSize, v.Spec.Size, engine.Status.CurrentSize) } return nil, fmt.Errorf("volume expansion is not started") }
Which issue(s) this PR fixes:
Issue longhorn/longhorn#9466
What this PR does / why we need it:
Special notes for your reviewer:
Additional documentation or context
Summary by CodeRabbit
Bug Fixes
Improvements