
Collapse the INVALID_ARGUMENTS error rows and clarify #597


Open: bswartz wants to merge 2 commits into master

Conversation

@bswartz (Contributor) commented Jul 24, 2025

What this PR does / why we need it:
This clarifies that the INVALID_ARGUMENT error code for ControllerModifyVolume is safe not to retry (i.e. it is "infeasible").

Does this PR introduce an API-breaking change?:

none

For ControllerModifyVolume, clarify that INVALID_ARGUMENT is an
infeasible error and that the SP has not made any changes, so
the CO does not need to retry.
@gnufied (Contributor) commented Jul 24, 2025

lgtm

@huww98 (Contributor) commented Jul 25, 2025

This could be a breaking change to SP. e.g. AWS does not comply with this now.

@gnufied (Contributor) commented Jul 25, 2025

This could be a breaking change to SP. e.g. AWS does not comply with this now.

In CSI this feature is alpha; this is why it is important that we tighten the wording before this feature goes GA in CSI.

@huww98 (Contributor) commented Jul 25, 2025

I'm afraid this may be too much to ask of the SP. In the previous AWS case, whether the parameter is invalid is determined by the server, and cannot be determined solely in CSI. If combined with multi-stage modification, we may not be able to verify all parameters of all stages before making any changes. I don't have a specific use case like this right now, but I expect we will have such cases in the future.

So if this change is applied, we will not be able to return "invalid argument" for the second stage. We could replace it with an Internal error. But IMO this is not a good design, because an SP could easily implement this wrong.

@gnufied (Contributor) commented Jul 25, 2025

So if this change is applied, we will not be able to return "invalid argument" for the second stage. We could replace it with an Internal error. But IMO this is not a good design, because an SP could easily implement this wrong.

The SP can still return Invalid Argument, but it must undo the work it did before returning it. Isn't this how transactional changes work? If the SP discovers halfway through that one or more parameters are invalid, it shouldn't just leave the volume in a half-modified state and return Invalid Argument; that makes it pretty hard for the user to reason about the state of the volume.

@bswartz (Contributor, Author) commented Jul 25, 2025

I'm afraid this may be too much to ask of the SP. In the previous AWS case, whether the parameter is invalid is determined by the server, and cannot be determined solely in CSI. If combined with multi-stage modification, we may not be able to verify all parameters of all stages before making any changes. I don't have a specific use case like this right now, but I expect we will have such cases in the future.

So if this change is applied, we will not be able to return "invalid argument" for the second stage. We could replace it with an Internal error. But IMO this is not a good design, because an SP could easily implement this wrong.

I don't think it's a large burden on the SP to force it to validate ALL of the parameters up front. Obviously this requires a round-trip to the actual server, but I can't think of a reason that the server couldn't validate all of the parameters as a unit. If anything is found to be invalid, it should be reported before any changes are made.

Once any changes are made, the SP either needs to reverse them, or return some error other than INVALID_ARGUMENT to indicate that it got stuck and retries are needed to move forward. It's unreasonable to expect COs to deal with a situation where the user attempted to do something invalid, and the SP put the volume into an inconsistent state as a result, and the CO can't even detect whether it should retry or give up.
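To make the contract concrete, here is a minimal Go sketch of the validate-everything-up-front pattern described above. It is illustrative only: `driver`, `validateAll`, and `applyAll` are hypothetical placeholders, not part of the CSI spec or any real SP.

```go
package sp

import (
	"context"

	"github.com/container-storage-interface/spec/lib/go/csi"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// driver stands in for an SP controller service; validateAll and applyAll are
// hypothetical hooks for backend-specific logic.
type driver struct{}

func (d *driver) validateAll(ctx context.Context, volID string, params map[string]string) error {
	return nil // backend-specific validation, including any server-side checks
}

func (d *driver) applyAll(ctx context.Context, volID string, params map[string]string) error {
	return nil // backend-specific mutation
}

func (d *driver) ControllerModifyVolume(ctx context.Context, req *csi.ControllerModifyVolumeRequest) (*csi.ControllerModifyVolumeResponse, error) {
	// Validate every mutable parameter before touching the volume, so that
	// INVALID_ARGUMENT is only ever returned while nothing has been changed.
	if err := d.validateAll(ctx, req.GetVolumeId(), req.GetMutableParameters()); err != nil {
		return nil, status.Errorf(codes.InvalidArgument, "invalid mutable parameters: %v", err)
	}
	// Past this point the volume may be partially modified; failures must map
	// to a retriable code such as INTERNAL, never INVALID_ARGUMENT.
	if err := d.applyAll(ctx, req.GetVolumeId(), req.GetMutableParameters()); err != nil {
		return nil, status.Errorf(codes.Internal, "modify failed, retry needed: %v", err)
	}
	return &csi.ControllerModifyVolumeResponse{}, nil
}
```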

@xing-yang (Contributor) left a comment

lgtm

@huww98 (Contributor) commented Jul 25, 2025

Take AWS EBS as an example. It needs to modify both the disk and the tags. Obviously the parameters to both stages can contain invalid values.

The current implementation modifies the tags first and, after that succeeds, modifies the disk. So suppose the CSI driver restarts after the tags have been modified but before the disk: if the new CSI process then finds an invalid argument when modifying the disk, how can it roll back the changes to the tags? It does not know the tag values before the modification.

Maybe the AWS folks can swap the order: modify the disk first, then the tags. In that case, if the tags contain an invalid argument, the CSI driver cannot even roll back the changes, because it would need to wait 6 hours before it can modify the disk again. Maybe the driver can validate the tags locally before doing anything, but we cannot be 100% sure the local validation logic is in sync with the server.

Obviously this requires a round-trip to the actual server, but I can't think of a reason that the server couldn't validate all of the parameters as a unit.

Not one round-trip, but multiple: 3 for AWS, 4 for Alibaba Cloud currently. The server only receives some of the parameters in any given round-trip.

Isn't this how transactional changes work?

That is the ideal. But do you expect a CSI driver to implement an undo-log system for this, as databases usually do? It is way too complex, and the CSI driver usually does not have access to persistent storage itself. It is generally very hard to make a workflow in a complex distributed system transactional. The best we can promise is idempotency, so that the workflow can be retried.

@bswartz (Contributor, Author) commented Jul 25, 2025

I'm not saying that error handling needs to be perfect. It's clearly difficult to reliably implement a multi-stage change. What I'm saying is that the SP can only use the INVALID_ARGUMENT return code when it's safe for the CO to abandon the operation and no more retries are needed to get the volume into a good state.

SPs should endeavor to do as much error checking up front as possible. Once the error checking has passed, and the parameters are deemed valid, then the SP should start mutating things. At that point, any failures should return some other error code (probably INTERNAL) if things go wrong. It's too late to inform the CO that a parameter is invalid once the SP starts making changes. In this case the CO will continue retrying so the SP can recover from various faults and continue to make forward progress.

There is NOT an expectation that the SP can roll back from a failure. The general expectation is to roll forward using idempotent retries until success can be achieved. If the SP does have a way to roll backward and undo a change, that is also acceptable, but in practice very few implementations do that.

If the SP cannot do sufficient error checking before starting to make changes, that should be viewed as a flaw in the underlying system, and the underlying system should offer a verification mechanism for proposed parameters. I would recommend against supporting any parameter in a VAC for which up-front verification can't be done.

spec.md Outdated
@@ -1711,8 +1711,7 @@ message ControllerModifyVolumeResponse {

| Condition | gRPC Code | Description | Recovery Behavior |
|-----------|-----------|-------------|-------------------|
-| Parameters not supported | 3 INVALID_ARGUMENT | Indicates that the CO has specified mutable parameters not supported by the volume. | Caller MAY verify mutable parameters. |
-| Exceeds capabilities | 3 INVALID_ARGUMENT | Indicates that the CO has specified capabilities not supported by the volume. | Caller MAY verify volume capabilities by calling ValidateVolumeCapabilities and retry with matching capabilities. |
+| Parameters not supported | 3 INVALID_ARGUMENT | Indicates that the CO has specified invalid mutable parameter keys or values, or the specified volume cannot support the specified parameters. | Caller MAY verify volume capabilities by calling ValidateVolumeCapabilities and retry with matching capabilities. Indicates that no changes were made by the SP. |
@huww98 (Contributor) commented Jul 25, 2025

Yes, I agree that if we accept this PR, we could replace the half-way INVALID_ARGUMENT with INTERNAL. It is viable. But do note that INVALID_ARGUMENT also has the special meaning that external-resizer will retry at a lower frequency. We will lose that if we use INTERNAL.

Suggested change
-| Parameters not supported | 3 INVALID_ARGUMENT | Indicates that the CO has specified invalid mutable parameter keys or values, or the specified volume cannot support the specified parameters. | Caller MAY verify volume capabilities by calling ValidateVolumeCapabilities and retry with matching capabilities. Indicates that no changes were made by the SP. |
+| Parameters not supported | 3 INVALID_ARGUMENT | Indicates that the CO has specified invalid mutable parameter keys or values, or the specified volume cannot support the specified parameters. | Caller MAY verify volume capabilities by calling ValidateVolumeCapabilities and retry with matching capabilities. Indicates that no changes were made by the SP in this call and any previous calls with the same arguments. |

I think we should add this to make it safe for CO to abandon the operation after many retries.

Even if we add the words suggested above, it is still not perfectly safe to abandon the operation. For example, as I mentioned in the Slack thread:

  1. Modify from A to B: partially applied, INTERNAL is returned by SP
  2. Modify again to C: INVALID_ARGUMENT is returned by SP

Now, although the SP fully complies with the CSI spec as suggested here, we cannot safely abandon the operation, because the volume is in a half-A, half-B state.

@bswartz (Contributor, Author)

The "slow retry" behavior is not required by CSI, it's just what Kubernetes decides to do when the error is infeasible, because Kubernetes is a declarative system.

If you return INTERNAL, you will get a normal (faster) retry, which is presumably desirable if the operation was merely interrupted and can proceed with another attempt.

For situations where the SP is truly stuck and there was no way to detect that it would get stuck in advance, the only remedy is feedback to the user/operator and someone making a change to the request, or going to the underlying system and fixing some underlying problem that's invisible to the CO/SP.

In the case of VAC, the possible fixes are:

  • The user changes the PVC's VAC back to the old one, or some other one and expects it to reconcile successfully
  • An admin creates a new VAC with correct parameters (fixing whatever the problem was with the given VAC) and sets the PVC to that one, and expects it to reconcile successfully
  • An operator goes straight to the underlying storage system and fixes whatever is preventing the original mutation

@sunnylovestiramisu (Contributor) commented Jul 25, 2025

Should the CSI spec also add points suggesting to SPs what the recommended paths are to recover from these errors, like what Ben listed?

@bswartz (Contributor, Author)

In the CSI spec we try to be very hands-off about how SPs achieve their goals. All we care about is the contract between them and the CO. This is frustrating to SP authors, but it's genuinely challenging to write a spec that is as flexible as CSI while the Kubernetes-side implementation of the spec stays reliable.

Consider that this feature was originally envisioned as a way to set QoS policies on volumes and change them at a later time. I think it's pretty obvious how one would validate QoS policies and then mutate them idempotently. Other features that follow the same pattern are a good fit for this feature.

If there are use cases where the simple QoS policy pattern doesn't fit, we need to discuss those concrete use cases; otherwise, we assume that SPs can do whatever they want and own all the problems. The specific issue here is that the CO has a problem, and the only way to resolve it is to be more prescriptive in the spec, which would be a breaking change if this feature were not alpha, so we're trying to do it now.

@sunnylovestiramisu (Contributor) commented Jul 25, 2025

Also how about we use some stronger wording:

Indicates that one or more arguments in the request is invalid. The SP MUST NOT have applied any modification to the volume as part of this specific call. A caller may verify volume capabilities by calling ValidateVolumeCapabilities and then retry the request with valid arguments.

And about the case @huww98 mentioned:

  1. Modify from A to B: partially applied, INTERNAL is returned by SP
  2. Modify again to C: INVALID_ARGUMENT is returned by SP

The initial non-atomic failure has put the volume into an undefined and potentially unrecoverable state. When the modification from A to B failed and was only partially applied, the SP violated the spec's requirement for atomicity. The volume is now in a state (A') that was never intended to exist. This A' state is effectively "corrupt" from the perspective of the state machine. It doesn't match the last known successful configuration (A), nor does it match the target configuration (B).

For these cases I think the inescapable conclusion is "manual intervention is required". Do we want to write that into the spec, though? Because specs should focus on defining correct behavior and prevention, not cure.

@bswartz (Contributor, Author)

Yes, I agree that if we accept this PR, we could replace the half-way INVALID_ARGUMENT with INTERNAL. It is viable. But do note that INVALID_ARGUMENT also has the special meaning that external-resizer will retry at a lower frequency. We will lose that if we use INTERNAL.

I think we should add this to make it safe for CO to abandon the operation after many retries.

Even if we add the words suggested above, it is still not perfectly safe to abandon the operation. For example, as I mentioned in the Slack thread:

1. Modify from A to B: partially applied, INTERNAL is returned by SP

2. Modify again to C: INVALID_ARGUMENT is returned by SP

Now, although the SP fully complies with the CSI spec as suggested here, we cannot safely abandon the operation, because the volume is in a half-A, half-B state.

If for some reason A -> B passes validation but still fails for unexpected reasons (i.e. an INTERNAL error), my understanding is that the CO will continue retrying the modification to B, but if the user changes the VAC to C, then the CO will accept that if C contains all the mutable parameter keys that B did. If C doesn't contain all the keys that B did, it will keep trying to modify to B (this is for safety within the quota system).

If the SP rejects the modification to C with INVALID_ARGUMENT, then I suspect it gives up and goes back to trying to modify to B, because the CO interprets that error as if the VAC itself is invalid.

The point is that INVALID_ARGUMENT is a signal that the VAC itself is not usable, at least for that volume, while INTERNAL is a signal that the VAC was fine but the underlying system has some problem, in which case the normal Kubernetes approach is to keep trying.
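As a rough illustration of that split (a sketch only, not how external-resizer actually implements it; the helper names and retry intervals are assumptions), a CO-side classifier might look like this:

```go
package classify

import (
	"time"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// isInfeasible treats INVALID_ARGUMENT (and, as a further assumption, a couple
// of related codes) as "the request itself is bad; fast retries won't help".
// Everything else is assumed to be a transient backend fault worth retrying.
func isInfeasible(err error) bool {
	s, ok := status.FromError(err)
	if !ok {
		return false
	}
	switch s.Code() {
	case codes.InvalidArgument, codes.OutOfRange, codes.NotFound:
		return true
	default:
		return false
	}
}

// nextRetry picks a retry interval from that classification; the concrete
// durations are illustrative, not what any CO actually uses.
func nextRetry(err error) time.Duration {
	if isInfeasible(err) {
		return 10 * time.Minute // slow retry: waiting for a human to change the request
	}
	return 30 * time.Second // normal retry: the backend may recover on its own
}
```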

@huww98 (Contributor)

We are saying, if there are two parts, you have to validate both parts before applying part 1. We're discouraging SPs doing a sequence of validate, apply, validate, apply, etc.

I understand this. I'm trying to say that the CSI spec needs to be precise; we need to formalize this in the spec. Just discouraging it is not enough, IMO.

the CO will accept that if C contains all the mutable parameter keys that B did

This is a solution, but not a good one IMO. The CSI spec says "COs will treat these (mutable_parameters) as opaque", which I think conflicts with this rule. In practice, consider this case:

  1. User modifies to VAC A with {"type": "cloud_essd", "performanceLevel": "PL3"}, and it fails with an INTERNAL error (out of stock, for example).
  2. User then tries to recover by modifying again to VAC B with {"type": "cloud_auto"}, where cloud_auto does not have a performanceLevel parameter.

With your plan, the CO should not accept B. But this is valid and safe from the SP's PoV.

If the SP rejects the modification to C with INVALID_ARGUMENT, then I suspect it gives up and goes back to trying to modify to B, because the CO interprets that error as if the VAC itself is invalid.

This can be viable, but I suspect it can be hard for the CO to implement. Consider this:

  1. User modifies to VAC A; it fails with INTERNAL because the credential used to communicate with the storage backend is not correctly configured.
  2. User then modifies to VAC B, C, D, then E, in sequence, all failing with INTERNAL.
  3. Cluster admin corrects the credential.
  4. CO retries E; it fails with INVALID_ARGUMENT.
  5. CO gives up on E and retries D; it also fails with INVALID_ARGUMENT.
  6. CO retries C, B, and A in sequence; all fail with INVALID_ARGUMENT.
  7. Now the CO is safe to abandon this operation.

As you can see, in order to implement this, the CO will need to persist a potentially unbounded stack of VACs it has tried. We don't have a place to store this in the Kubernetes PVC.status now.

@bswartz (Contributor, Author) commented Jul 28, 2025

I understand this. I'm trying to say that the CSI spec needs to be precise; we need to formalize this in the spec. Just discouraging it is not enough, IMO.

We are being as precise here as the CSI spec is able to be. We have limited control over what SPs and COs really do. The power of the spec is to specify the contract, and I feel that saying the SP must not have modified the volume if it returns INVALID_ARGUMENT is as good as we can do.

This is a solution, but not a good one IMO. The CSI spec says "COs will treat these (mutable_parameters) as opaque", which I think conflicts with this rule. In practice, consider this case:

I agree with you that this CO behavior violates the spirit of opaqueness in the spec, but I'm expressing what the CO already does, and how it's trying to cope with a specific error recovery problem.

This can be viable, but I suspect it can be hard for the CO to implement. Consider this:

1. User modifies to VAC A; it fails with INTERNAL because the credential used to communicate with the storage backend is not correctly configured.

2. User then modifies to VAC B, C, D, then E, in sequence, all failing with INTERNAL.

3. Cluster admin corrects the credential.

4. CO retries E; it fails with INVALID_ARGUMENT.

5. CO gives up on E and retries D; it also fails with INVALID_ARGUMENT.

6. CO retries C, B, and A in sequence; all fail with INVALID_ARGUMENT.

7. Now the CO is safe to abandon this operation.

As you can see, in order to implement this, the CO will need to persist a potentially unbounded stack of VACs it has tried. We don't have a place to store this in the Kubernetes PVC.status now.

@gnufied should respond to this hypothetical. My intuition is that it would be bad SP behavior to respond with INTERNAL first, and then INVALID_ARGUMENT later to the same volume/mutable parameters pair, but I can imagine how it could happen without technically violating the spec, and it does seem to suggest that the CO's current recovery scheme is not foolproof. Assuming the spec wording doesn't change and we can't create better CO behavior, then my advice to SP authors would be to never do the above -- make sure to return INVALID_ARGUMENT either immediately or never.

@gnufied (Contributor) commented Jul 28, 2025

Currently the CO (k8s) doesn't look into mutable_parameters at all. I was proposing that as a theoretical option in case we wanted to guard against partial application of VACs (a bulletproof option at the cost of flexibility).

Basically at this point we concede that recovering to a mutable_parameters set that is a subset of the proposed transition is lossy, and it is possible that the volume will be left in a state where one or more of the parameters were applied. This is why we don't even call it "recover" in the CO; we are proposing this merely as a way to cancel an ongoing ControllerModifyVolume op.

The only way the CO can solve this is by preventing recovery to a subset of mutable_parameters.

  1. User modifies to VAC A; it fails with INTERNAL because the credential used to communicate with the storage backend is not correctly configured.
  2. User then modifies to VAC B, C, D, then E, in sequence, all failing with INTERNAL.
  3. Cluster admin corrects the credential.
  4. CO retries E; it fails with INVALID_ARGUMENT.
  5. CO gives up on E and retries D; it also fails with INVALID_ARGUMENT.

The CO will not automatically retry recovery to any of the failed states unless the user takes some action to recover. I am not sure where this is coming from.

My intuition is that it would be bad SP behavior to respond with INTERNAL first, and then INVALID_ARGUMENT later to the same volume/mutable parameters pair,

In general not great, but CO does tolerate this.

@huww98 (Contributor) commented Jul 28, 2025

Basically at this point we concede that recovering to a mutable_parameters set that is a subset of the proposed transition is lossy, and it is possible that the volume will be left in a state where one or more of the parameters were applied. The only way the CO can solve this is by preventing recovery to a subset of mutable_parameters.

We can also ask the cluster admin to take responsibility for this. We can write this in the docs: if you want your quota to work as expected, you should fully specify the volume state in each VAC. This is required even if every modify succeeds. E.g. if VAC A does not specify IOPS but VAC B specifies a very high IOPS, a user can just switch to B and then back to A to get free IOPS.

The CO will not automatically retry recovery to any of the failed states unless the user takes some action to recover. I am not sure where this is coming from.

@bswartz proposed "I suspect it gives up and goes back to trying to modify to B", and I'm trying to generalize that "goes back" to "recursively goes back" when there are even more VACs involved.

My intuition is that it would be bad SP behavior to respond with INTERNAL first, and then INVALID_ARGUMENT later to the same volume/mutable parameters pair

@bswartz for example, if the SP gets a "network connection timeout" while trying to verify the arguments, I think the best it can do is return INTERNAL.

@huww98 (Contributor) commented Jul 25, 2025

But why do we need to abandon the operation in the first place? If a modify from A to B fails and the user requests A from the CO again, the CO then requests A again from the SP; it is simple and easy. And we don't need this PR, or to ask AWS to make the change.

In "recover from expansion failure" feature, we also never abandon an expansion operation. We only request for smaller size. And only assume the volume is in good state when we got a success ControllerExpandVolume. Can we also do the same for ControllerModifyVolume?

@bswartz (Contributor, Author) commented Jul 25, 2025

But why do we need to abandon the operation in the first place? If a modify from A to B fails and the user requests A from the CO again, the CO then requests A again from the SP; it is simple and easy. And we don't need this PR, or to ask AWS to make the change.

This is a decision for the SP to make. The purpose of the CSI spec is to tell the SP and CO how to behave in response to errors. We are trying to make it clear that if the SP wants the CO to give up, it should return INVALID_ARGUMENT, because it means that there is no more work to do. If the SP wants the CO to retry, it needs to return a non-terminal error like INTERNAL so another retry will be made.

We can't have a situation where INVALID_ARGUMENT is ambiguous to the CO, because the CO is going to implement exactly one response to that error code. What we prefer is to treat that error as an infeasible error, and in order for that to be safe, we have to specify that it's not okay to return it after some changes have already been made that could not be rolled back. We will expect SPs to comply with this behavior, or else suffer from bad behavior, such as slow retries or possibly no retries at all.

In "recover from expansion failure" feature, we also never abandon an expansion operation. We only request for smaller size. And only assume the volume is in good state when we got a success ControllerExpandVolume. Can we also do the same for ControllerModifyVolume?

It's better not to think of a VAC change as a single operation. Think about the actual state and the desired state, and one or more reconciliation operations to transform from actual to desired. Each call to ControllerModifyVolume is an attempt at reconciliation. CSI RPCs get interrupted, cancelled, timed out, etc. all the time. The RPC must be idempotent and must continue to make progress forward or backward once it has started. In the case of expand, it's valid to change the desired state to a different value before the actual state is updated. I think the same should be true of VAC changes. If the current implementation of the VAC reconciler can't do that, then we need to tackle that problem separately.
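A rough sketch of that reconcile-to-desired idea, under assumed interfaces (`backend`, `CurrentAttributes`, and `SetAttribute` are illustrative names, not from the CSI spec or any driver):

```go
package sketch

import (
	"context"
	"fmt"
)

// backend is a hypothetical view of the storage system.
type backend interface {
	CurrentAttributes(ctx context.Context, volID string) (map[string]string, error)
	SetAttribute(ctx context.Context, volID, key, value string) error
}

// reconcile drives the volume's attributes toward the desired set. It is safe
// to call repeatedly: attributes that already match are skipped, so an
// interrupted or retried call simply continues from wherever it stopped.
func reconcile(ctx context.Context, b backend, volID string, desired map[string]string) error {
	current, err := b.CurrentAttributes(ctx, volID)
	if err != nil {
		return fmt.Errorf("reading current state: %w", err)
	}
	for key, want := range desired {
		if current[key] == want {
			continue // already in the desired state for this key
		}
		if err := b.SetAttribute(ctx, volID, key, want); err != nil {
			// Partial progress is fine: the next retry re-reads the current
			// state and picks up where this call left off.
			return fmt.Errorf("setting %s: %w", key, err)
		}
	}
	return nil
}
```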

@bswartz (Contributor, Author) commented Jul 25, 2025

For context: kubernetes-csi/external-resizer#514

@bswartz (Contributor, Author) commented Jul 25, 2025

Okay @gnufied explained to me the point of the target VAC and the need for rollback support, because of quotas and concerns about exploitation of the quota system in case of failures. Based on that understanding I still think this CSI spec change makes sense, and non-conforming drivers will need to update.

@huww98 (Contributor) commented Jul 26, 2025

it's valid to change the desired state to a different value before the actual state is updated.

Yes, it is totally valid. I want the CO to never abandon an operation without notifying the SP. I want the CO to treat the actual state as unknown once it starts the first ControllerModifyVolume. So if a modify from A to B fails for any reason, the actual state is neither A nor B, and the CO should call ControllerModifyVolume again to either go forward or backward. I've implemented this in kubernetes-csi/external-resizer#513.

But if we really want to proceed with this PR, we should be careful about the cases mentioned in #597 (comment) and #597 (comment).

@AndrewSirenko left a comment

Lgtm

The EBS CSI Driver returning infeasible despite modifying tags is a bug that we put on our backlog to fix.

@torredil left a comment

Thank you for clarifying this, spec change lgtm.

@sunnylovestiramisu (Contributor)
/lgtm

@huww98 (Contributor) commented Jul 28, 2025

I think the spirit of this change is good. But even if it is merged, it will not resolve any issue we are facing.

If I understand correctly, this PR is intended to make it safe for the CO to abandon a modify operation without notifying the SP.

Consider the following circumstances when modifying from state A, where the user wants to return to state A:

  1. A -> B (INVALID_ARGUMENT)
  2. A -> B (TIMEOUT) -> B (INVALID_ARGUMENT)
  3. A -> B (INTERNAL) -> B (INVALID_ARGUMENT)
  4. A -> B (INTERNAL) -> C (INVALID_ARGUMENT)
  5. A -> B (INTERNAL) -> C (INTERNAL) -> D (INTERNAL) -> E (INVALID_ARGUMENT)

With this change, we are only able to safely abandon the operation in case 1, which is the simplest. But if we do implement this feature in the CO, we really want to also support cases 2 and 3, because we don't want a random component restart or network error to break this feature. Cases 4 and 5 are nice-to-have, though; the CO may just not support cancelling in those cases.

So the CO really cannot take any advantage of this new promise from the SP. IMO we should either make a stronger requirement of the SP, or not make this change and leave more freedom to the SP.


While I've asked this before, it is still unanswered: why would the CO want to abandon the operation in the first place? Previously, we called ControllerUnpublishVolume to cancel a failed ControllerPublishVolume, and we call ControllerExpandVolume again with a different size to cancel a failed ControllerExpandVolume. All of that works well. Why can't we just do the same here and call ControllerModifyVolume again with different parameters to cancel a failed ControllerModifyVolume? Do you want to avoid an extra gRPC call and save 100ms?

In the case of expand, it's valid to change the desired state to a different value before the actual state is updated. I think the same should be true of VAC changes.

@bswartz this is your previous reply. I'm not sure, but are you supporting my opinion that we never need to abandon the operation? If the CO failed to reconcile to the desired state B and the user sets the desired state back to A, can we just reconcile to state A again, instead of assuming the volume was not modified from state A and abandoning the operation?

@gnufied (Contributor) commented Jul 28, 2025

While I've asked this before, it is still unanswered: why would the CO want to abandon the operation in the first place? Previously, we called ControllerUnpublishVolume to cancel a failed ControllerPublishVolume, and we call ControllerExpandVolume again with a different size to cancel a failed ControllerExpandVolume. All of that works well. Why can't we just do the same here and call ControllerModifyVolume again with different parameters to cancel a failed ControllerModifyVolume? Do you want to avoid an extra gRPC call and save 100ms?

I think I have answered this on Slack, but I will try to answer again. If we were to design this feature with the requirement that you can only recover to another VAC, and if we want this to work like "true" recovery where no partial state is left on the volume from which it is being "recovered", then we would have to design this whole feature from the ground up.

  1. First thing: you can't recover to a VAC which has only a subset of the parameters to which you "tried" to transition. We can't leave this as a documentation thing; the CO will have to actually inspect mutable_parameters (CSI spec and CO changes) and determine if recovery is safe.
  2. Similarly, we would have to make a CSI spec change to make mutable_parameters optional, but even then, if the user "recovers" to a nil VAC, the SP has to somehow synthesize the original parameters which were changed during the failed modification. This will be tricky to do correctly on the SP side.

Luckily volume resizing does one thing and only one thing so it doesn't have this set of challenges.

@huww98 (Contributor) commented Jul 29, 2025

First thing: you can't recover to a VAC which has only a subset of the parameters to which you "tried" to transition. We can't leave this as a documentation thing; the CO will have to actually inspect mutable_parameters (CSI spec and CO changes) and determine if recovery is safe.

I think we can just leave this to documentation. This is about VAC and quota, not the CSI spec. Currently, even without considering the recovery scenario, we already rely on the VAC fully specifying the volume state to make quota useful. E.g.:

  • VAC A: {"bandwidth": "100Mbps"}
  • VAC B: {"bandwidth": "200Mbps", "iops": "2000"}

A user can switch from A to B and then back to A to get 2000 IOPS without using their quota on VAC B. So this issue is not specific to recovery.

Similarly, we would have to make a CSI spec change to make mutable_parameters optional, but even then, if the user "recovers" to a nil VAC, the SP has to somehow synthesize the original parameters which were changed during the failed modification. This will be tricky to do correctly on the SP side.

I admit the "recover to nil" case is special, and it would be better to have a cancelling mechanism. But I've thought about this for several hours over the past few days and considered multiple alternative wordings, and cannot come up with a satisfying solution. I found this line in the CSI spec:

The only way to cancel a call is to issue a "negation" call if one exists.

Although this is under the Timeout section, it makes me think: cancelling is just not the correct way to go in CSI.

So how about just not rolling back to a nil VAC? When the user sets PVC.spec.VACName to nil, we just stop reconciling for this PVC and leave it in the InProgress or Infeasible state. Then the user will have several options:

  • Just leave it in this state. If the user does not care about quota, it will still work fine.
  • Set a (different) VAC name to try again.
  • The cluster admin can check the volume status manually and reset PVC.status.modifyStatus to nil to fully revert the change.
  • The cluster admin can set PVC.status.modifyStatus.targetVACName to empty; then external-resizer will start to reconcile to empty parameters, and reset modifyStatus to nil when successful. This may not be very useful now, but we reserve it for retrieving topology from the SP in the future.

Luckily volume resizing does one thing and only one thing so it doesn't have this set of challenges.

Yes. ControllerExpandVolume only has one size parameter, so it is always fully specified. It also returns the size after expansion, so that even if the CO requests a smaller size when the volume is actually larger, the CO will know the actual size.

@gnufied (Contributor) commented Jul 29, 2025

As per our discussion in the CSI implementation meeting, error handling for each of the cases is now documented in kubernetes/enhancements#5462.

IMO this change to the CSI spec should be merged, because while it is not perfect, it is better than any other alternative.
