-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add ValidationPolicy & ValidationEvent #488
feat: Add ValidationPolicy & ValidationEvent #488
Conversation
pkg/object/validator/policy.go
Outdated
package validator | ||
|
||
//go:generate stringer -type=ValidationPolicy | ||
type ValidationPolicy int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it seems like SkipInvalid
is the approach that best fits our overall approach to errors here. It essentially follows the "continue-on-error" direction, but also avoids applying resources that we know there are problems with. I think there will be some issues that will prevent us from continuing (name/namespace errors comes to mind. Also, circular dependencies isn't currently caught by the validation and it is not clear how we would skip resources in that situation), but I think moving towards making it possible to continue on as many problems as possible makes sense.
Could we simplify by making this the behavior until we have evidence that something else is needed?
I'm not convinced there is much need for the ApplyAll
policy here, as it seems to suggest we'll be applying resources that we know have problems and effectively ignoring any annotations that can't be parsed.
I think ExitEarly
do have some good use-cases, but I'm uncertain if this is the right way to do it. For most of the validation, we want to let users shift it left and detect these issues in their CI/CD pipelines rather than when they are ready to apply. So it seems like we want this to be available outside of a pre-step to actually applying it. Also, we already provide a dry-run flag, which should detect these issues as well as many additional validation errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it seems like SkipInvalid is the approach that best fits our overall approach to errors here.
For now, yes. Tho in the future I'd like to add change groups so we can continue on error by default, but skip whole change groups if any of their objects are invalid. With that behavior, if you don't specify any change groups, all resources are considered part of the same change group, meaning that the default behavior would be equivalent to ExitEarly. Then if you want SkipInvalid behavior you just have to define multiple independent change groups. Or you could opt into SkipInvalid for all objects by default, equivalent to giving every object its own change group.
I think there will be some issues that will prevent us from continuing
Yes. It will likely error in multiple places. I'll need to add tests for invalid input and make sure it handles them as desired.
Also, circular dependencies isn't currently caught by the validation and it is not clear how we would skip resources in that situation.
Agreed. It's not obvious what the user's desired behavior would be in the case of circular dependencies either. I think for now we can just throw an error and exit and figure out how to skip it later, if needed.
Could we simplify by making [SkipInvalid] the behavior until we have evidence that something else is needed?
@seans3 suggested using SkipInvalid as the default behavior. I could be convinced to do it that way, but I explained above why I think ExitEarly is a more forwards compatible default.
IME building continuous deployment pipelines, I have always in the past preferred exiting early and stopping deployment if there's any problem that needs human input. Validation errors are almost always user mistakes, and partial deployment can easily cause production outage. So if given the option as a user, I would choose ExitEarly behavior.
That said, I know some users have asked for continue-on-error, and especially after you've already started applying, it's not obvious if continuing or stopping would cause more brokenness. But I suspect that if we presented these same users with the change group option I suggested, they would prefer that over just continue-on-error behavior.
I think ExitEarly do have some good use-cases, but I'm uncertain if this is the right way to do it.
This is the current behavior. Changing it would violate the principle of least surprise for users. Plus, Config Sync already does it's own validation earlier and would exit early anyway. It just doesn't validate all the same things, like the dependency annotations.
For most of the validation, we want to let users shift it left and detect these issues in their CI/CD pipelines rather than when they are ready to apply. So it seems like we want this to be available outside of a pre-step to actually applying it. Also, we already provide a dry-run flag, which should detect these issues as well as many additional validation errors.
IME, dry-run is only really useful in a few cases:
- When applying manually and you want to be see a diff of the changes to make sure hydration didn't have any unexpected side effects.
- When there are multiple clients modifying the same files and you want to make sure you're not reverting someone else's changes accidentally.
- When applying with automation and you want a record of the diff to be stored for debugging later or the diff needs manual approval.
There's not really anything to run between dry-run and apply that makes it useful to do a dry-run in CI/CD otherwise, is there? And in all cases, you'd still want to validate again in the apply, in case they didn't run dry-run, right?
In any case, this validation code is the same code being used for both the apply and the dry-run, and I'm pretty sure we want the options and defaults to be the same for both, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed ApplyAll for now. It's not actually possible to apply resources without a kind or name. So the feature wouldn't make much sense.
I've left ExitEarly as the default, because that's the current behavior. We can change this later, if we want.
This PR primarily adds SkipInvalid and makes it opt-in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this approach to validation is misguided for a couple reasons:
- We should emphasize
--dry-run
as the primary validation mechanism. Instead of attempting to figure out problems ahead of time,--dry-run
runs as much of the entire apply algorithm as possible, surfacing errors throughout the algorithm. This type of validation will almost always be better than pre-run client-side validation. It looks like the validations added here fordepends-on
andapply-time mutation
are already implemented as errors in--dry-run
. I do not see much utility for these validations other than better error messages. - Exiting early will be especially problematic, since this makes significant assumptions about what the library client wants and differs dramatically from the underlying
kubectl apply
which continues applying when encountering an apply error. Exiting early will probably require manually calling the final inventory set task to ensure the prune/inventory algorithm continues to work. - If a library clients want stronger validation (or domain-specific validation), they can always implement it themselves before calling the library apply. This library has to work for all clients, and exiting early makes strong assumptions.
I believe the goal of client-side validation should be much more modest, such as generating a more informative error messages. If the history of Kubernetes CLI's is any guide, over-ambitious client-side validation will be jettisoned eventually for dry-run and server-side validation. But as long as there is a validation option other than Exit Early
, I will not object to this PR.
@seans3 The current existing behavior already exits early if the Validator errors. I don't understand why you think that's misguided. The code is littered with hundreds of places that assume that minimal validation has been done. It's not going to be a small change to fix that to skip invalid objects (which seems to be what you want). What this PR primarily does is adds the ability to skip those invalid objects. I also added more explicit depends-on & apply-time-mutation errors, but that's because these are client-side annotations and the server won't validate them. So if they're invalid and the applier continues without skipping them (the current behavior), then that's behaviorally incorrect. We should never completely ignore invalid depends-on & apply-time-mutation errors. We should either skip those invalid objects or exit early. Either way, the user needs to be notified. That's why I added the ValidationError, to make sure the user always gets notified, regardless of whether they chose SkipInvalid or ExitEarly. Now that I've said that tho, I think the ApplyAll policy is probably only useful when not using depends-on or apply-time-mutation. Otherwise the behavior when they're invalid would be obviously incorrect. ApplyAll would basically be opting out of client-side validation and ignoring errors in client-side features. That's basically what you're championing when you're saying the applier should be more like kubectl and do less validation. The big difference is that kubectl doesn't have any client-side-only features like depends-on or apply-time-mutation. |
I've made significant changes to the handling of dependency validation. Please re-review.
I still have more tests to add. |
63c2cfc
to
da3f7de
Compare
e80d073
to
11a4451
Compare
0536309
to
b713a41
Compare
6bd43f3
to
bd28757
Compare
/hold |
bd28757
to
e053421
Compare
/unhold All TODOs finished. Ready for another review pass. |
e053421
to
430b203
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good. I see that we have had some discussions on this PR about when validation should happen, but the primary feature here is that we can continue applying resources in many more situations, which I think is a direction we all agree on. Allowing early exit seems reasonable to support, not least because that is the current behavior.
/approve
I'll remove the approval just to make sure @seans3 also has a chance to look at it before it merges. |
e4857ed
to
fae20ab
Compare
- Add ValidationPolicy: ExitEarly (default) - exit before apply/prune if any objects are invalid SkipInvalid - apply/prune valid objects & skip invalid ones - Add ValidationEvent to be sent once for every invalid object or set of objects, when the SkipInvalid policy is selected. For ExitEarly, return an error for reverse compatibility. - Add validation.Collector to simplify aggregating validation errors from multiple sources and extracting invalid object IDs. - Add invalid objects to the TestContext so they can be retained in the inventory (only if already present). This primarily applies to invalid annotations and dependencies. Objects without name or kind should never be added to the inventory. - Update Solver to use validation.Collector and filter invalid objects. - Add e2e test for invalid objects. - Update Printers to handle ValidationEvent - Add ExternalDependencyError & InvalidAnnotationError to make it easier to handle and introspect validation errors.
fae20ab
to
9acdbce
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not believe exiting early should be the default behavior. Even if this behavior exists now, it is incorrect and should be fixed to continue on error. Exiting early is also not consistent with the kubectl apply
behavior which continues on error.
It's likely that ConfigSync and kpt will both initially use ExitEarly when adopting this new code. They may later switch to either hard coding SkipInvalid OR piping the option through to the user. Does that affect your default suggestion @seans3 ? |
FWIW, I do not consider ExitEarly to a be a bug. Its a feature which arises from the desire to group related changes. Of course, Kubernetes doesn't have atomic applies or groups, but you can at least avoid applying a Deployment if its Secret, ServiceAccount, or ConfigMap is invalid. This behavior is highly desired when you take the time to granularly group your objects. On the other hand, SkipInvalid is attractive when you haven't or don't intend to group your objects. This comes up regularly when multiple teams share a single config repo or delegate config management to a central SRE/DevOps/Ops team. I'm ok with either being default. I don't think it really matters. There are pros and cons on both sides.
I'm biasing slightly towards reverse compatibility here, but if anyone else wants to chime in with a 3rd vote I don't mind making the change. |
I would probably prefer that we keep the current behavior here (so ExitEarly is the default), but I don't feel strongly about it. |
For Config Sync users, the default should be ExitEarly, which is closer to the existing validation behavior in Config Sync. Supporting the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: karlkfi, seans3 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
ExitEarly (default) - exit before apply/prune if any objects are invalid
SkipInvalid - apply/prune valid objects & skip invalid ones
objects, when the SkipInvalid policy is selected. For ExitEarly,
return an error for reverse compatibility.
from multiple sources and extracting invalid object IDs.
the inventory (only if already present). This primarily applies to
invalid annotations and dependencies. Objects without name or kind
should never be added to the inventory.
to handle and introspect validation errors.
fixes: #484