WIP: ✨ Add MachineDrainRule "WaitCompleted" #11545
base: main
[APPROVALNOTIFIER] This PR is NOT APPROVED.
Branch updated from 631a543 to bab1c1b.
Signed-off-by: Vince Prignano <[email protected]>
Branch updated from bab1c1b to 211839e.
-// MachineDrainRuleDrainBehavior defines the drain behavior. Can be either "Drain" or "Skip".
-// +kubebuilder:validation:Enum=Drain;Skip
+// MachineDrainRuleDrainBehavior defines the drain behavior. Can be either "Drain", "Skip", or "WaitCompleted".
+// +kubebuilder:validation:Enum=Drain;Skip;WaitCompleted
Please check if the proposal should be amended accordingly
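For illustration, the extended enum could be expressed as Go constants roughly as follows. This is a minimal sketch: the `MachineDrainRuleDrainBehaviorDrain` name is an assumption (only the `Skip` and `WaitCompleted` constants appear in this diff), and `isValidDrainBehavior` is a hypothetical helper that mimics the check the `+kubebuilder:validation:Enum` marker generates in the CRD schema.

```go
package main

import "fmt"

// MachineDrainRuleDrainBehavior defines the drain behavior. Can be either "Drain", "Skip", or "WaitCompleted".
type MachineDrainRuleDrainBehavior string

const (
	// Assumed name for the pre-existing "Drain" value (not shown in this diff).
	MachineDrainRuleDrainBehaviorDrain MachineDrainRuleDrainBehavior = "Drain"
	MachineDrainRuleDrainBehaviorSkip  MachineDrainRuleDrainBehavior = "Skip"
	// Value added by this PR.
	MachineDrainRuleDrainBehaviorWaitCompleted MachineDrainRuleDrainBehavior = "WaitCompleted"
)

// isValidDrainBehavior is a hypothetical helper mirroring
// +kubebuilder:validation:Enum=Drain;Skip;WaitCompleted, i.e. an exact, case-sensitive match.
func isValidDrainBehavior(b MachineDrainRuleDrainBehavior) bool {
	switch b {
	case MachineDrainRuleDrainBehaviorDrain,
		MachineDrainRuleDrainBehaviorSkip,
		MachineDrainRuleDrainBehaviorWaitCompleted:
		return true
	default:
		return false
	}
}

func main() {
	for _, b := range []MachineDrainRuleDrainBehavior{"Drain", "Skip", "WaitCompleted", "waitcompleted"} {
		fmt.Printf("%s valid=%t\n", b, isValidDrainBehavior(b))
	}
}
```

Schema enum validation is an exact string match, so a MachineDrainRule has to use `WaitCompleted` verbatim, whereas the Pod label value is compared case-insensitively via `strings.EqualFold` in `drainLabelFilter`.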
@@ -22,13 +22,13 @@ import (


 const (
 // PodDrainLabel is the label that can be set on Pods in workload clusters to ensure a Pod is not drained.
-// The only valid value is "skip".
+// The only valid value is "skip" or "waitcompleted".
Suggested change:
-// The only valid value is "skip" or "waitcompleted".
+// The only valid values are "skip" and "waitcompleted".
@@ -498,6 +507,10 @@ func (r EvictionResult) ConditionMessage(nodeDrainStartTime *metav1.Time) string
 conditionMessage = fmt.Sprintf("%s\nAfter above Pods have been removed from the Node, the following Pods will be evicted: %s",
 conditionMessage, PodListToString(r.PodsToTriggerEvictionLater, 3))
 }
+if len(r.PodsToWaitCompleted) > 0 {
+conditionMessage = fmt.Sprintf("%s\nWaiting for the following Pods to complete without drain: %s",
Suggested change:
-conditionMessage = fmt.Sprintf("%s\nWaiting for the following Pods to complete without drain: %s",
+conditionMessage = fmt.Sprintf("%s\nWaiting for the following Pods to complete without eviction: %s",
Maybe more precise.
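A simplified, self-contained sketch of the new condition-message section with the suggested "without eviction" wording. It operates on plain Pod names instead of `EvictionResult`, and `podListToString` is a hypothetical stand-in that only mimics the cap of 3 entries passed to `PodListToString`; the real function's output format may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// podListToString is a hypothetical stand-in for PodListToString: it lists at most
// limit names and summarizes how many were left out.
func podListToString(names []string, limit int) string {
	if len(names) <= limit {
		return strings.Join(names, ", ")
	}
	return fmt.Sprintf("%s, ... (%d more)", strings.Join(names[:limit], ", "), len(names)-limit)
}

// appendWaitCompleted appends the WaitCompleted section only when such Pods exist,
// leaving the message untouched otherwise.
func appendWaitCompleted(conditionMessage string, podsToWaitCompleted []string) string {
	if len(podsToWaitCompleted) > 0 {
		conditionMessage = fmt.Sprintf("%s\nWaiting for the following Pods to complete without eviction: %s",
			conditionMessage, podListToString(podsToWaitCompleted, 3))
	}
	return conditionMessage
}

func main() {
	// "<existing message>" is a placeholder for whatever ConditionMessage built so far.
	pods := []string{"default/job-a", "default/job-b", "default/job-c", "default/job-d"}
	fmt.Println(appendWaitCompleted("<existing message>", pods))
}
```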
@@ -275,6 +286,11 @@ func (d *Helper) drainLabelFilter(ctx context.Context, pod *corev1.Pod) PodDelet
 log.V(4).Info(fmt.Sprintf("Skip evicting Pod, because Pod has %s label", clusterv1.PodDrainLabel))
 return MakePodDeleteStatusSkip()
 }
+if labelValue, found := pod.ObjectMeta.Labels[clusterv1.PodDrainLabel]; found && strings.EqualFold(labelValue, string(clusterv1.MachineDrainRuleDrainBehaviorWaitCompleted)) {
+log := ctrl.LoggerFrom(ctx, "Pod", klog.KObj(pod))
+log.V(4).Info(fmt.Sprintf("Skip evicting Pod, because Pod has %s label", clusterv1.PodDrainLabel))
Should we maybe start mentioning the label value (here and in l.286)?
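A minimal sketch of a log message that also names the label value, as suggested. `skipEvictionMessage` is a hypothetical helper, the label key `cluster.x-k8s.io/drain` is an assumption for the value of `clusterv1.PodDrainLabel`, and the `log.V(4).Info` call is left out to keep the example standalone.

```go
package main

import "fmt"

// skipEvictionMessage is a hypothetical helper that builds the log message for a Pod
// excluded from eviction by its drain label, including the (quoted) label value.
func skipEvictionMessage(labelKey, labelValue string) string {
	return fmt.Sprintf("Skip evicting Pod, because Pod has %s label with value %q", labelKey, labelValue)
}

func main() {
	// Assumed label key; in Cluster API this would come from clusterv1.PodDrainLabel.
	fmt.Println(skipEvictionMessage("cluster.x-k8s.io/drain", "waitcompleted"))
}
```

Quoting the value with `%q` keeps the original casing visible, which may help when the value is matched case-insensitively.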
@@ -300,6 +316,8 @@ func (d *Helper) machineDrainRulesFilter(machineDrainRules []*clusterv1.MachineD
 log := ctrl.LoggerFrom(ctx, "Pod", klog.KObj(pod))
 log.V(4).Info(fmt.Sprintf("Skip evicting Pod, because MachineDrainRule %s with behavior %s applies to the Pod", mdr.Name, clusterv1.MachineDrainRuleDrainBehaviorSkip))
 return MakePodDeleteStatusSkip()
+case clusterv1.MachineDrainRuleDrainBehaviorWaitCompleted:
Let's add the same log as in l.316+317
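A standalone sketch of the `machineDrainRulesFilter` switch with a log message for `WaitCompleted` that mirrors the existing `Skip` one. The types, the `filterByRule` name, and returning the message instead of calling `log.V(4).Info` are assumptions for illustration; the status the PR actually returns for WaitCompleted Pods is not visible in this diff.

```go
package main

import "fmt"

// drainBehavior is a simplified stand-in for clusterv1.MachineDrainRuleDrainBehavior.
type drainBehavior string

const (
	behaviorDrain         drainBehavior = "Drain"
	behaviorSkip          drainBehavior = "Skip"
	behaviorWaitCompleted drainBehavior = "WaitCompleted"
)

// podDeleteStatus is a hypothetical, simplified stand-in for the drain helper's PodDeleteStatus.
type podDeleteStatus string

const (
	statusEvict         podDeleteStatus = "Evict"
	statusSkip          podDeleteStatus = "Skip"
	statusWaitCompleted podDeleteStatus = "WaitCompleted"
)

// filterByRule maps the behavior of the matching MachineDrainRule to a status and returns
// the message that would be logged at V(4), so Skip and WaitCompleted are logged the same way.
func filterByRule(ruleName string, behavior drainBehavior) (podDeleteStatus, string) {
	switch behavior {
	case behaviorSkip:
		return statusSkip, fmt.Sprintf("Skip evicting Pod, because MachineDrainRule %s with behavior %s applies to the Pod", ruleName, behavior)
	case behaviorWaitCompleted:
		return statusWaitCompleted, fmt.Sprintf("Skip evicting Pod, because MachineDrainRule %s with behavior %s applies to the Pod", ruleName, behavior)
	default:
		// Drain (or no special behavior): evict normally, nothing extra to log here.
		return statusEvict, ""
	}
}

func main() {
	// "wait-for-batch-jobs" is a hypothetical MachineDrainRule name.
	status, msg := filterByRule("wait-for-batch-jobs", behaviorWaitCompleted)
	fmt.Println(status)
	fmt.Println(msg)
}
```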
@@ -281,6 +281,7 @@ func (d *Helper) EvictPods(ctx context.Context, podDeleteList *PodDeleteList) Ev
 var podsToTriggerEvictionLater []PodDelete
 var podsWithDeletionTimestamp []PodDelete
Please check the existing test coverage for the modified funcs and extend it accordingly for the new case (we should already have test coverage for everything, so only new cases should be needed).
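A sketch of table-driven cases that could be added for the new label value. It uses a hypothetical, simplified `labelFilter` in place of `drainLabelFilter` and an assumed label key, and runs the table from `main` so it is standalone; in the repository the same table would live in a `_test.go` file using `testing.T`.

```go
package main

import (
	"fmt"
	"strings"
)

// drainLabel is an assumed value for clusterv1.PodDrainLabel.
const drainLabel = "cluster.x-k8s.io/drain"

// labelFilter is a hypothetical, simplified version of drainLabelFilter that returns
// "Skip", "WaitCompleted", or "Evict" depending on the drain label value.
func labelFilter(labels map[string]string) string {
	value, found := labels[drainLabel]
	switch {
	case found && strings.EqualFold(value, "skip"):
		return "Skip"
	case found && strings.EqualFold(value, "waitcompleted"):
		return "WaitCompleted"
	default:
		return "Evict"
	}
}

func main() {
	tests := []struct {
		name   string
		labels map[string]string
		want   string
	}{
		{name: "no label (existing case)", labels: nil, want: "Evict"},
		{name: "skip (existing case)", labels: map[string]string{drainLabel: "skip"}, want: "Skip"},
		{name: "waitcompleted (new case)", labels: map[string]string{drainLabel: "waitcompleted"}, want: "WaitCompleted"},
		{name: "WaitCompleted with different casing (new case)", labels: map[string]string{drainLabel: "WaitCompleted"}, want: "WaitCompleted"},
	}

	failures := 0
	for _, tt := range tests {
		if got := labelFilter(tt.labels); got != tt.want {
			failures++
			fmt.Printf("FAIL %s: got %s, want %s\n", tt.name, got, tt.want)
		}
	}
	fmt.Printf("%d/%d cases passed\n", len(tests)-failures, len(tests))
}
```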
@@ -498,6 +507,10 @@ func (r EvictionResult) ConditionMessage(nodeDrainStartTime *metav1.Time) string
 conditionMessage = fmt.Sprintf("%s\nAfter above Pods have been removed from the Node, the following Pods will be evicted: %s",
 conditionMessage, PodListToString(r.PodsToTriggerEvictionLater, 3))
 }
+if len(r.PodsToWaitCompleted) > 0 {
@fabriziopandini Is this something that should be specifically handled on higher levels when the condition bubbles up? (like we handled PDB's etc.)
(could also be a follow-up PR)
What this PR does / why we need it:
This PR adds the ability for drain to wait for specific Pods to complete. This is useful in scenarios where draining is handled outside the context of `kubectl drain` after a Node is cordoned, or for long-running batch Jobs that should be allowed to terminate on their own.

Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):
Fixes #
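The behavior described in the PR description can be sketched as one drain pass that classifies Pods and only reports completion once nothing is left to evict or wait for. This is a hypothetical model, not the PR's implementation: the `pod` type, `drainStep`, treating the `Succeeded`/`Failed` phases as "completed", and resolving each Pod's behavior up front (from the label or a MachineDrainRule) are all assumptions.

```go
package main

import "fmt"

// pod is a hypothetical, minimal view of a Pod on the Node being drained.
type pod struct {
	name     string
	behavior string // "Drain", "Skip", or "WaitCompleted"; assumed already resolved
	phase    string // Kubernetes Pod phase, e.g. "Running", "Succeeded", "Failed"
}

// isCompleted treats the terminal Pod phases as "completed" (an assumption of this sketch).
func isCompleted(phase string) bool {
	return phase == "Succeeded" || phase == "Failed"
}

// drainStep models one drain pass: Drain Pods are evicted, Skip Pods are ignored, and
// WaitCompleted Pods are never evicted but block completion until they finish on their own.
func drainStep(pods []pod) (toEvict, toWait []string, done bool) {
	for _, p := range pods {
		switch p.behavior {
		case "Skip":
			// Ignored by drain entirely.
		case "WaitCompleted":
			if !isCompleted(p.phase) {
				toWait = append(toWait, p.name)
			}
		default:
			// In this sketch every Drain Pod still on the Node is evicted.
			toEvict = append(toEvict, p.name)
		}
	}
	return toEvict, toWait, len(toEvict) == 0 && len(toWait) == 0
}

func main() {
	pods := []pod{
		{name: "web", behavior: "Drain", phase: "Running"},
		{name: "batch-job", behavior: "WaitCompleted", phase: "Running"},
		{name: "node-agent", behavior: "Skip", phase: "Running"},
	}
	fmt.Println(drainStep(pods))
}
```

Keeping WaitCompleted Pods in a separate list, rather than folding them into the eviction list, is what lets the condition message report them on their own line.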