Add pod diagnostics before scaling down to zero in scaler #15326
Changes from all commits
@@ -114,10 +114,16 @@ func (c *Reconciler) ReconcileKind(ctx context.Context, pa *autoscalingv1alpha1.
	if err := c.ReconcileMetric(ctx, pa, resolveScrapeTarget(ctx, pa)); err != nil {
		return fmt.Errorf("error reconciling Metric: %w", err)
	}
+	podCounter := resourceutil.NewPodAccessor(c.podsLister, pa.Namespace, pa.Labels[serving.RevisionLabelKey])
+
+	pod, err := podCounter.GetAnyPod()
Review comment: This should be getting a pod that isn't ready. For example, you could have min scale = 10 and only the last pod can't be scheduled (due to resource constraints).

Author reply: We are not targeting all pods; we are targeting the scenario with the image issue. If someone wants to cover all cases, they can extend this work later. Maybe I should change the PR title: here we are adding pod diagnostics only for the image issue, or similar issues where all pods are stuck and deployment reconciliation cannot catch it due to known K8s limitations (the progress deadline cannot catch all cases).
+	if err != nil {
+		return fmt.Errorf("error getting a pod for the revision: %w", err)
+	}
Comment on lines +119 to +122:

Review comment: Fetching a pod here seems premature.

Author reply: Could you elaborate? We already do that in this function via the pod accessor, to get the pod state a few lines below. We only check the pod status when handling the scale-to-zero case, if we have to.

	// Get the appropriate current scale from the metric, and right size
	// the scaleTargetRef based on it.
-	want, err := c.scaler.scale(ctx, pa, sks, decider.Status.DesiredScale)
+	want, err := c.scaler.scale(ctx, pa, sks, decider.Status.DesiredScale, c.Client, pod)
	if err != nil {
		return fmt.Errorf("error scaling target: %w", err)
	}
@@ -145,7 +151,6 @@ func (c *Reconciler) ReconcileKind(ctx context.Context, pa *autoscalingv1alpha1.
	}

	// Compare the desired and observed resources to determine our situation.
-	podCounter := resourceutil.NewPodAccessor(c.podsLister, pa.Namespace, pa.Labels[serving.RevisionLabelKey])
	ready, notReady, pending, terminating, err := podCounter.PodCountsByState()
	if err != nil {
		return fmt.Errorf("error getting pod counts: %w", err)
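The suggestion in the review thread above, selecting a pod that is not ready rather than an arbitrary one, could look roughly like the following sketch. It is illustrative only and not part of this PR: the helper name getAnyNotReadyPod is hypothetical, and it reads from the pods lister directly rather than through the PodAccessor used in the diff.

```go
// Hypothetical helper sketching the "pick a pod that isn't ready" idea from the
// review thread; not part of the PR.
package diagnostics

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/labels"
	corev1listers "k8s.io/client-go/listers/core/v1"
)

// getAnyNotReadyPod returns a non-terminating pod matching the selector that is
// not Ready, or nil if every such pod is Ready.
func getAnyNotReadyPod(podsLister corev1listers.PodLister, namespace string, selector labels.Selector) (*corev1.Pod, error) {
	pods, err := podsLister.Pods(namespace).List(selector)
	if err != nil {
		return nil, err
	}
	for _, p := range pods {
		if p.DeletionTimestamp != nil {
			continue // skip pods that are already terminating
		}
		if !isPodReady(p) {
			return p, nil
		}
	}
	return nil, nil
}

// isPodReady reports whether the pod's Ready condition is True.
func isPodReady(p *corev1.Pod) bool {
	for _, cond := range p.Status.Conditions {
		if cond.Type == corev1.PodReady {
			return cond.Status == corev1.ConditionTrue
		}
	}
	return false
}
```

Such a helper would bias the diagnostics toward a pod that is actually stuck (unschedulable, ImagePullBackOff, and so on) instead of whichever pod the accessor happens to return first.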
@@ -35,16 +35,21 @@ import (
	"knative.dev/serving/pkg/activator"
	autoscalingv1alpha1 "knative.dev/serving/pkg/apis/autoscaling/v1alpha1"
	"knative.dev/serving/pkg/autoscaler/config/autoscalerconfig"
	clientset "knative.dev/serving/pkg/client/clientset/versioned"
	"knative.dev/serving/pkg/reconciler/autoscaling/config"
	kparesources "knative.dev/serving/pkg/reconciler/autoscaling/kpa/resources"
	aresources "knative.dev/serving/pkg/reconciler/autoscaling/resources"
	revresurces "knative.dev/serving/pkg/reconciler/revision/resources"
	"knative.dev/serving/pkg/resources"

	"go.uber.org/zap"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/util/retry"
)

const (
@@ -326,7 +331,7 @@ func (ks *scaler) applyScale(ctx context.Context, pa *autoscalingv1alpha1.PodAut
}

// scale attempts to scale the given PA's target reference to the desired scale.
-func (ks *scaler) scale(ctx context.Context, pa *autoscalingv1alpha1.PodAutoscaler, sks *netv1alpha1.ServerlessService, desiredScale int32) (int32, error) {
+func (ks *scaler) scale(ctx context.Context, pa *autoscalingv1alpha1.PodAutoscaler, sks *netv1alpha1.ServerlessService, desiredScale int32, client clientset.Interface, podForErrorChecking *corev1.Pod) (int32, error) {
	asConfig := config.FromContext(ctx).Autoscaler
	logger := logging.FromContext(ctx)
@@ -371,6 +376,38 @@ func (ks *scaler) scale(ctx context.Context, pa *autoscalingv1alpha1.PodAutoscal
		return desiredScale, nil
	}

+	// Before we apply scale to zero, and since we have failed to activate, check whether any pod is in a waiting state.
+	// This should capture cases where we scale from zero and the regular progressDeadline expiration will not be hit,
+	// due to K8s limitations, because we are not in a deployment rollout.
+	if desiredScale == 0 && pa.Status.IsActivating() && podForErrorChecking != nil {
+		if err = checkAvailabilityBeforeScalingDown(ctx, logger, podForErrorChecking, pa, client); err != nil {
+			return desiredScale, fmt.Errorf("failed to check availability for target %v: %w", pa.Spec.ScaleTargetRef, err)
+		}
+	}
+
	logger.Infof("Scaling from %d to %d", currentScale, desiredScale)
	return desiredScale, ks.applyScale(ctx, pa, desiredScale, ps)
}

+func checkAvailabilityBeforeScalingDown(ctx context.Context, logger *zap.SugaredLogger, pod *corev1.Pod, pa *autoscalingv1alpha1.PodAutoscaler, client clientset.Interface) error {
+	for _, status := range pod.Status.ContainerStatuses {
+		if status.Name != revresurces.QueueContainerName {
+			if w := status.State.Waiting; w != nil {
+				logger.Debugf("marking revision inactive with: %s: %s", w.Reason, w.Message)
+				pa.Status.MarkScaleTargetNotInitialized(w.Reason, w.Message)
+				return retry.RetryOnConflict(retry.DefaultBackoff, func() error {
+					rev, err := client.ServingV1().Revisions(pa.Namespace).Get(ctx, pa.Name, metav1.GetOptions{})
+					if err != nil {
+						return err
+					}
+					rev.Status.MarkResourcesAvailableFalse(w.Reason, w.Message)
+					if _, err = client.ServingV1().Revisions(pa.Namespace).UpdateStatus(ctx, rev, metav1.UpdateOptions{}); err != nil {
+						return err
+					}
+					return nil
+				})
+			}
+		}
Comment on lines +398 to +410:

Review comment: This is sort of violating our abstractions. If we wanted to propagate this error message to the revision, we would have to do it through a PodAutoscaler condition.

Author reply: Normally yes, but given the distributed status logic (unclear and undocumented) and how things are implemented, this is safer IMHO, as it makes the decision locally and avoids influencing anything else down the code path. 🤷 I can try to change it, but that would require propagating this decision down to the PA status update (not ideal, as that code is many lines below); I had it that way previously. Let's see.
+	}
+	return nil
+}
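As context for the waiting-state check above, the following self-contained snippet (not part of the PR) shows the container-status shape produced by the image-pull scenario discussed in the review, together with a distilled version of the detection loop. The container names and the helper firstWaitingUserContainer are illustrative assumptions; the full PR version also updates the Revision status, which is omitted here.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// firstWaitingUserContainer mirrors the loop in checkAvailabilityBeforeScalingDown:
// it returns the waiting state of the first container other than queue-proxy that
// is stuck waiting, or nil if none is.
func firstWaitingUserContainer(pod *corev1.Pod, queueContainerName string) *corev1.ContainerStateWaiting {
	for _, status := range pod.Status.ContainerStatuses {
		if status.Name == queueContainerName {
			continue
		}
		if w := status.State.Waiting; w != nil {
			return w
		}
	}
	return nil
}

func main() {
	// A pod whose user container cannot pull its image; queue-proxy itself is fine.
	pod := &corev1.Pod{
		Status: corev1.PodStatus{
			ContainerStatuses: []corev1.ContainerStatus{
				{Name: "queue-proxy"},
				{
					Name: "user-container",
					State: corev1.ContainerState{
						Waiting: &corev1.ContainerStateWaiting{
							Reason:  "ImagePullBackOff",
							Message: "Back-off pulling image \"example.com/app:missing\"",
						},
					},
				},
			},
		},
	}

	if w := firstWaitingUserContainer(pod, "queue-proxy"); w != nil {
		// The PR would surface this reason/message on the PA (ScaleTargetInitialized=False)
		// and on the Revision (ResourcesAvailable=False) before scaling down to zero.
		fmt.Printf("stuck container: %s: %s\n", w.Reason, w.Message)
	}
}
```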
Review comment: We should double-check usages of this condition, because before it would always go Unknown => (True|False) and then remain unchanged. I can't recall whether there is code that assumes it never changes.

Review comment: Maybe we want to introduce a new condition, to surface subsequent scaling issues.
Author reply: Are tests not covering revision transitions? I checked the PA status propagation for the revision reconciliation, and we have specific cases where this matters, but they don't seem affected. I can take another look if there is a scenario where this might be a problem. In general we should be able to set this to False (for whatever reason), since it is a legitimate value, and any reconciliation should then take that condition into consideration and adjust. Here we go from True to False.
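To make the transition under discussion concrete, the sketch below (not part of the PR) shows the before and after shape of the condition, assuming the condition in question is the PA's ScaleTargetInitialized condition set by MarkScaleTargetNotInitialized; the condition type string and the reason are illustrative assumptions.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"knative.dev/pkg/apis"
)

func main() {
	// Previously the condition settled once: Unknown => True (or False) and then stayed put.
	before := apis.Condition{
		Type:   "ScaleTargetInitialized", // assumed condition type name
		Status: corev1.ConditionTrue,
	}

	// With this PR, a revision stuck at scale-from-zero (e.g. ImagePullBackOff)
	// can flip the already-True condition back to False.
	after := apis.Condition{
		Type:    "ScaleTargetInitialized",
		Status:  corev1.ConditionFalse,
		Reason:  "ImagePullBackOff",
		Message: "Back-off pulling image",
	}

	fmt.Printf("before: %s=%s\nafter: %s=%s (%s)\n",
		before.Type, before.Status, after.Type, after.Status, after.Reason)
}
```

Any consumer that treats the condition as write-once would need to be checked against this True-to-False transition, which is the concern raised in the thread above.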