Add pod diagnostics before scaling down to zero in scaler #15326

Closed · wants to merge 2 commits
6 changes: 6 additions & 0 deletions pkg/apis/autoscaling/v1alpha1/pa_lifecycle.go
@@ -215,6 +215,12 @@ func (pas *PodAutoscalerStatus) MarkScaleTargetInitialized() {
podCondSet.Manage(pas).MarkTrue(PodAutoscalerConditionScaleTargetInitialized)
}

// MarkScaleTargetNotInitialized marks the PA's PodAutoscalerConditionScaleTargetInitialized
// condition false with the given reason and message.
func (pas *PodAutoscalerStatus) MarkScaleTargetNotInitialized(reason, message string) {
podCondSet.Manage(pas).MarkFalse(PodAutoscalerConditionScaleTargetInitialized, reason, message)
}

Comment on lines +220 to +223
Member

We should double check usages of this condition. Because before it would always be Unknown=>(True|False) and then remain unchanged.

I can't recall if there's code that assumes that it never changes.

Member

Maybe we want to introduce a new condition, to surface subsequent scaling issues.

Contributor Author (@skonto, Jun 26, 2024)

> I can't recall if there's code that assumes that it never changes.

Aren't tests covering revision transitions? I checked the PA status propagation for the revision reconciliation; we have specific cases where this matters, but they don't seem affected. I can take another look if there is a scenario where this might be a problem. In general we should be able to set this to False (for whatever reason), since it is a legitimate value, and any reconciliation should then take that condition into account and adjust. Here we go from True to False.
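To make the concern concrete, here is a minimal test sketch (not part of this PR; it assumes the existing `InitializeConditions` and `IsScaleTargetInitialized` helpers in `pa_lifecycle.go`) that pins down the True-to-False transition under discussion:

```go
package v1alpha1

import "testing"

// Sketch only: verifies that ScaleTargetInitialized, once True, can be
// flipped back to False by the new helper added in this PR.
func TestScaleTargetInitializedCanFlipToFalse(t *testing.T) {
	pas := &PodAutoscalerStatus{}
	pas.InitializeConditions()

	pas.MarkScaleTargetInitialized()
	if !pas.IsScaleTargetInitialized() {
		t.Fatal("expected ScaleTargetInitialized to be True after MarkScaleTargetInitialized")
	}

	// Simulate all pods being stuck in a waiting state (e.g. ImagePullBackOff).
	pas.MarkScaleTargetNotInitialized("ImagePullBackOff", "Back-off pulling image")
	if pas.IsScaleTargetInitialized() {
		t.Fatal("expected ScaleTargetInitialized to be False after MarkScaleTargetNotInitialized")
	}
}
```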

// MarkSKSReady marks the PA condition denoting that SKS is ready.
func (pas *PodAutoscalerStatus) MarkSKSReady() {
podCondSet.Manage(pas).MarkTrue(PodAutoscalerConditionSKSReady)
9 changes: 7 additions & 2 deletions pkg/reconciler/autoscaling/kpa/kpa.go
@@ -114,10 +114,16 @@ func (c *Reconciler) ReconcileKind(ctx context.Context, pa *autoscalingv1alpha1.
if err := c.ReconcileMetric(ctx, pa, resolveScrapeTarget(ctx, pa)); err != nil {
return fmt.Errorf("error reconciling Metric: %w", err)
}
podCounter := resourceutil.NewPodAccessor(c.podsLister, pa.Namespace, pa.Labels[serving.RevisionLabelKey])

pod, err := podCounter.GetAnyPod()
Member

Should be getting a pod that isn't ready - e.g. you could have min scale = 10 and the last pod can't be scheduled (due to resource constraints).

Contributor Author (@skonto, Jun 26, 2024)

We are not targeting all pods; we are targeting the scenario with the image issue. If someone wants to cover all cases they can extend the work here later. Maybe I should change the PR title: here we are only adding pod diagnostics for the image issue, or similar issues where all pods are stuck and the deployment reconciliation cannot catch it due to known K8s limitations (the progress deadline cannot catch all cases).
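If the broader scenario were in scope, an accessor that prefers a not-Ready pod could be added next to `GetAnyPod` in `pkg/resources/pods.go`. This is only an illustrative sketch; `GetAnyNotReadyPod` is a hypothetical name and is not part of this PR (it assumes the same `podsLister`/`selector` fields and `corev1` import that `GetAnyPod` uses):

```go
// GetAnyNotReadyPod is a hypothetical variant of GetAnyPod: it prefers a pod
// that is not Ready (e.g. unschedulable or stuck pulling its image), and
// returns nil when every pod is Ready or no pods exist.
func (pa PodAccessor) GetAnyNotReadyPod() (*corev1.Pod, error) {
	pods, err := pa.podsLister.List(pa.selector)
	if err != nil {
		return nil, err
	}
	for _, p := range pods {
		ready := false
		for _, cond := range p.Status.Conditions {
			if cond.Type == corev1.PodReady && cond.Status == corev1.ConditionTrue {
				ready = true
				break
			}
		}
		if !ready {
			return p, nil
		}
	}
	return nil, nil
}
```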

if err != nil {
return fmt.Errorf("error getting a pod for the revision: %w", err)
}
Comment on lines +119 to +122
Member

Fetching a pod here seems premature

Contributor Author (@skonto, Jun 26, 2024)

Could you elaborate? We already do that in this function via the pod accessor, for getting the state a few lines below. We test for the scale-to-zero case and only check the pod status if we have to.
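As an alternative addressing the "premature" concern, the lookup could be deferred into the scaler and performed only when the scale-to-zero diagnostic branch is taken. The fragment below is a sketch only, not what this PR does; it assumes a pod accessor (here named `podAccessor`) is reachable from inside `scaler.scale`, and it reuses names already in scope there:

```go
// Sketch only: fetch the pod lazily inside scaler.scale instead of receiving
// podForErrorChecking from ReconcileKind. The podAccessor wiring is assumed.
if desiredScale == 0 && pa.Status.IsActivating() {
	pod, err := podAccessor.GetAnyPod()
	if err != nil {
		return desiredScale, fmt.Errorf("error getting a pod for the revision: %w", err)
	}
	if pod != nil {
		if err := checkAvailabilityBeforeScalingDown(ctx, logger, pod, pa, client); err != nil {
			return desiredScale, fmt.Errorf("failed to check availability for target %v: %w", pa.Spec.ScaleTargetRef, err)
		}
	}
}
```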


// Get the appropriate current scale from the metric, and right size
// the scaleTargetRef based on it.
want, err := c.scaler.scale(ctx, pa, sks, decider.Status.DesiredScale)
want, err := c.scaler.scale(ctx, pa, sks, decider.Status.DesiredScale, c.Client, pod)
if err != nil {
return fmt.Errorf("error scaling target: %w", err)
}
@@ -145,7 +151,6 @@ func (c *Reconciler) ReconcileKind(ctx context.Context, pa *autoscalingv1alpha1.
}

// Compare the desired and observed resources to determine our situation.
podCounter := resourceutil.NewPodAccessor(c.podsLister, pa.Namespace, pa.Labels[serving.RevisionLabelKey])
ready, notReady, pending, terminating, err := podCounter.PodCountsByState()
if err != nil {
return fmt.Errorf("error getting pod counts: %w", err)
39 changes: 38 additions & 1 deletion pkg/reconciler/autoscaling/kpa/scaler.go
@@ -35,16 +35,21 @@ import (
"knative.dev/serving/pkg/activator"
autoscalingv1alpha1 "knative.dev/serving/pkg/apis/autoscaling/v1alpha1"
"knative.dev/serving/pkg/autoscaler/config/autoscalerconfig"
clientset "knative.dev/serving/pkg/client/clientset/versioned"
"knative.dev/serving/pkg/reconciler/autoscaling/config"
kparesources "knative.dev/serving/pkg/reconciler/autoscaling/kpa/resources"
aresources "knative.dev/serving/pkg/reconciler/autoscaling/resources"
revresurces "knative.dev/serving/pkg/reconciler/revision/resources"
"knative.dev/serving/pkg/resources"

"go.uber.org/zap"
corev1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime/schema"
"k8s.io/apimachinery/pkg/types"
"k8s.io/client-go/dynamic"
"k8s.io/client-go/tools/cache"
"k8s.io/client-go/util/retry"
)

const (
@@ -326,7 +331,7 @@ func (ks *scaler) applyScale(ctx context.Context, pa *autoscalingv1alpha1.PodAut
}

// scale attempts to scale the given PA's target reference to the desired scale.
func (ks *scaler) scale(ctx context.Context, pa *autoscalingv1alpha1.PodAutoscaler, sks *netv1alpha1.ServerlessService, desiredScale int32) (int32, error) {
func (ks *scaler) scale(ctx context.Context, pa *autoscalingv1alpha1.PodAutoscaler, sks *netv1alpha1.ServerlessService, desiredScale int32, client clientset.Interface, podForErrorChecking *corev1.Pod) (int32, error) {
asConfig := config.FromContext(ctx).Autoscaler
logger := logging.FromContext(ctx)

@@ -371,6 +376,38 @@ func (ks *scaler) scale(ctx context.Context, pa *autoscalingv1alpha1.PodAutoscal
return desiredScale, nil
}

// Before we apply scale to zero, and since we have failed to activate, check whether any pod is in a waiting state.
// This should capture cases where we scale from zero and the regular progressDeadline expiration is never hit, a
// known K8s limitation when we are not in a deployment rollout.
if desiredScale == 0 && pa.Status.IsActivating() && podForErrorChecking != nil {
if err = checkAvailabilityBeforeScalingDown(ctx, logger, podForErrorChecking, pa, client); err != nil {
return desiredScale, fmt.Errorf("failed to check availability for target %v: %w", pa.Spec.ScaleTargetRef, err)
}
}

logger.Infof("Scaling from %d to %d", currentScale, desiredScale)
return desiredScale, ks.applyScale(ctx, pa, desiredScale, ps)
}

// checkAvailabilityBeforeScalingDown inspects the given pod's non-queue-proxy container statuses and, if a container
// is stuck in a waiting state, marks the PA's scale target as not initialized and the revision's resources as unavailable.
func checkAvailabilityBeforeScalingDown(ctx context.Context, logger *zap.SugaredLogger, pod *corev1.Pod, pa *autoscalingv1alpha1.PodAutoscaler, client clientset.Interface) error {
for _, status := range pod.Status.ContainerStatuses {
if status.Name != revresurces.QueueContainerName {
if w := status.State.Waiting; w != nil {
logger.Debugf("marking scale target not initialized: %s: %s", w.Reason, w.Message)
pa.Status.MarkScaleTargetNotInitialized(w.Reason, w.Message)
return retry.RetryOnConflict(retry.DefaultBackoff, func() error {
rev, err := client.ServingV1().Revisions(pa.Namespace).Get(ctx, pa.Name, metav1.GetOptions{})
if err != nil {
return err
}
rev.Status.MarkResourcesAvailableFalse(w.Reason, w.Message)
if _, err = client.ServingV1().Revisions(pa.Namespace).UpdateStatus(ctx, rev, metav1.UpdateOptions{}); err != nil {
return err
}
return nil
})
}
}
Comment on lines +398 to +410
Member

this is sorta violating our abstractions - if we wanted to propagate this error message to the revision we would have to do it through a PodAutoscaler condition.

Contributor Author (@skonto, Jun 26, 2024)

> have to do it through a PodAutoscaler condition.

Normally yes, but due to the distributed status logic (unclear and undocumented) and how things are implemented, this is safer imho, as it makes the decision locally and avoids influencing anything else down the code path. 🤷 I can try to change it, but it would require propagating this decision down to the PA status update (not ideal, as that code is many lines below); I had it that way previously. Let's see.

}
return nil
}
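For reference, the abstraction-preserving alternative the reviewer describes could look roughly like the fragment below, placed in the revision reconciler's PA-status propagation. It is a sketch under the assumption that `GetCondition` and `MarkResourcesAvailableFalse` behave as defined in the respective lifecycle helpers; the propagation site itself is not part of this PR:

```go
// Sketch: the scaler only marks the PodAutoscaler condition, and the revision
// reconciler translates it into the Revision's ResourcesAvailable condition.
cond := pa.Status.GetCondition(autoscalingv1alpha1.PodAutoscalerConditionScaleTargetInitialized)
if cond != nil && cond.Status == corev1.ConditionFalse {
	rev.Status.MarkResourcesAvailableFalse(cond.Reason, cond.Message)
}
```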
4 changes: 2 additions & 2 deletions pkg/reconciler/autoscaling/kpa/scaler_test.go
@@ -556,7 +556,7 @@ func TestScaler(t *testing.T) {
test.configMutator(cfg)
}
ctx = config.ToContext(ctx, cfg)
desiredScale, err := revisionScaler.scale(ctx, pa, sks, test.scaleTo)
desiredScale, err := revisionScaler.scale(ctx, pa, sks, test.scaleTo, fakeservingclient.Get(ctx), nil)
if err != nil {
t.Error("Scale got an unexpected error:", err)
}
@@ -647,7 +647,7 @@ func TestDisableScaleToZero(t *testing.T) {
conf := defaultConfig()
conf.Autoscaler.EnableScaleToZero = false
ctx = config.ToContext(ctx, conf)
desiredScale, err := revisionScaler.scale(ctx, pa, nil /*sks doesn't matter in this test*/, test.scaleTo)
desiredScale, err := revisionScaler.scale(ctx, pa, nil /*sks doesn't matter in this test*/, test.scaleTo, fakeservingclient.Get(ctx), nil)

if err != nil {
t.Error("Scale got an unexpected error:", err)
12 changes: 12 additions & 0 deletions pkg/resources/pods.go
@@ -44,6 +44,18 @@ func NewPodAccessor(lister corev1listers.PodLister, namespace, revisionName stri
}
}

// GetAnyPod returns an arbitrary pod for the revision backing this accessor, or a nil pod
// (and a nil error) when the revision currently has no pods, so callers must nil-check the result.
func (pa PodAccessor) GetAnyPod() (pod *corev1.Pod, err error) {
pods, err := pa.podsLister.List(pa.selector)
if err != nil {
return nil, err
}
if len(pods) != 0 {
return pods[0], nil
}
return nil, nil
}
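Because the helper returns a nil pod together with a nil error when the revision has no pods, callers need an explicit nil check. A minimal usage sketch mirroring the reconciler's call site (names such as `podsLister`, `namespace`, and `revisionName` stand in for whatever the caller has in scope):

```go
podCounter := resources.NewPodAccessor(podsLister, namespace, revisionName)
pod, err := podCounter.GetAnyPod()
if err != nil {
	return fmt.Errorf("error getting a pod for the revision: %w", err)
}
if pod == nil {
	// The revision currently has no pods (e.g. already scaled to zero); nothing to inspect.
	return nil
}
// Safe to inspect pod.Status from here on.
```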

// PodCountsByState returns number of pods for the revision grouped by their state, that is
// of interest to knative (e.g. ignoring failed or terminated pods).
func (pa PodAccessor) PodCountsByState() (ready, notReady, pending, terminating int, err error) {