How to detect a permanent service failure? #15478

Closed
lsergio opened this issue Aug 21, 2024 · 2 comments
Labels
kind/question Further information is requested

Comments

@lsergio

lsergio commented Aug 21, 2024

Ask your question here:

Hi there.

I'm facing a situation where I need to detect that a Knative Service will never be Ready because its Deployment progress deadline expired. This would happen, for example, when my cluster has no more resources to create new pods.

When I create the Knative Service and check its status, I see the conditions:

    conditions:
    - lastTransitionTime: "2024-08-21T12:33:36Z"
      message: 'Revision "rest-1-00001" failed with message: 0/2 nodes are available:
        2 Too many pods. preemption: 0/2 nodes are available: 2 No preemption victims
        found for incoming pod..'
      reason: RevisionFailed
      status: "False"
      type: ConfigurationsReady
    - lastTransitionTime: "2024-08-21T12:33:36Z"
      message: Configuration "rest-1" does not have any ready Revision.
      reason: RevisionMissing
      status: "False"
      type: Ready
    - lastTransitionTime: "2024-08-21T12:33:36Z"
      message: Configuration "rest-1" does not have any ready Revision.
      reason: RevisionMissing
      status: "False"
      type: RoutesReady

The Ready condition is False with RevisionMissing reason.

After the progress deadline expires, I see these conditions:

    conditions:
    - lastTransitionTime: "2024-08-21T12:29:42Z"
      message: 'Revision "rest-1-00001" failed with message: Initial scale was never
        achieved.'
      reason: RevisionFailed
      status: "False"
      type: ConfigurationsReady
    - lastTransitionTime: "2024-08-21T12:27:11Z"
      message: Configuration "rest-1" does not have any ready Revision.
      reason: RevisionMissing
      status: "False"
      type: Ready
    - lastTransitionTime: "2024-08-21T12:27:11Z"
      message: Configuration "rest-1" does not have any ready Revision.
      reason: RevisionMissing
      status: "False"
      type: RoutesReady

The messages have changed, but the reasons are still the same.
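To make the problem concrete, here is a minimal Python sketch that compares the two status dumps above. The condition data is copied from this issue (timestamps omitted); the helper names `signature` and `changed_messages` are hypothetical and are not part of any Knative API.

```python
# Conditions from the first status dump (while the pods are unschedulable).
before = [
    {"type": "ConfigurationsReady", "status": "False", "reason": "RevisionFailed",
     "message": 'Revision "rest-1-00001" failed with message: 0/2 nodes are available: '
                '2 Too many pods. preemption: 0/2 nodes are available: '
                '2 No preemption victims found for incoming pod..'},
    {"type": "Ready", "status": "False", "reason": "RevisionMissing",
     "message": 'Configuration "rest-1" does not have any ready Revision.'},
    {"type": "RoutesReady", "status": "False", "reason": "RevisionMissing",
     "message": 'Configuration "rest-1" does not have any ready Revision.'},
]

# Conditions from the second status dump (after the progress deadline expired).
after = [
    {"type": "ConfigurationsReady", "status": "False", "reason": "RevisionFailed",
     "message": 'Revision "rest-1-00001" failed with message: '
                'Initial scale was never achieved.'},
    {"type": "Ready", "status": "False", "reason": "RevisionMissing",
     "message": 'Configuration "rest-1" does not have any ready Revision.'},
    {"type": "RoutesReady", "status": "False", "reason": "RevisionMissing",
     "message": 'Configuration "rest-1" does not have any ready Revision.'},
]


def signature(conditions):
    """Return the set of (type, status, reason) triples, ignoring messages."""
    return {(c["type"], c["status"], c["reason"]) for c in conditions}


def changed_messages(old, new):
    """Return the condition types whose message differs between two snapshots."""
    old_by_type = {c["type"]: c["message"] for c in old}
    return [c["type"] for c in new if old_by_type.get(c["type"]) != c["message"]]


# The machine-readable fields are identical in both snapshots...
same_signature = signature(before) == signature(after)
# ...and only the free-text message of one condition tells them apart.
differing = changed_messages(before, after)
```

With this data, the type, status, and reason fields cannot distinguish "still trying" from "gave up"; only the message on `ConfigurationsReady` differs.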

What is the recommended way to detect that the Revision has failed permanently, without relying on parsing error messages?

Thanks for any help!

@lsergio lsergio added the kind/question Further information is requested label Aug 21, 2024
@lsergio lsergio changed the title How to detect a permanent service failure How to detect a permanent service failure? Aug 21, 2024
@skonto
Contributor

skonto commented Sep 10, 2024

I think you could use external probes (see, for example, here); otherwise you need to get the Revision status and parse it. @dprotaso @dsimansk @ReToCode any other ideas?
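As a rough illustration of the second option, the sketch below inspects a Revision's status conditions instead of the Service's. It rests on assumptions that are not confirmed in this thread: that the Revision exposes a `ResourcesAvailable` condition alongside `Ready`, and that the Kubernetes Deployment reason `ProgressDeadlineExceeded` is propagated to it. The sample status is hypothetical, not output from a real cluster, so the condition types and reasons should be checked against your Knative version before relying on them.

```python
# Assumption: reasons treated as terminal. "ProgressDeadlineExceeded" is the reason a
# Kubernetes Deployment sets on its Progressing condition; whether Knative copies it
# onto the Revision is an assumption to verify for your version.
TERMINAL_REASONS = frozenset({"ProgressDeadlineExceeded"})


def find_condition(conditions, cond_type):
    """Return the first condition of the given type, or None if absent."""
    for cond in conditions:
        if cond.get("type") == cond_type:
            return cond
    return None


def is_permanent_failure(revision, terminal_reasons=TERMINAL_REASONS):
    """Return True if a checked condition is False with a terminal reason."""
    conditions = revision.get("status", {}).get("conditions", [])
    # Assumption: these are the condition types worth checking on a Revision.
    for cond_type in ("Ready", "ResourcesAvailable"):
        cond = find_condition(conditions, cond_type)
        if (cond is not None
                and cond.get("status") == "False"
                and cond.get("reason") in terminal_reasons):
            return True
    return False


# Hypothetical Revision object, shaped like a Kubernetes resource parsed from JSON.
sample_revision = {
    "status": {
        "conditions": [
            {"type": "ResourcesAvailable", "status": "False",
             "reason": "ProgressDeadlineExceeded",
             "message": "Initial scale was never achieved"},
            {"type": "Ready", "status": "False",
             "reason": "ProgressDeadlineExceeded",
             "message": "Initial scale was never achieved"},
        ]
    }
}

permanent = is_permanent_failure(sample_revision)
```

The input dict could come from parsing the output of `kubectl get revision rest-1-00001 -o json` with `json.loads`. Keeping the terminal reasons in a caller-supplied set means the check matches on the `reason` field rather than the message text, and the set can be adjusted once the actual reasons are confirmed.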

@ReToCode
Member

Nothing to add @skonto

@lsergio lsergio closed this as completed Oct 1, 2024