Examples should come with health and readiness checks #685

Open
Tracked by #698
Jeffwan opened this issue Feb 16, 2025 · 4 comments
Labels
good first issue, help wanted, kind/documentation, kind/support, priority/important-soon
Comments

@Jeffwan
Collaborator

Jeffwan commented Feb 16, 2025

🚀 Feature Description and Motivation

Currently, a pod becomes ready immediately even though the application still takes a long time to load, so requests sent to the model server during that window fail. We used to have health and readiness settings in the examples, but we recently removed them for simplicity.

Use Case

Stable deployments.

Proposed Solution

No response

@Jeffwan
Collaborator Author

Jeffwan commented Feb 18, 2025

Please focus on the samples folder.

Jeffwan added the kind/documentation, kind/support, priority/important-soon, good first issue, and help wanted labels on Feb 18, 2025
Jeffwan mentioned this issue on Feb 24, 2025 (41 tasks)
Jeffwan added this to the v0.3.0 milestone on Feb 26, 2025
@vivek-orbi

I'm willing to take this up.

Based on the samples, here's my understanding of the solution for your requirement:

  1. Problem: Pod becomes ready immediately while the application/model is still loading, causing failed requests.
  2. Proposed Solution: Implement health and readiness probes with appropriate delays:
livenessProbe:
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 120
  periodSeconds: 5
  timeoutSeconds: 1
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 120
  periodSeconds: 5
  timeoutSeconds: 1
  failureThreshold: 5
  3. Key Settings:
  • 120 seconds initial delay to account for model loading time
  • Same /health endpoint for both probes
  • Different failure thresholds (3 for liveness, 5 for readiness)

Is my understanding correct that:

  1. Your main issue is premature traffic routing before the model is fully loaded?
  2. The 120-second initial delay would be sufficient for your model loading time?
  3. You're using a setup similar to the samples (vLLM or similar serving framework)?

Please let me know if any of these assumptions need adjustment for your specific use case.
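
One refinement worth considering for the proposal above: Kubernetes also offers a startupProbe, which holds off the liveness and readiness probes until it has succeeded once, so a slow model load does not have to be covered by a large initialDelaySeconds. The following is a minimal sketch, not the project's actual sample configuration. The /health path and port 8000 are taken from the snippet above; the 10-second period and the failure threshold of 60 (roughly a 600-second startup budget) are assumed values that would need tuning to the real model download and load time.

```yaml
# Sketch only: the path and port come from the proposal in this thread;
# the startupProbe numbers are assumptions to be tuned per model.
startupProbe:
  httpGet:
    path: /health
    port: 8000
  periodSeconds: 10
  failureThreshold: 60   # about 60 * 10 = 600 s allowed for startup

# These probes only begin once the startupProbe has succeeded,
# so no long initialDelaySeconds is needed here.
livenessProbe:
  httpGet:
    path: /health
    port: 8000
  periodSeconds: 5
  timeoutSeconds: 1
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /health
    port: 8000
  periodSeconds: 5
  timeoutSeconds: 1
  failureThreshold: 5
```

With this layout the startup budget can be widened by changing a single number, without also delaying the detection of a container that hangs after it has started serving.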

@jolfr
Contributor

jolfr commented Feb 28, 2025

The Quickstart Model Sample already includes checks, but they are too tight for the current model download. 120 seconds is not enough. I'm going to log an issue and will link it here.
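
To make the timing concrete, the comments below apply rough arithmetic to the liveness values proposed earlier in this thread; the Quickstart sample's own settings may differ and are not reproduced here. The timeline is approximate, since the kubelet's probe scheduling adds some jitter.

```yaml
# Values copied from the liveness probe proposed earlier in this thread.
livenessProbe:
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 120  # probing starts about 120 s after container start
  periodSeconds: 5          # then one probe roughly every 5 s
  timeoutSeconds: 1
  failureThreshold: 3       # third consecutive failure lands near 130 to 135 s
# If the model is still downloading at that point, the kubelet restarts the
# container and, unless the download is cached on a volume, it starts over,
# so the pod may never become ready.
```

Raising initialDelaySeconds or failureThreshold widens this window; how far to raise them depends on the measured download and load time.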

@jolfr
Contributor

jolfr commented Feb 28, 2025

See #772
