diff --git a/docs/.nav.yml b/docs/.nav.yml
index bba135d64..933904847 100644
--- a/docs/.nav.yml
+++ b/docs/.nav.yml
@@ -2,27 +2,32 @@ nav:
   - Home:
     - vLLM Spyre Plugin: README.md
   - Getting Started:
-    - Installation: getting_started/installation.md
+      - Installation: getting_started/installation.md
   - Deploying:
-    - Docker: deploying/docker.md
-    - Kubernetes: deploying/k8s.md
+      - Docker: deploying/docker.md
+      - Kubernetes: deploying/k8s.md
+      - Red Hat OpenShift AI: deploying/rhoai.md
   - Examples:
     - Offline Inference: examples/offline_inference
     - Other: examples/other
   - User Guide:
-    - Configuration: user_guide/configuration.md
-    - Environment Variables: user_guide/env_vars.md
-    - Supported Features: user_guide/supported_features.md
-    - Supported Models: user_guide/supported_models.md
+      - Configuration: user_guide/configuration.md
+      - Environment Variables: user_guide/env_vars.md
+      - Supported Features: user_guide/supported_features.md
+      - Supported Models: user_guide/supported_models.md
   - Developer Guide:
     - Contributing: contributing/README.md
 
   - Getting Started:
-    - Installation: getting_started/installation.md
+      - Installation: getting_started/installation.md
+  - Deploying:
+      - Docker: deploying/docker.md
+      - Kubernetes: deploying/k8s.md
+      - Red Hat OpenShift AI: deploying/rhoai.md
   - User Guide:
-    - Configuration: user_guide/configuration.md
-    - Environment Variables: user_guide/env_vars.md
-    - Supported Features: user_guide/supported_features.md
-    - Supported Models: user_guide/supported_models.md
+      - Configuration: user_guide/configuration.md
+      - Environment Variables: user_guide/env_vars.md
+      - Supported Features: user_guide/supported_features.md
+      - Supported Models: user_guide/supported_models.md
   - Developer Guide:
     - Contributing: contributing/README.md
diff --git a/docs/deploying/k8s.md b/docs/deploying/k8s.md
index 6c0e27622..4e3e06451 100644
--- a/docs/deploying/k8s.md
+++ b/docs/deploying/k8s.md
@@ -61,6 +61,8 @@ The vLLM Documentation on [Deploying with Kubernetes](https://docs.vllm.ai/en/la
   labels:
     app: granite-8b-instruct
 spec:
+  # Defaults to 600; must be set higher if your startupProbe needs to wait longer than that
+  progressDeadlineSeconds: 1200
   replicas: 1
   selector:
     matchLabels:
@@ -70,6 +72,8 @@ The vLLM Documentation on [Deploying with Kubernetes](https://docs.vllm.ai/en/la
       labels:
         app: granite-8b-instruct
     spec:
+      # Required for scheduling Spyre cards
+      schedulerName: aiu-scheduler
       volumes:
       - name: hf-cache-volume
         persistentVolumeClaim:
@@ -127,15 +131,19 @@ The vLLM Documentation on [Deploying with Kubernetes](https://docs.vllm.ai/en/la
           httpGet:
             path: /health
             port: 8000
-          # Long startup delays are necessary for graph compilation
-          initialDelaySeconds: 1200
           periodSeconds: 10
         readinessProbe:
           httpGet:
             path: /health
             port: 8000
-          initialDelaySeconds: 600
           periodSeconds: 5
+        startupProbe:
+          httpGet:
+            path: /health
+            port: 8000
+          periodSeconds: 10
+          # Long startup delays are necessary for graph compilation
+          failureThreshold: 120
 ---
 apiVersion: v1
 kind: Service
diff --git a/docs/deploying/rhoai.md b/docs/deploying/rhoai.md
new file mode 100644
index 000000000..63565ff88
--- /dev/null
+++ b/docs/deploying/rhoai.md
@@ -0,0 +1,104 @@
+# Using Red Hat OpenShift AI
+
+[Red Hat OpenShift AI](https://www.redhat.com/en/products/ai/openshift-ai) is a cloud-native AI platform that bundles together many popular model management projects, including [KServe](https://kserve.github.io/website/latest/).
+
+This example shows how to use KServe with RHOAI to deploy a model on OpenShift, using a modelcar image to load the model without requiring any connection to the Hugging Face Hub.
+
+## Deploying with KServe
+
+!!! note "Prerequisites"
+    * A running Kubernetes cluster with RHOAI installed
+    * Image pull credentials for `registry.redhat.io/rhelai1`
+    * Spyre accelerators available in the cluster
+
+
+1. Create a ServingRuntime to serve your models.
+
+    ```yaml
+    oc apply -f - <
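
The `rhoai.md` hunk above is cut off inside the first `oc apply` command, so the ServingRuntime manifest itself is not visible here. As a rough sketch of what a KServe `ServingRuntime` for vLLM Spyre can look like; the image reference, launch command, and accelerator resource name below are illustrative assumptions, not values taken from the actual file:

```yaml
# Illustrative sketch only: image, command, and resource names are assumptions.
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: vllm-spyre-runtime
spec:
  supportedModelFormats:
    - name: vLLM
      autoSelect: true
  containers:
    - name: kserve-container
      # Hypothetical image name; substitute the vLLM Spyre image available to your cluster
      image: registry.example.com/vllm-spyre:latest
      command: ["python", "-m", "vllm.entrypoints.openai.api_server"]
      args:
        # KServe exposes the model contents under /mnt/models
        - --model=/mnt/models
        - --port=8000
      ports:
        - containerPort: 8000
          protocol: TCP
      resources:
        limits:
          # Assumed Spyre accelerator resource name, for illustration only
          ibm.com/aiu_pf: "1"
```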
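With a runtime in place, the model itself is typically deployed as a KServe `InferenceService`. Another hedged sketch, reusing the `granite-8b-instruct` name from the Kubernetes example and the hypothetical runtime above; the `oci://` storage URI is what points KServe at a modelcar image, so no Hugging Face Hub connection is needed at startup:

```yaml
# Illustrative sketch only: the modelcar image path and tag are placeholders.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: granite-8b-instruct
spec:
  predictor:
    model:
      modelFormat:
        name: vLLM
      # Refers to the hypothetical ServingRuntime sketched above
      runtime: vllm-spyre-runtime
      # An oci:// URI tells KServe to load the model from a modelcar image
      # rather than downloading it from the Hugging Face Hub
      storageUri: oci://registry.redhat.io/rhelai1/<modelcar-image>:<tag>
```

KServe pulls the modelcar image onto the node and exposes its contents to the serving container, so the model is read from local storage only.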