Commit dcff633

Merge pull request #49 from cld2labs/cld2labs/ubuntu22.04-deployment-scripts
2 parents f27ea8a + efcdff4 commit dcff633

14 files changed: +3178 −1 lines

third_party/Dell/README.md — deleted (1 line)

New file — 355 lines added:
# Troubleshooting Guide

This section covers common deployment and runtime issues observed during Intel® AI for Enterprise Inference setup, along with step-by-step resolutions.
**Issues:**

1. [Missing Default User](#1-ansible-deployment-failure--missing-default-user)
2. [Authorization or sudo Password Failure](#2-authorization-or-sudo-password-failure)
3. [Configuration Mismatch (Wrong Parameters)](#3-configuration-mismatch-wrong-parameters)
4. [Kubernetes Cluster Not Reachable](#4-kubernetes-cluster-not-reachable)
5. [Habana Device Plugin CrashLoopBackOff](#5-habana-device-plugin-crashloopbackoff)
6. [Model Pods Remain in "Pending" State](#6-model-pods-remain-in-pending-state)
7. [Models' Output is Garbled and/or Model Pods Failing](#7-models-output-is-garbled-andor-model-pods-failing)
8. [Model Deployment Failure with Padding-Aware Scheduling](#8-model-deployment-failure-with-padding-aware-scheduling)
9. [Inference Stack Deploy Keycloak System Error](#9-inference-stack-deploy-keycloak-system-error)
10. [Kubernetes Pods Failing with "Disk Pressure"](#10-kubernetes-pods-failing-with-disk-pressure)
11. [Hugging Face Authentication Failure](#11-hugging-face-authentication-failure)
12. [Docker Image Pull Failure](#12-docker-image-pull-failure)
13. [Triton Package Compatibility Issue](#13-triton-package-compatibility-issue)

---
### 1. Ansible Deployment Failure — Missing Default User

```
TASK [download : Prep_download | Create staging directory on remote node]
fatal: [master1]: FAILED! => {"msg": "chown failed: failed to look up user ubuntu"}
```

**Cause:**

The default Ansible user "ubuntu" does not exist on your system. Many cloud images create it by default, but yours may not.

**Fix:**

Edit the inventory file to change the Ansible user name to your user:

```bash
vi inventory/hosts.yaml
```

Update `ansible_user` to the user that owns Enterprise Inference (in the example below, simply "user"):

```yaml
all:
  hosts:
    master1:
      ansible_connection: local
      ansible_user: user
      ansible_become: true
```
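Before editing the inventory, you can confirm whether the expected account actually exists on the node (a quick read-only check, not part of the deploy scripts):

```bash
# Prints the passwd entry if the "ubuntu" account exists, otherwise a hint.
out=$(getent passwd ubuntu || echo "no such user: ubuntu -- set ansible_user to an existing account")
echo "$out"
```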
---
### 2. Authorization or sudo Password Failure

Deployment fails with authorization or privilege-escalation errors.

**Fix:**

Two options:

1. Each time, just before executing inference-stack-deploy.sh, run a trivial sudo command such as `sudo echo sudoing` and enter your sudo password. This normally keeps your sudo authorization cached for the duration of inference-stack-deploy.sh.
2. Add the `--ask-become-pass` parameter to the inference-stack-deploy.sh script. Specifically, append this flag after `--become-user=root` in the `ansible-playbook` commands of `run_reset_playbook()` and `run_fresh_install_playbook()` (lines 821 and 865). Note that the script will then wait for input of your sudo password each time it is run.
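You can check whether your sudo credentials are currently cached before launching the script (a small probe; `sudo -n` fails instead of prompting for a password):

```bash
# Non-interactive sudo probe: succeeds only if credentials are already cached.
if sudo -n true 2>/dev/null; then
  sudo_state="sudo credentials cached"
else
  sudo_state="sudo will prompt for a password"
fi
echo "$sudo_state"
```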
---
### 3. Configuration Mismatch (Wrong Parameters)

Deployment fails due to incorrect or missing configuration values.

**Fix:**

Before re-running the deployment, verify and update your inference-config.cfg. These values must match your actual deployment environment.

```bash
cluster_url=api.example.com                     # <-- Replace with your cluster URL
cert_file=~/certs/cert.pem
key_file=~/certs/key.pem
keycloak_client_id=my-client-id                 # <-- Replace with your Keycloak client ID
keycloak_admin_user=your-keycloak-admin-user    # <-- Replace with your Keycloak admin username
keycloak_admin_password=changeme                # <-- Replace with your Keycloak admin password
vault_pass_code=place-holder-123
deploy_kubernetes_fresh=on
deploy_ingress_controller=on
deploy_keycloak_apisix=on
deploy_genai_gateway=off
deploy_observability=off
deploy_llm_models=on
deploy_ceph=off
deploy_istio=off
```
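As a quick sanity check, a small script can flag keys that are still set to the sample placeholder values above (a minimal sketch, not part of the deploy tooling; the config path and placeholder list are taken from the example):

```bash
# Hypothetical helper: warn about config keys left unset or at placeholder values.
CFG="${CFG:-inference-config.cfg}"
placeholders="api.example.com my-client-id your-keycloak-admin-user changeme place-holder-123"
warnings=0
for key in cluster_url keycloak_client_id keycloak_admin_user keycloak_admin_password vault_pass_code; do
  # First value on the key's line, ignoring trailing comments.
  val=$(grep -E "^${key}=" "$CFG" 2>/dev/null | head -n1 | cut -d= -f2- | awk '{print $1}')
  for p in $placeholders; do
    if [ "$val" = "$p" ] || [ -z "$val" ]; then
      echo "WARN: ${key} looks unset or still a placeholder (${val:-empty})"
      warnings=$((warnings + 1))
      break
    fi
  done
done
echo "placeholder-looking keys: $warnings"
```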
---
### 4. Kubernetes Cluster Not Reachable

Deployment shows "cluster not reachable" or kubectl command failures.

**Possible Causes & Fixes:**

- **Cause:** Sudo authorization is not cached

  - **Fix:** Prior to executing inference-stack-deploy.sh, execute any sudo command, such as `sudo echo sudoing`. This caches your credentials for the time that inference-stack-deploy.sh is executing.

- **Cause:** Ansible was uninstalled

  - **Fix:** Reinstall manually:

    ```bash
    sudo apt update
    sudo apt install -y ansible
    ```

- **Cause:** Kubernetes configuration mismatch

  - **Fix:** Ensure `~/.kube/config` exists and the context points to the correct cluster.

- **Cause:** Sudo strips the kubectl path from the environment, so kubectl is not found.

  - **Fix:** Ensure that the sudoers file includes the path `/usr/local/bin` in the `secure_path` variable. See the user-guide prerequisites for details.
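As an illustration of the last check, the snippet below tests whether a `secure_path` line includes `/usr/local/bin` (the sample line is an assumption based on the stock Ubuntu sudoers; read your real one with `sudo grep secure_path /etc/sudoers`):

```bash
# Sketch: verify a sudoers secure_path line includes the kubectl install dir.
line='Defaults secure_path="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"'
case "$line" in
  *"/usr/local/bin"*) status="OK: kubectl path present" ;;
  *)                  status="MISSING: add /usr/local/bin to secure_path" ;;
esac
echo "$status"
```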
---
### 5. Habana Device Plugin CrashLoopBackOff

```
habana-ai-device-plugin-ds-* CrashLoopBackOff
ERROR: failed detecting Habana's devices on the system: get device name: no habana devices on the system
```

**Cause:**

The device plugin is unable to detect the Gaudi3 PCIe cards.

**Fix:**

Update your Habana device plugin version. Version 1.22.1-6 is recommended.

```bash
kubectl set image pod/habana-ai-device-plugin-ds-tjbch \
  habana-ai-device-plugin=vault.habana.ai/docker-k8s-device-plugin/docker-k8s-device-plugin:1.22.1-6
```

**Verification:**

```bash
kubectl get pods -A
```

Note: Ensure the habana-ai-device-plugin status changes to Running.

| Check | Command |
| --- | --- |
| Driver/NIC versions | `hl-smi` |
| Runtime version | `dpkg -l` |
| Kubernetes health | `kubectl get nodes -o wide` |
| Device plugin logs | `kubectl logs -n habana-ai-operator <device-plugin-pod>` |
---
### 6. Model Pods Remain in "Pending" State

Problem: After the inference stack is deployed, model pods remain in the "Pending" state and do not progress to the "Running" state, as shown here:

```bash
user@master1:~/Enterprise-Inference/core$ kubectl get pods
NAME                                          READY   STATUS    RESTARTS   AGE
keycloak-0                                    1/1     Running   0          15m
keycloak-postgresql-0                         1/1     Running   0          15m
vllm-deepkseek-r1-qwen-32b-64b885895f-dh566   0/1     Pending   0          10m
vllm-llama-8b-786d7678ff-6fr6l                0/1     Pending   0          10m
```

This can occur if the habana-ai-operator pod does not register the Gaudi3 devices as allocatable. To check whether this is the cause, execute:

```bash
kubectl describe node master1
```

Look for the "Capacity" and "Allocatable" sections, as below, and ensure that both list the correct number of `habana.ai/gaudi` devices for your hardware.

```bash
Capacity:
  habana.ai/gaudi: 8
Allocatable:
  habana.ai/gaudi: 8
```

If the "Allocatable" section shows zero (0), your pods will remain in the Pending state.
To resolve this, execute the following command to restart the device plugin so it re-registers the devices:

```bash
kubectl rollout restart ds habana-ai-device-plugin-ds -n habana-ai-operator
```

If the rollout restart does not resolve the issue, a system restart often fixes it.
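The manual inspection above can be scripted. As a sketch, the snippet below extracts the allocatable Gaudi count from `kubectl describe node` text (here fed a sample of the output, since the parsing is the point; on a live node, pipe the real command in instead):

```bash
# Parse the Allocatable habana.ai/gaudi count from sample describe-node output.
describe_output='Capacity:
  habana.ai/gaudi:  8
Allocatable:
  habana.ai/gaudi:  8'
alloc=$(printf '%s\n' "$describe_output" \
  | awk '/^Allocatable:/{a=1} a && /habana.ai\/gaudi:/{print $2; exit}')
echo "allocatable gaudi devices: $alloc"   # 0 here means pods will stay Pending
```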
---
### 7. Models' Output is Garbled and/or Model Pods Failing

IOMMU passthrough is required for Gaudi 3 on **Ubuntu 24.04.2/22.04.5 with Linux kernel 6.8**, and models can produce garbled output or fail if this setting is not applied. Skip this section if a different OS or kernel version is used.

To enable IOMMU passthrough:

1. Add `GRUB_CMDLINE_LINUX_DEFAULT="iommu=pt intel_iommu=on"` to `/etc/default/grub`.
2. Run `sudo update-grub`.
3. Reboot the system.
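After the reboot, you can confirm the setting took effect by checking the running kernel's command line (a read-only check; `/proc/cmdline` reflects the parameters the kernel actually booted with):

```bash
# Verify that iommu=pt is present on the current kernel command line.
if grep -qw "iommu=pt" /proc/cmdline; then
  iommu_status="IOMMU passthrough active"
else
  iommu_status="iommu=pt not set on current kernel command line"
fi
echo "$iommu_status"
```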
---
### 8. Model Deployment Failure with Padding-Aware Scheduling

**Error:** Padding-aware scheduling currently does not work with chunked prefill

**Cause:** This issue occurs when the `--use-padding-aware-scheduling` flag is enabled while deploying a vLLM model on Habana Gaudi3. The current vLLM version (v0.9.0.1+Gaudi-1.22.0) does not support using padding-aware scheduling together with chunked prefill.

**Fix:** If your workload doesn't require padding-aware scheduling, you can disable it to allow the deployment to proceed.

Edit your `gaudi3-values.yaml` file. Locate and remove the following flag from the vLLM startup command:

```bash
--use-padding-aware-scheduling
```

Redeploy the vLLM Helm chart:

```bash
helm upgrade --install vllm-llama-8b ./core/helm-charts/vllm \
  --values ./core/helm-charts/vllm/gaudi3-values.yaml
```

Confirm the pod starts successfully:

```bash
kubectl get pods
kubectl logs -f <vllm-pod-name>
```
---
### 9. Inference Stack Deploy Keycloak System Error

**Error:** TASK \[Deploy Keycloak System\] FAILED! ... "Failure when executing Helm command ... response status code 429: toomanyrequests: You have reached your unauthenticated pull rate limit."

**Cause:** This error was seen when attempting a redeployment (running inference_stack_deploy.sh, menu "1) Provision Enterprise Inference Cluster") while the Keycloak service is already installed and inference_config.cfg has `deploy_keycloak_apisix=on`.

**Fix:** Update inference_config.cfg to change `deploy_keycloak_apisix=on` to `deploy_keycloak_apisix=off` and rerun inference_stack_deploy.sh.
---
### 10. Kubernetes Pods Failing with "Disk Pressure"

If pods hang in the "Pending" state or enter CrashLoopBackOff with "disk pressure" messages in the logs (`kubectl logs <pod>` or `kubectl describe pod <pod>`), you may be out of space on a required filesystem. The Enterprise Inference standard installation uses /opt/local-path-provisioner for local model storage; ensure this location has sufficient space allocated. It is recommended that you undeploy any failing models, allocate more space to the local-path provisioner, then redeploy your models.
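A quick way to see how much space that location has (with a fallback to the root filesystem if the provisioner directory does not exist yet):

```bash
# Report free space for the local-path provisioner's storage directory.
usage=$(df -h /opt/local-path-provisioner 2>/dev/null || df -h /)
echo "$usage"
```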
---
### 11. Hugging Face Authentication Failure

**Error:** Deployment fails or hangs when running inference-stack-deploy.sh or while deploying models:

```bash
su "${USERNAME}" -c "cd /home/${USERNAME}/Enterprise-Inference/core && echo -e '1\n${MODELS}\nyes' | bash ./inference-stack-deploy.sh --models '${MODELS}' --cpu-or-gpu '${GPU_TYPE}' --hugging-face-token ${HUGGINGFACE_TOKEN}"
```

**Cause:** The Hugging Face token passed via `--hugging-face-token` does not match the token stored in inference-config.cfg, or the token has expired or been revoked.

**Fix:**

1. Check that the Hugging Face token has the required permission for the model being deployed.
2. Check whether the token has expired. If so, generate a new Hugging Face token, update your inference-config.cfg, and rerun the deployment.
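To test a token independently of the deployment scripts, you can call the Hugging Face Hub `whoami-v2` API (requires network access; a valid token returns HTTP 200 with account details, an invalid or expired one returns 401):

```bash
# Probe the Hugging Face Hub with the token; reports only the HTTP status.
code=$(curl -s -o /dev/null -w '%{http_code}' \
  -H "Authorization: Bearer ${HUGGINGFACE_TOKEN:-}" \
  https://huggingface.co/api/whoami-v2)
echo "whoami-v2 HTTP status: $code"
```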
---
### 12. Docker Image Pull Failure

**Error:** During deployment, the image download task fails and retries multiple times:

```bash
TASK [download : Download_container | Download image if required]
FAILED - RETRYING: [master1]: Download_container | Download image if required
```

**Cause:** Docker Hub enforces pull rate limits for unauthenticated users. When multiple images are pulled during Enterprise Inference deployment, the limit may be exceeded, causing HTTP 429 Too Many Requests.

This commonly occurs when:

- Re-running deployments multiple times
- Deploying on fresh nodes without Docker authentication
- Pulling multiple images in quick succession

**Fix:**

Verify the issue with a manual pull test:

```bash
sudo ctr -n k8s.io images pull docker.io/library/registry:2.8.1
```

If this fails with 429 Too Many Requests, Docker Hub rate limiting is confirmed.

**Option A — Authenticate to Docker Hub**

Log in to Docker Hub so containerd can pull images with higher limits:

```bash
sudo docker login
```

Enter your Docker Hub username and password (or access token).

After login, retry the image pull:

```bash
sudo ctr -n k8s.io images pull docker.io/kubernetesui/metrics-scraper:v1.0.8
```

**Option B — Wait for Rate Limit Reset**

Docker Hub rate limits typically reset after a few hours. Wait 2–4 hours, then retry the deployment or image pull.

---
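You can also query your remaining anonymous pull allowance directly, using Docker's documented check against the `ratelimitpreview/test` image (a sketch; requires network access, and the `RateLimit-*` headers are only returned by Docker Hub):

```bash
# Fetch an anonymous pull token, then read the RateLimit-* response headers.
TOKEN=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:ratelimitpreview/test:pull" \
  | sed -n 's/.*"token":"\([^"]*\)".*/\1/p')
headers=$(curl -s --head -H "Authorization: Bearer $TOKEN" \
  "https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest")
printf '%s\n' "$headers" | grep -i '^ratelimit' || echo "no RateLimit headers returned"
```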
### 13. Triton Package Compatibility Issue

**Error:**
During model deployment, the inference service may fail to start and worker processes may exit unexpectedly with an error similar to:

> RuntimeError: Worker failed with error *module `triton` has no attribute `next_power_of_2`*

**Cause:**
This issue is caused by a compatibility mismatch between the Triton package and the vLLM execution path used during model deployment. It commonly occurs when deploying models using vLLM with default parameters, when Triton is present but does not fully support the required execution path, or when deployments target CPU or accelerator-based platforms (including Gaudi) without platform-specific tuning. As a result, vLLM workers fail during initialization and the inference engine never reaches a ready state.

**Fix:**
Apply the Intel-recommended environment variables and command-line parameters during model deployment to ensure vLLM uses a compatible execution path.

**Environment Variables (YAML):**
```yaml
VLLM_CPU_KVCACHE_SPACE: "40"
VLLM_RPC_TIMEOUT: "100000"
VLLM_ALLOW_LONG_MAX_MODEL_LEN: "1"
VLLM_ENGINE_ITERATION_TIMEOUT_S: "120"
VLLM_CPU_NUM_OF_RESERVED_CPU: "0"
VLLM_CPU_SGL_KERNEL: "1"
HF_HUB_DISABLE_XET: "1"
```

**Extra Command Arguments (YAML list):**
```yaml
- "--block-size"
- "128"
- "--dtype"
- "bfloat16"
- "--distributed_executor_backend"
- "mp"
- "--enable_chunked_prefill"
- "--enforce-eager"
- "--max-model-len"
- "33024"
- "--max-num-batched-tokens"
- "2048"
- "--max-num-seqs"
- "256"
```

**Notes:**
Tensor parallelism and pipeline parallelism are determined dynamically based on the deployment configuration:

```yaml
tensor_parallel_size: "{{ .Values.tensor_parallel_size }}"
pipeline_parallel_size: "{{ .Values.pipeline_parallel_size }}"
```

**Result:**
After applying the recommended parameters, model deployment completes successfully and the inference service starts without worker initialization failures.