<!-- docs/guide/Installation/prerequisites.md (sidebar_label: Prerequisites) -->
# Prerequisites for running the llm-d QuickStart
## Client Configuration
You can use the installer script that installs all the required dependencies.

- [HuggingFace HF_TOKEN](https://huggingface.co/docs/hub/en/security-tokens) with download access for the model you want to use. By default the sample application will use [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct).

> ⚠️ Your Hugging Face account must have access to the model you want to use. You may need to visit Hugging Face
> [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) and accept the usage
> terms if you have not already done so.
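As a quick local sanity check before running the installer, the snippet below verifies that an `HF_TOKEN` is exported and carries the conventional `hf_` prefix. This is a heuristic sketch, not something the llm-d installer itself requires, and the token value shown is a placeholder.

```bash
# Heuristic check that HF_TOKEN is set and looks like a Hugging Face token.
# The default below is a placeholder; export your real token instead.
HF_TOKEN="${HF_TOKEN:-hf_replace_with_your_real_token}"

case "$HF_TOKEN" in
  hf_*) echo "HF_TOKEN is set and looks like a Hugging Face token" ;;
  *)    echo "HF_TOKEN is missing or does not look like an hf_ token" >&2 ;;
esac
```

A well-formed token passes the prefix check; anything else prints a warning to stderr so the failure is visible in installer logs.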
### Target Platforms
Since the llm-d-deployer is based on Helm charts, llm-d can be deployed on a variety of Kubernetes platforms. As more platforms are supported, the installer will be updated to support them.

Documentation for example cluster setups is provided in the [infra](https://github.com/llm-d/llm-d-deployer/tree/main/quickstart/infra) directory of the llm-d-deployer repository.

- [OpenShift on AWS](https://github.com/llm-d/llm-d-deployer/tree/main/quickstart/infra/openshift-aws.md)

#### Minikube

This can be run on a minimum EC2 node type of [g6e.12xlarge](https://aws.amazon.com/ec2/instance-types/g6e/) (4x L40S 48 GB, though only 2 GPUs are used by default) to run inference on the model meta-llama/Llama-3.2-3B-Instruct that will be spun up.

> ⚠️ If your cluster has no available GPUs, the **prefill** and **decode** pods will remain in the **Pending** state.

Verify that you have properly installed the container toolkit with the runtime of your choice.

```bash
# Podman
podman run --rm --security-opt=label=disable --device=nvidia.com/gpu=all ubuntu nvidia-smi
# Docker
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
```

#### OpenShift

- OpenShift - This quickstart was tested on OpenShift 4.17. Older versions may work but have not been tested.
- NVIDIA GPU Operator and NFD Operator - The installation instructions can be found [here](https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/steps-overview.html).
- Do not install Service Mesh or Istio, as the Istio CRDs will conflict with the gateway.
- Cluster administrator privileges are required to install the llm-d cluster-scoped resources.
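The Pending-pods warning above can be checked up front. The helper below is an illustrative sketch: on a live cluster you would feed it the node's allocatable `nvidia.com/gpu` count (for example via `kubectl get node <name> -o jsonpath='{.status.allocatable.nvidia\.com/gpu}'`, assuming the standard NVIDIA extended-resource name); here it is exercised with sample values.

```bash
# Illustrative helper: interpret a node's allocatable nvidia.com/gpu count.
# On a real cluster, obtain the value with:
#   kubectl get node <name> -o jsonpath='{.status.allocatable.nvidia\.com/gpu}'
check_gpus() {
  count="${1:-0}"   # empty/missing value is treated as zero GPUs
  if [ "$count" -ge 1 ] 2>/dev/null; then
    echo "node reports $count allocatable GPU(s)"
  else
    echo "node reports no GPUs; prefill/decode pods would stay Pending"
  fi
}

check_gpus 4   # e.g. a g6e.12xlarge exposing 4x L40S
check_gpus ""  # node without the NVIDIA device plugin
```

If every node falls into the second branch, fix the GPU Operator / device-plugin installation before deploying llm-d.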
…gatewayClassName, and sits in front of your inference pods to handle path-based
and metrics. This example validates that the gateway itself is routing your completion requests correctly.
You can execute the [`test-request.sh`](https://github.com/llm-d/llm-d-deployer/blob/main/quickstart/test-request.sh) script in the quickstart folder to test on the cluster.

> If you receive an error indicating PodSecurity "restricted" violations when running the smoke-test script, you
> need to remove the restrictive PodSecurity labels from the namespace. Once these labels are removed, re-run the
> script and it should proceed without PodSecurity errors.
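The note above does not spell out the label-removal command, so here is a hedged sketch. The namespace name is a placeholder, the label keys are the standard Pod Security Admission labels, and `kubectl` removes a label when its key is suffixed with `-`. The commands are printed rather than executed so you can review them first.

```bash
# Sketch: print the commands that strip Pod Security Admission labels from a
# namespace. NS is a placeholder; pipe the output to `sh` to actually run them.
NS="llm-d"
for key in enforce enforce-version warn warn-version audit audit-version; do
  echo "kubectl label namespace $NS pod-security.kubernetes.io/${key}-"
done
```

After applying these against the namespace where the smoke test runs, re-run `test-request.sh`.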