Commit d9e7ad0

Updated Quickstart files (#24)
Signed-off-by: Jess Chitas
1 parent 2653aa4 commit d9e7ad0

File tree

8 files changed: +35 additions, -82 deletions

docs/architecture/Component Architecture/01_deployer.md

Lines changed: 0 additions & 9 deletions
This file was deleted.

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+---
+sidebar_position: 1
+---
+
+# Deployer
+
+A key component in llm-d's toolbox is the **llm-d deployer**, the Helm chart for deploying llm-d on Kubernetes.
+
+[llm-d-deployer repository](https://github.com/llm-d/llm-d-deployer)
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.

docs/guide/Installation/prerequisites.md

Lines changed: 22 additions & 69 deletions
@@ -5,60 +5,6 @@ sidebar_label: Prerequisites
 
 # Prerequisites for running the llm-d QuickStart
 
-### Target Platforms
-
-Since the llm-d-deployer is based on helm charts, llm-d can be deployed on a variety of Kubernetes platforms. As more platforms are supported, the installer will be updated to support them.
-
-Documentation for example cluster setups are provided in the [infra](https://github.com/llm-d/llm-d-deployer/tree/main/quickstart/infra) directory of the llm-d-deployer repository.
-
-- [OpenShift on AWS](https://github.com/llm-d/llm-d-deployer/tree/main/quickstart/infra/openshift-aws.md)
-
-#### Minikube
-
-This can be run on a minimum ec2 node type [g6e.12xlarge](https://aws.amazon.com/ec2/instance-types/g6e/) (4xL40S 48GB but only 2 are used by default) to infer the model meta-llama/Llama-3.2-3B-Instruct that will get spun up.
-
-> ⚠️ If your cluster has no available GPUs, the **prefill** and **decode** pods will remain in **Pending** state.
-
-Verify you have properly installed the container toolkit with the runtime of your choice.
-
-```bash
-# Podman
-podman run --rm --security-opt=label=disable --device=nvidia.com/gpu=all ubuntu nvidia-smi
-# Docker
-sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
-```
-
-#### OpenShift
-
-- OpenShift - This quickstart was tested on OpenShift 4.17. Older versions may work but have not been tested.
-- NVIDIA GPU Operator and NFD Operator - The installation instructions can be found [here](https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/steps-overview.html).
-- NO Service Mesh or Istio installation as Istio CRDs will conflict with the gateway
-- Cluster administrator privileges are required to install the llm-d cluster scoped resources
-
-
-#### Kubernetes
-
-This can be run on a minimum ec2 node type [g6e.12xlarge](https://aws.amazon.com/ec2/instance-types/g6e/) (4xL40S 48GB but only 2 are used by default) to infer the model meta-llama/Llama-3.2-3B-Instruct that will get spun up.
-
-> ⚠️ If your cluster has no available GPUs, the **prefill** and **decode** pods will remain in **Pending** state.
-
-Verify you have properly installed the container toolkit with the runtime of your choice.
-
-```bash
-# Podman
-podman run --rm --security-opt=label=disable --device=nvidia.com/gpu=all ubuntu nvidia-smi
-# Docker
-sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
-```
-
-#### OpenShift
-
-- OpenShift - This quickstart was tested on OpenShift 4.18. Older versions may work but have not been tested.
-- NVIDIA GPU Operator and NFD Operator - The installation instructions can be found [here](https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/steps-overview.html).
-- NO Service Mesh or Istio installation as Istio CRDs will conflict with the gateway
-
-
-## Software prerequisites -- Client Configuration
 
 ## Client Configuration
 
@@ -97,31 +43,38 @@ You can use the installer script that installs all the required dependencies. C
 ### Required credentials and configuration
 
 - [llm-d-deployer GitHub repo – clone here](https://github.com/llm-d/llm-d-deployer.git)
-- [ghcr.io Registry – credentials](https://github.com/settings/tokens) You must have a GitHub account and a "classic" personal access token with `read:packages` access to the llm-d-deployer repository.
-- [Red Hat Registry – terms & access](https://access.redhat.com/registry/)
 - [HuggingFace HF_TOKEN](https://huggingface.co/docs/hub/en/security-tokens) with download access for the model you want to use. By default the sample application will use [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct).
 
 > ⚠️ Your Hugging Face account must have access to the model you want to use. You may need to visit Hugging Face [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) and
 > accept the usage terms if you have not already done so.
 
-Registry Authentication: The installer looks for an auth file in:
+### Target Platforms
 
-```bash
-~/.config/containers/auth.json
-# or
-~/.config/containers/config.json
-```
+Since the llm-d-deployer is based on helm charts, llm-d can be deployed on a variety of Kubernetes platforms. As more platforms are supported, the installer will be updated to support them.
+
+Documentation for example cluster setups are provided in the [infra](https://github.com/llm-d/llm-d-deployer/tree/main/quickstart/infra) directory of the llm-d-deployer repository.
 
-If not found, you can create one with the following commands:
+- [OpenShift on AWS](https://github.com/llm-d/llm-d-deployer/tree/main/quickstart/infra/openshift-aws.md)
 
-Create with Docker:
+
+#### Minikube
+
+This can be run on a minimum ec2 node type [g6e.12xlarge](https://aws.amazon.com/ec2/instance-types/g6e/) (4xL40S 48GB but only 2 are used by default) to infer the model meta-llama/Llama-3.2-3B-Instruct that will get spun up.
+
+> ⚠️ If your cluster has no available GPUs, the **prefill** and **decode** pods will remain in **Pending** state.
+
+Verify you have properly installed the container toolkit with the runtime of your choice.
 
 ```bash
-docker --config ~/.config/containers/ login ghcr.io
+# Podman
+podman run --rm --security-opt=label=disable --device=nvidia.com/gpu=all ubuntu nvidia-smi
+# Docker
+sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
 ```
 
-Create with Podman:
+#### OpenShift
 
-```bash
-podman login ghcr.io --authfile ~/.config/containers/auth.json
-```
+- OpenShift - This quickstart was tested on OpenShift 4.17. Older versions may work but have not been tested.
+- NVIDIA GPU Operator and NFD Operator - The installation instructions can be found [here](https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/steps-overview.html).
+- NO Service Mesh or Istio installation as Istio CRDs will conflict with the gateway
+- Cluster administrator privileges are required to install the llm-d cluster scoped resources
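
The prerequisites above give separate Podman and Docker invocations for the GPU smoke test. They can be wrapped in a small helper that returns the right command for a given runtime. This is a minimal sketch: the `gpu_check_cmd` helper is illustrative and not part of the quickstart; only the two quoted commands come from the docs above.

```shell
#!/bin/sh
# Map a container runtime name to its GPU smoke-test command.
# The two commands are taken verbatim from the prerequisites above;
# the helper function itself is illustrative.
gpu_check_cmd() {
  case "$1" in
    podman) echo "podman run --rm --security-opt=label=disable --device=nvidia.com/gpu=all ubuntu nvidia-smi" ;;
    docker) echo "sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi" ;;
    *)      echo "unsupported runtime: $1" >&2; return 1 ;;
  esac
}

# Print the check for the runtime you use, then run it by hand:
gpu_check_cmd podman
```

On a node with the NVIDIA container toolkit installed, executing the printed command should show the GPUs in `nvidia-smi` output; pods stay `Pending` if none are available.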

docs/guide/Installation/quickstart.md

Lines changed: 4 additions & 4 deletions
@@ -59,7 +59,6 @@ The installer needs to be run from the `llm-d-deployer/quickstart` directory as
 
 | Flag | Description | Example |
 |--------------------------------------|---------------------------------------------------------------|------------------------------------------------------------------|
-| `-a`, `--auth-file PATH` | Path to containers auth.json | `./llmd-installer.sh --auth-file ~/.config/containers/auth.json` |
 | `-z`, `--storage-size SIZE` | Size of storage volume | `./llmd-installer.sh --storage-size 15Gi` |
 | `-c`, `--storage-class CLASS` | Storage class to use (default: efs-sc) | `./llmd-installer.sh --storage-class ocs-storagecluster-cephfs` |
 | `-n`, `--namespace NAME` | K8s namespace (default: llm-d) | `./llmd-installer.sh --namespace foo` |
@@ -102,6 +101,7 @@ gatewayClassName, and sits in front of your inference pods to handle path-based
 and metrics. This example validates that the gateway itself is routing your completion requests correctly.
 You can execute the [`test-request.sh`](https://github.com/llm-d/llm-d-deployer/blob/main/quickstart/test-request.sh) script in the quickstart folder to test on the cluster.
 
+
 > If you receive an error indicating PodSecurity "restricted" violations when running the smoke-test script, you
 > need to remove the restrictive PodSecurity labels from the namespace. Once these labels are removed, re-run the
 > script and it should proceed without PodSecurity errors.
@@ -209,8 +209,8 @@ kubectl port-forward -n llm-d-monitoring --address 0.0.0.0 svc/prometheus-grafan
 
 Access the UIs at:
 
-- Prometheus: [http://YOUR_IP:9090](#)
-- Grafana: [http://YOUR_IP:3000](#) (default credentials: admin/admin)
+- Prometheus: \<http://YOUR_IP:9090\>
+- Grafana: \<http://YOUR_IP:3000\> (default credentials: admin/admin)
 
 ##### Option 2: Ingress (Optional)
 
@@ -320,4 +320,4 @@ make a change, simply uninstall and then run the installer again with any change
 
 ```bash
 ./llmd-installer.sh --uninstall
-```
+```
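
The quickstart hunk above points at `test-request.sh` as the supported way to validate gateway routing. For illustration only, the snippet below sketches what such a completion request could look like; the gateway address, the `/v1/completions` path, and the payload shape are assumptions, not taken from the script, so it only prints the `curl` invocation rather than sending it.

```shell
#!/bin/sh
# Print (do not send) a completion request like the one the smoke test issues.
# GATEWAY, the API path, and the payload are illustrative assumptions.
GATEWAY="${GATEWAY:-http://localhost:8000}"
PAYLOAD='{"model":"meta-llama/Llama-3.2-3B-Instruct","prompt":"Hello","max_tokens":16}'

echo "curl -s $GATEWAY/v1/completions \\"
echo "  -H 'Content-Type: application/json' \\"
echo "  -d '$PAYLOAD'"
```

Running the printed command against a live deployment (after port-forwarding or exposing the gateway) should return a completion if routing is healthy; prefer the repository's `test-request.sh` for the authoritative check.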
