diff --git a/demo/specs/quickstart/README.md b/demo/specs/quickstart/README.md
index b7feeca56..4a16ebaef 100644
--- a/demo/specs/quickstart/README.md
+++ b/demo/specs/quickstart/README.md
@@ -1,3 +1,5 @@
+You can also run basic examples on a Linux desktop by following the instructions in the [desktop folder](desktop/README.md).
+
 #### Show current state of the cluster
 ```console
 kubectl get pod -A
diff --git a/demo/specs/quickstart/desktop/README.md b/demo/specs/quickstart/desktop/README.md
new file mode 100644
index 000000000..50a63b57a
--- /dev/null
+++ b/demo/specs/quickstart/desktop/README.md
@@ -0,0 +1,294 @@
+# Basic examples for a Linux desktop or workstation
+* [Prerequisites](#prerequisites)
+* [Run examples](#run-examples)
+  * [1. SPSC-GPU: a single pod with a single container accesses a GPU via ResourceClaimTemplate](#example-1-spsc-gpu-a-single-pod-with-a-single-container-accesses-a-gpu-via-resourceclaimtemplate)
+  * [2. SPMC-Shared-GPU: a single pod's multiple containers share a GPU via ResourceClaimTemplate](#example-2-spmc-shared-gpu-a-single-pods-multiple-containers-share-a-gpu-via-resourceclaimtemplate)
+  * [3. MPSC-Shared-GPU: multiple pods, each with a single container, share a GPU via ResourceClaim](#example-3-mpsc-shared-gpu-multiple-pods-each-with-a-single-container-share-a-gpu-via-resourceclaim)
+  * [4. MPSC-Unshared-GPU: multiple pods, each with a single container, request dedicated GPU access](#example-4-mpsc-unshared-gpu-multiple-pods-each-with-a-single-container-request-dedicated-gpu-access)
+  * [5. SPMC-MPS-GPU: a single pod's multiple containers share a GPU via MPS](#example-5-spmc-mps-gpu-a-single-pods-multiple-containers-share-a-gpu-via-mps)
+  * [6. MPSC-MPS-GPU: multiple pods, each with a single container, share a GPU via MPS](#example-6-mpsc-mps-gpu-multiple-pods-each-with-a-single-container-share-a-gpu-via-mps)
+  * [7. SPMC-TimeSlicing-GPU: a single pod's multiple containers share a GPU via TimeSlicing](#example-7-spmc-timeslicing-gpu-a-single-pods-multiple-containers-share-a-gpu-via-timeslicing)
+
+## Prerequisites
+
+You will need a Linux machine with an NVIDIA GPU (for example, a GeForce card). Install the DRA driver and create a kind cluster by following the instructions in the [DRA driver setup](https://github.com/yuanchen8911/k8s-dra-driver?tab=readme-ov-file#demo).
+
+#### Show the current GPU configuration of the machine
+```console
+nvidia-smi -L
+```
+
+```
+GPU 0: NVIDIA GeForce RTX 4090 (UUID: GPU-84f293a6-d610-e3dc-c4d8-c5d94409764b)
+```
+
+#### Show that the cluster is up
+```console
+kubectl cluster-info
+kubectl get nodes
+```
+
+```
+Kubernetes control plane is running at https://127.0.0.1:34883
+CoreDNS is running at https://127.0.0.1:34883/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
+
+To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
+
+NAME                                   STATUS   ROLES           AGE    VERSION
+k8s-dra-driver-cluster-control-plane   Ready    control-plane   4d1h   v1.29.1
+k8s-dra-driver-cluster-worker          Ready    <none>          4d1h   v1.29.1
+```
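+
+#### Show the DRA resource class
+As a quick sanity check (a sketch, assuming the kind cluster was created with the `DynamicResourceAllocation` feature gate enabled and the driver installed its `gpu.nvidia.com` resource class), the resource class should be listed:
+```console
+kubectl get resourceclass
+```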
+
+#### Show that the DRA driver is running
+```console
+kubectl get pod -n nvidia-dra-driver
+```
+
+```
+NAME                                                READY   STATUS    RESTARTS   AGE
+nvidia-k8s-dra-driver-controller-6d5869d478-rr488   1/1     Running   0          4d1h
+nvidia-k8s-dra-driver-kubelet-plugin-qqq5b          1/1     Running   0          4d1h
+```
+
+## Run examples
+
+#### Example 1 (SPSC-GPU): a single pod with a single container accesses a GPU via ResourceClaimTemplate
+
+```console
+kubectl apply -f single-pod-single-container-gpu.yaml
+sleep 2
+kubectl get pods -n spsc-gpu-test
+```
+
+The pod will be running.
+```
+# NAME      READY   STATUS    RESTARTS   AGE
+# gpu-pod   1/1     Running   0          6s
+```
+
+Running `nvidia-smi` will show something like the following:
+```console
+nvidia-smi
+```
+
+```
+# +---------------------------------------------------------------------------------------+
+# | Processes:                                                                             |
+# |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
+# |        ID   ID                                                             Usage      |
+# |=======================================================================================|
+# |    0   N/A  N/A   1474787      C   /cuda-samples/sample                        746MiB |
+# +---------------------------------------------------------------------------------------+
+```
+
+Delete the pod:
+```console
+kubectl delete -f single-pod-single-container-gpu.yaml
+```
+
+#### Example 2 (SPMC-Shared-GPU): a single pod's multiple containers share a GPU via ResourceClaimTemplate
+
+```console
+kubectl apply -f single-pod-multiple-containers-shared-gpu.yaml
+sleep 2
+kubectl get pods -n spmc-shared-gpu-test
+```
+
+The pod will be running.
+```
+# NAME      READY   STATUS    RESTARTS      AGE
+# gpu-pod   2/2     Running   2 (55s ago)   2m13s
+```
+
+Running `nvidia-smi` will show something like the following:
+```console
+nvidia-smi
+```
+
+```
+# +---------------------------------------------------------------------------------------+
+# | Processes:                                                                             |
+# |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
+# |        ID   ID                                                             Usage      |
+# |=======================================================================================|
+# |    0   N/A  N/A   1514114      C   /cuda-samples/sample                        746MiB |
+# |    0   N/A  N/A   1514167      C   /cuda-samples/sample                        746MiB |
+# +---------------------------------------------------------------------------------------+
+```
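+
+Both containers run the same nbody benchmark against the shared GPU. While the pod is up, you can peek at either container's output (a sketch; the container names `ctr0` and `ctr1` come from the example manifest):
+```console
+kubectl logs -n spmc-shared-gpu-test gpu-pod -c ctr0
+```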
+
+Delete the pod:
+```console
+kubectl delete -f single-pod-multiple-containers-shared-gpu.yaml
+```
+
+#### Example 3 (MPSC-Shared-GPU): multiple pods, each with a single container, share a GPU via ResourceClaim
+
+```console
+kubectl apply -f multiple-pods-single-container-shared-gpu.yaml
+sleep 2
+kubectl get pods -n mpsc-shared-gpu-test
+```
+
+Two pods will be running.
+```
+# $ kubectl get pods -n mpsc-shared-gpu-test
+# NAME        READY   STATUS    RESTARTS   AGE
+# gpu-pod-1   1/1     Running   0          11s
+# gpu-pod-2   1/1     Running   0          11s
+```
+
+Running `nvidia-smi` will show something like the following:
+```console
+nvidia-smi
+```
+
+```
+# +---------------------------------------------------------------------------------------+
+# | Processes:                                                                             |
+# |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
+# |        ID   ID                                                             Usage      |
+# |=======================================================================================|
+# |    0   N/A  N/A   1551456      C   /cuda-samples/sample                        746MiB |
+# |    0   N/A  N/A   1551593      C   /cuda-samples/sample                        746MiB |
+# +---------------------------------------------------------------------------------------+
+```
+
+Delete the pods:
+```console
+kubectl delete -f multiple-pods-single-container-shared-gpu.yaml
+```
+
+#### Example 4 (MPSC-Unshared-GPU): multiple pods, each with a single container, request dedicated GPU access
+
+```console
+kubectl apply -f multiple-pods-single-container-unshared-gpu.yaml
+sleep 2
+kubectl get pods -n mpsc-unshared-gpu-test
+```
+
+One pod will be running and the other will be pending.
+```
+# $ kubectl get pods -n mpsc-unshared-gpu-test
+# NAME        READY   STATUS    RESTARTS   AGE
+# gpu-pod-1   1/1     Running   0          11s
+# gpu-pod-2   0/1     Pending   0          11s
+```
+
+Running `nvidia-smi` will show something like the following:
+```console
+nvidia-smi
+```
+
+```
+# +---------------------------------------------------------------------------------------+
+# | Processes:                                                                             |
+# |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
+# |        ID   ID                                                             Usage      |
+# |=======================================================================================|
+# |    0   N/A  N/A   1544488      C   /cuda-samples/sample                        746MiB |
+# +---------------------------------------------------------------------------------------+
+```
+
+Delete the pods:
+```console
+kubectl delete -f multiple-pods-single-container-unshared-gpu.yaml
+```
+
+#### Example 5 (SPMC-MPS-GPU): a single pod's multiple containers share a GPU via MPS
+
+```console
+kubectl apply -f single-pod-multiple-containers-mps-gpu.yaml
+sleep 2
+kubectl get pods -n spmc-mps-gpu-test
+```
+
+The pod will be running.
+```
+# $ kubectl get pods -n spmc-mps-gpu-test
+# NAME      READY   STATUS    RESTARTS   AGE
+# gpu-pod   2/2     Running   0          11s
+```
+
+Running `nvidia-smi` will show something like the following:
+```console
+nvidia-smi
+```
+
+```
+# +---------------------------------------------------------------------------------------+
+# | Processes:                                                                             |
+# |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
+# |        ID   ID                                                             Usage      |
+# |=======================================================================================|
+# |    0   N/A  N/A   1559554    M+C   /cuda-samples/sample                        790MiB |
+# |    0   N/A  N/A   1559585      C   nvidia-cuda-mps-server                       28MiB |
+# |    0   N/A  N/A   1559610    M+C   /cuda-samples/sample                        790MiB |
+# +---------------------------------------------------------------------------------------+
+```
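+
+The `M+C` process type and the `nvidia-cuda-mps-server` entry show that both containers are funneled through a single MPS server. While the pod is still running, you can inspect the claim and its MPS parameters (a sketch, assuming the driver registers the GpuClaimParameters CRD with the plural name `gpuclaimparameters`):
+```console
+kubectl describe resourceclaim -n spmc-mps-gpu-test gpu-mps-sharing
+kubectl get gpuclaimparameters -n spmc-mps-gpu-test gpu-mps-sharing -o yaml
+```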
+
+Delete the pod:
+```console
+kubectl delete -f single-pod-multiple-containers-mps-gpu.yaml
+```
+
+#### Example 6 (MPSC-MPS-GPU): multiple pods, each with a single container, share a GPU via MPS
+
+```console
+kubectl apply -f multiple-pods-single-container-mps-gpu.yaml
+sleep 2
+kubectl get pods -n mpsc-mps-gpu-test
+```
+
+Two pods will be running.
+```
+# $ kubectl get pods -n mpsc-mps-gpu-test
+# NAME        READY   STATUS    RESTARTS   AGE
+# gpu-pod-1   1/1     Running   0          11s
+# gpu-pod-2   1/1     Running   0          11s
+```
+
+Running `nvidia-smi` will show something like the following:
+```console
+nvidia-smi
+```
+
+```
+# +---------------------------------------------------------------------------------------+
+# | Processes:                                                                             |
+# |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
+# |        ID   ID                                                             Usage      |
+# |=======================================================================================|
+# |    0   N/A  N/A   1568768    M+C   /cuda-samples/sample                        562MiB |
+# |    0   N/A  N/A   1568771    M+C   /cuda-samples/sample                        562MiB |
+# |    0   N/A  N/A   1568831      C   nvidia-cuda-mps-server                       28MiB |
+# +---------------------------------------------------------------------------------------+
+```
+
+Delete the pods:
+```console
+kubectl delete -f multiple-pods-single-container-mps-gpu.yaml
+```
+
+#### Example 7 (SPMC-TimeSlicing-GPU): a single pod's multiple containers share a GPU via TimeSlicing
+
+```console
+kubectl apply -f single-pod-multiple-containers-timeslicing-gpu.yaml
+sleep 2
+kubectl get pods -n spmc-timeslicing-gpu-test
+```
+
+The pod will be running.
+```
+# NAME      READY   STATUS    RESTARTS   AGE
+# gpu-pod   2/2     Running   0          10s
+```
+
+Running `nvidia-smi` will show something like the following:
+```console
+nvidia-smi
+```
+
+```
+# +---------------------------------------------------------------------------------------+
+# | Processes:                                                                             |
+# |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
+# |        ID   ID                                                             Usage      |
+# |=======================================================================================|
+# |    0   N/A  N/A   1575573      C   /cuda-samples/sample                        746MiB |
+# |    0   N/A  N/A   1575605      C   /cuda-samples/sample                        746MiB |
+# +---------------------------------------------------------------------------------------+
+```
+
+Delete the pod:
+```console
+kubectl delete -f single-pod-multiple-containers-timeslicing-gpu.yaml
+```
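+
+#### Clean up
+Each example's delete step already removes its namespace, but if anything is left over after experimenting, deleting the example namespaces removes the remaining pods and resource claims in one sweep (namespace names as created by the manifests above):
+```console
+kubectl delete namespace --ignore-not-found spsc-gpu-test spmc-shared-gpu-test \
+  mpsc-shared-gpu-test mpsc-unshared-gpu-test spmc-mps-gpu-test \
+  mpsc-mps-gpu-test spmc-timeslicing-gpu-test
+```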
diff --git a/demo/specs/quickstart/desktop/multiple-pods-single-container-mps-gpu.yaml b/demo/specs/quickstart/desktop/multiple-pods-single-container-mps-gpu.yaml
new file mode 100644
index 000000000..8609394b2
--- /dev/null
+++ b/demo/specs/quickstart/desktop/multiple-pods-single-container-mps-gpu.yaml
@@ -0,0 +1,94 @@
+# MPSC-MPS-GPU: multiple pods, each with a single container, share a GPU via MPS.
+
+# Two pods will be running.
+# $ kubectl get pods -n mpsc-mps-gpu-test
+# NAME        READY   STATUS    RESTARTS   AGE
+# gpu-pod-1   1/1     Running   0          14s
+# gpu-pod-2   1/1     Running   0          14s
+
+# Running `nvidia-smi` will show something like the following:
+# +---------------------------------------------------------------------------------------+
+# | Processes:                                                                             |
+# |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
+# |        ID   ID                                                             Usage      |
+# |=======================================================================================|
+# |    0   N/A  N/A   1568768    M+C   /cuda-samples/sample                        562MiB |
+# |    0   N/A  N/A   1568771    M+C   /cuda-samples/sample                        562MiB |
+# |    0   N/A  N/A   1568831      C   nvidia-cuda-mps-server                       28MiB |
+# +---------------------------------------------------------------------------------------+
+
+---
+apiVersion: v1
+kind: Namespace
+metadata:
+  name: mpsc-mps-gpu-test
+
+---
+apiVersion: resource.k8s.io/v1alpha2
+kind: ResourceClaim
+metadata:
+  namespace: mpsc-mps-gpu-test
+  name: gpu-mps-sharing
+spec:
+  resourceClassName: gpu.nvidia.com
+  parametersRef:
+    apiGroup: gpu.resource.nvidia.com
+    kind: GpuClaimParameters
+    name: gpu-mps-sharing
+
+---
+apiVersion: gpu.resource.nvidia.com/v1alpha1
+kind: GpuClaimParameters
+metadata:
+  namespace: mpsc-mps-gpu-test
+  name: gpu-mps-sharing
+spec:
+  sharing:
+    strategy: MPS
+    mpsConfig:
+      defaultActiveThreadPercentage: 50
+      defaultPinnedDeviceMemoryLimit: 10Gi
+      # defaultPerDevicePinnedMemoryLimit:
+      #   0: 5Gi
+
+---
+apiVersion: v1
+kind: Pod
+metadata:
+  namespace: mpsc-mps-gpu-test
+  name: gpu-pod-1
+  labels:
+    app: pod
+spec:
+  containers:
+  - name: ctr
+    image: nvcr.io/nvidia/k8s/cuda-sample:nbody-cuda11.7.1-ubuntu18.04
+    args: ["--benchmark", "--numbodies=2560000"]
+    resources:
+      claims:
+      - name: gpu
+  resourceClaims:
+  - name: gpu
+    source:
+      resourceClaimName: gpu-mps-sharing
+
+---
+apiVersion: v1
+kind: Pod
+metadata:
+  namespace: mpsc-mps-gpu-test
+  name: gpu-pod-2
+  labels:
+    app: pod
+spec:
+  containers:
+  - name: ctr
+    image: nvcr.io/nvidia/k8s/cuda-sample:nbody-cuda11.7.1-ubuntu18.04
+    args: ["--benchmark", "--numbodies=2560000"]
+    resources:
+      claims:
+      - name: gpu
+  resourceClaims:
+  - name: gpu
+    source:
+      resourceClaimName: gpu-mps-sharing
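+
+# Note: both pods reference the same pre-created ResourceClaim (gpu-mps-sharing),
+# so they land on the same GPU and attach to one MPS server. The mpsConfig values
+# above are illustrative: defaultActiveThreadPercentage caps each MPS client at
+# roughly that fraction of the GPU's threads, and defaultPinnedDeviceMemoryLimit
+# bounds each client's pinned device memory.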
diff --git a/demo/specs/quickstart/desktop/multiple-pods-single-container-shared-gpu.yaml b/demo/specs/quickstart/desktop/multiple-pods-single-container-shared-gpu.yaml
new file mode 100644
index 000000000..aab688c90
--- /dev/null
+++ b/demo/specs/quickstart/desktop/multiple-pods-single-container-shared-gpu.yaml
@@ -0,0 +1,73 @@
+# MPSC-Shared-GPU: multiple pods, each with a single container, share a GPU via ResourceClaim.
+#
+# Two pods will be running.
+# $ kubectl get pods -n mpsc-shared-gpu-test
+# NAME        READY   STATUS    RESTARTS   AGE
+# gpu-pod-1   1/1     Running   0          11s
+# gpu-pod-2   1/1     Running   0          11s
+
+# Running the command `nvidia-smi` will show something like the following:
+# +---------------------------------------------------------------------------------------+
+# | Processes:                                                                             |
+# |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
+# |        ID   ID                                                             Usage      |
+# |=======================================================================================|
+# |    0   N/A  N/A   1551456      C   /cuda-samples/sample                        746MiB |
+# |    0   N/A  N/A   1551593      C   /cuda-samples/sample                        746MiB |
+# +---------------------------------------------------------------------------------------+
+
+---
+apiVersion: v1
+kind: Namespace
+metadata:
+  name: mpsc-shared-gpu-test
+
+---
+apiVersion: resource.k8s.io/v1alpha2
+kind: ResourceClaim
+metadata:
+  namespace: mpsc-shared-gpu-test
+  name: shared-gpu
+spec:
+  resourceClassName: gpu.nvidia.com
+
+---
+apiVersion: v1
+kind: Pod
+metadata:
+  namespace: mpsc-shared-gpu-test
+  name: gpu-pod-1
+  labels:
+    app: pod
+spec:
+  containers:
+  - name: ctr
+    image: nvcr.io/nvidia/k8s/cuda-sample:nbody-cuda11.7.1-ubuntu18.04
+    args: ["--benchmark", "--numbodies=2560000"]
+    resources:
+      claims:
+      - name: shared-gpu
+  resourceClaims:
+  - name: shared-gpu
+    source:
+      resourceClaimName: shared-gpu
+
+---
+apiVersion: v1
+kind: Pod
+metadata:
+  namespace: mpsc-shared-gpu-test
+  name: gpu-pod-2
+  labels:
+    app: pod
+spec:
+  containers:
+  - name: ctr
+    image: nvcr.io/nvidia/k8s/cuda-sample:nbody-cuda11.7.1-ubuntu18.04
+    args: ["--benchmark", "--numbodies=2560000"]
+    resources:
+      claims:
+      - name: shared-gpu
+  resourceClaims:
+  - name: shared-gpu
+    source:
+      resourceClaimName: shared-gpu
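+
+# Usage (as shown in the README):
+#   kubectl apply -f multiple-pods-single-container-shared-gpu.yaml
+#   kubectl get pods -n mpsc-shared-gpu-test
+#   kubectl delete -f multiple-pods-single-container-shared-gpu.yaml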
diff --git a/demo/specs/quickstart/desktop/multiple-pods-single-container-unshared-gpu.yaml b/demo/specs/quickstart/desktop/multiple-pods-single-container-unshared-gpu.yaml
new file mode 100644
index 000000000..c6adab1c1
--- /dev/null
+++ b/demo/specs/quickstart/desktop/multiple-pods-single-container-unshared-gpu.yaml
@@ -0,0 +1,73 @@
+# MPSC-Unshared-GPU: multiple pods, each with a single container, request dedicated access to a GPU.
+#
+# One pod will be running and the other will be pending.
+# $ kubectl get pods -n mpsc-unshared-gpu-test
+# NAME        READY   STATUS    RESTARTS     AGE
+# gpu-pod-1   1/1     Running   1 (21s ago)  58s
+# gpu-pod-2   0/1     Pending   0            25s
+#
+# Running the command `nvidia-smi` will show something like the following:
+# +---------------------------------------------------------------------------------------+
+# | Processes:                                                                             |
+# |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
+# |        ID   ID                                                             Usage      |
+# |=======================================================================================|
+# |    0   N/A  N/A   1544488      C   /cuda-samples/sample                        746MiB |
+# +---------------------------------------------------------------------------------------+
+
+---
+apiVersion: v1
+kind: Namespace
+metadata:
+  name: mpsc-unshared-gpu-test
+
+---
+apiVersion: resource.k8s.io/v1alpha2
+kind: ResourceClaimTemplate
+metadata:
+  namespace: mpsc-unshared-gpu-test
+  name: gpu.nvidia.com
+spec:
+  spec:
+    resourceClassName: gpu.nvidia.com
+
+---
+apiVersion: v1
+kind: Pod
+metadata:
+  namespace: mpsc-unshared-gpu-test
+  name: gpu-pod-1
+  labels:
+    app: pod
+spec:
+  containers:
+  - name: ctr
+    image: nvcr.io/nvidia/k8s/cuda-sample:nbody-cuda11.7.1-ubuntu18.04
+    args: ["--benchmark", "--numbodies=2560000"]
+    resources:
+      claims:
+      - name: gpu
+  resourceClaims:
+  - name: gpu
+    source:
+      resourceClaimTemplateName: gpu.nvidia.com
+
+---
+apiVersion: v1
+kind: Pod
+metadata:
+  namespace: mpsc-unshared-gpu-test
+  name: gpu-pod-2
+  labels:
+    app: pod
+spec:
+  containers:
+  - name: ctr
+    image: nvcr.io/nvidia/k8s/cuda-sample:nbody-cuda11.7.1-ubuntu18.04
+    args: ["--benchmark", "--numbodies=2560000"]
+    resources:
+      claims:
+      - name: gpu
+  resourceClaims:
+  - name: gpu
+    source:
+      resourceClaimTemplateName: gpu.nvidia.com
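+
+# Note: each pod instantiates its own ResourceClaim from the ResourceClaimTemplate
+# above, so on a machine with a single GPU the second pod stays Pending until the
+# first pod's claim is released.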
diff --git a/demo/specs/quickstart/desktop/single-pod-multiple-containers-mps-gpu.yaml b/demo/specs/quickstart/desktop/single-pod-multiple-containers-mps-gpu.yaml
new file mode 100644
index 000000000..f32235276
--- /dev/null
+++ b/demo/specs/quickstart/desktop/single-pod-multiple-containers-mps-gpu.yaml
@@ -0,0 +1,78 @@
+# SPMC-MPS-GPU: a single pod's multiple containers share a GPU via MPS.
+
+# The pod will be running.
+# $ kubectl get pods -n spmc-mps-gpu-test
+# NAME      READY   STATUS    RESTARTS   AGE
+# gpu-pod   2/2     Running   0          8s
+
+# Running `nvidia-smi` will show something like the following:
+# +---------------------------------------------------------------------------------------+
+# | Processes:                                                                             |
+# |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
+# |        ID   ID                                                             Usage      |
+# |=======================================================================================|
+# |    0   N/A  N/A   1559554    M+C   /cuda-samples/sample                        790MiB |
+# |    0   N/A  N/A   1559585      C   nvidia-cuda-mps-server                       28MiB |
+# |    0   N/A  N/A   1559610    M+C   /cuda-samples/sample                        790MiB |
+# +---------------------------------------------------------------------------------------+
+
+---
+apiVersion: v1
+kind: Namespace
+metadata:
+  name: spmc-mps-gpu-test
+
+---
+apiVersion: resource.k8s.io/v1alpha2
+kind: ResourceClaim
+metadata:
+  namespace: spmc-mps-gpu-test
+  name: gpu-mps-sharing
+spec:
+  resourceClassName: gpu.nvidia.com
+  parametersRef:
+    apiGroup: gpu.resource.nvidia.com
+    kind: GpuClaimParameters
+    name: gpu-mps-sharing
+
+---
+apiVersion: gpu.resource.nvidia.com/v1alpha1
+kind: GpuClaimParameters
+metadata:
+  namespace: spmc-mps-gpu-test
+  name: gpu-mps-sharing
+spec:
+  sharing:
+    strategy: MPS
+    mpsConfig:
+      defaultActiveThreadPercentage: 50
+      defaultPinnedDeviceMemoryLimit: 10Gi
+      # defaultPerDevicePinnedMemoryLimit:
+      #   0: 5Gi
+
+---
+apiVersion: v1
+kind: Pod
+metadata:
+  namespace: spmc-mps-gpu-test
+  name: gpu-pod
+  labels:
+    app: pod
+spec:
+  containers:
+  - name: ctr0
+    image: nvcr.io/nvidia/k8s/cuda-sample:nbody-cuda11.7.1-ubuntu18.04
+    args: ["--benchmark", "--numbodies=2560000"]
+    resources:
+      claims:
+      - name: gpu
+  - name: ctr1
+    image: nvcr.io/nvidia/k8s/cuda-sample:nbody-cuda11.7.1-ubuntu18.04
+    args: ["--benchmark", "--numbodies=2560000"]
+    resources:
+      claims:
+      - name: gpu
+  resourceClaims:
+  - name: gpu
+    source:
+      resourceClaimName: gpu-mps-sharing
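+
+# Note: both containers name the same claim ("gpu") under resources.claims, so
+# they share one MPS-managed GPU inside the pod; nvidia-smi reports them as M+C
+# processes alongside the nvidia-cuda-mps-server control process.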
diff --git a/demo/specs/quickstart/desktop/single-pod-multiple-containers-shared-gpu.yaml b/demo/specs/quickstart/desktop/single-pod-multiple-containers-shared-gpu.yaml
new file mode 100644
index 000000000..de9f5266e
--- /dev/null
+++ b/demo/specs/quickstart/desktop/single-pod-multiple-containers-shared-gpu.yaml
@@ -0,0 +1,57 @@
+# SPMC-Shared-GPU: a single pod's multiple containers share access to a GPU via ResourceClaimTemplate.
+#
+# The pod will be running.
+# $ kubectl get pods -n spmc-shared-gpu-test
+# NAME      READY   STATUS    RESTARTS      AGE
+# gpu-pod   2/2     Running   2 (55s ago)   2m13s
+
+# Running `nvidia-smi` will show something like the following:
+# +---------------------------------------------------------------------------------------+
+# | Processes:                                                                             |
+# |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
+# |        ID   ID                                                             Usage      |
+# |=======================================================================================|
+# |    0   N/A  N/A   1514114      C   /cuda-samples/sample                        746MiB |
+# |    0   N/A  N/A   1514167      C   /cuda-samples/sample                        746MiB |
+# +---------------------------------------------------------------------------------------+
+
+---
+apiVersion: v1
+kind: Namespace
+metadata:
+  name: spmc-shared-gpu-test
+
+---
+apiVersion: resource.k8s.io/v1alpha2
+kind: ResourceClaimTemplate
+metadata:
+  namespace: spmc-shared-gpu-test
+  name: gpu.nvidia.com
+spec:
+  spec:
+    resourceClassName: gpu.nvidia.com
+
+---
+apiVersion: v1
+kind: Pod
+metadata:
+  namespace: spmc-shared-gpu-test
+  name: gpu-pod
+spec:
+  containers:
+  - name: ctr0
+    image: nvcr.io/nvidia/k8s/cuda-sample:nbody-cuda11.7.1-ubuntu18.04
+    args: ["--benchmark", "--numbodies=2560000"]
+    resources:
+      claims:
+      - name: shared-gpu
+  - name: ctr1
+    image: nvcr.io/nvidia/k8s/cuda-sample:nbody-cuda11.7.1-ubuntu18.04
+    args: ["--benchmark", "--numbodies=2560000"]
+    resources:
+      claims:
+      - name: shared-gpu
+  resourceClaims:
+  - name: shared-gpu
+    source:
+      resourceClaimTemplateName: gpu.nvidia.com
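+
+# Note: the pod references the ResourceClaimTemplate (gpu.nvidia.com) once under
+# resourceClaims, and both containers attach to the generated claim via
+# resources.claims, so they share the same GPU.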
diff --git a/demo/specs/quickstart/desktop/single-pod-multiple-containers-timeslicing-gpu.yaml b/demo/specs/quickstart/desktop/single-pod-multiple-containers-timeslicing-gpu.yaml
new file mode 100644
index 000000000..494a2e6c3
--- /dev/null
+++ b/demo/specs/quickstart/desktop/single-pod-multiple-containers-timeslicing-gpu.yaml
@@ -0,0 +1,89 @@
+# SPMC-TimeSlicing-GPU: a single pod's multiple containers share a GPU via TimeSlicing.
+
+# The pod will be running.
+# $ kubectl get pods -n spmc-timeslicing-gpu-test
+# NAME      READY   STATUS    RESTARTS   AGE
+# gpu-pod   2/2     Running   0          10s
+#
+# Running `nvidia-smi` will show something like the following:
+# +---------------------------------------------------------------------------------------+
+# | Processes:                                                                             |
+# |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
+# |        ID   ID                                                             Usage      |
+# |=======================================================================================|
+# |    0   N/A  N/A   1575573      C   /cuda-samples/sample                        746MiB |
+# +---------------------------------------------------------------------------------------+
+
+---
+apiVersion: v1
+kind: Namespace
+metadata:
+  name: spmc-timeslicing-gpu-test
+
+---
+apiVersion: resource.k8s.io/v1alpha2
+kind: ResourceClaim
+metadata:
+  namespace: spmc-timeslicing-gpu-test
+  name: gpu-ts-sharing-0
+spec:
+  resourceClassName: gpu.nvidia.com
+  parametersRef:
+    apiGroup: gpu.resource.nvidia.com
+    kind: GpuClaimParameters
+    name: gpu-ts-sharing
+
+---
+apiVersion: resource.k8s.io/v1alpha2
+kind: ResourceClaim
+metadata:
+  namespace: spmc-timeslicing-gpu-test
+  name: gpu-ts-sharing-1
+spec:
+  resourceClassName: gpu.nvidia.com
+  parametersRef:
+    apiGroup: gpu.resource.nvidia.com
+    kind: GpuClaimParameters
+    name: gpu-ts-sharing
+
+---
+apiVersion: gpu.resource.nvidia.com/v1alpha1
+kind: GpuClaimParameters
+metadata:
+  namespace: spmc-timeslicing-gpu-test
+  name: gpu-ts-sharing
+spec:
+  sharing:
+    strategy: TimeSlicing
+
+---
+apiVersion: v1
+kind: Pod
+metadata:
+  namespace: spmc-timeslicing-gpu-test
+  name: gpu-pod
+  labels:
+    app: pod
+spec:
+  containers:
+  - name: ctr0
+    image: nvcr.io/nvidia/k8s/cuda-sample:nbody-cuda11.7.1-ubuntu18.04
+    args: ["--benchmark", "--numbodies=2560000"]
+    resources:
+      claims:
+      - name: gpu0
+  - name: ctr1
+    image: nvcr.io/nvidia/k8s/cuda-sample:nbody-cuda11.7.1-ubuntu18.04
+    args: ["--benchmark", "--numbodies=2560000"]
+    resources:
+      claims:
+      - name: gpu1
+  resourceClaims:
+  - name: gpu0
+    source:
+      resourceClaimName: gpu-ts-sharing-0
+  - name: gpu1
+    source:
+      resourceClaimName: gpu-ts-sharing-1
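+
+# Note: the two ResourceClaims point at the same GpuClaimParameters with
+# strategy: TimeSlicing, so on this single-GPU machine both claims can be
+# satisfied by the same device, with the containers' work time-sliced on it.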
diff --git a/demo/specs/quickstart/desktop/single-pod-single-container-gpu.yaml b/demo/specs/quickstart/desktop/single-pod-single-container-gpu.yaml
new file mode 100644
index 000000000..c57ad5390
--- /dev/null
+++ b/demo/specs/quickstart/desktop/single-pod-single-container-gpu.yaml
@@ -0,0 +1,49 @@
+# SPSC-GPU: a single pod with a single container accesses a GPU via ResourceClaimTemplate.
+
+# $ kubectl get pods -n spsc-gpu-test
+# NAME      READY   STATUS    RESTARTS   AGE
+# gpu-pod   1/1     Running   0          6s
+
+# Running `nvidia-smi` should show something like the following:
+# +---------------------------------------------------------------------------------------+
+# | Processes:                                                                             |
+# |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
+# |        ID   ID                                                             Usage      |
+# |=======================================================================================|
+# |    0   N/A  N/A   1474787      C   /cuda-samples/sample                        746MiB |
+# +---------------------------------------------------------------------------------------+
+
+---
+apiVersion: v1
+kind: Namespace
+metadata:
+  name: spsc-gpu-test
+
+---
+apiVersion: resource.k8s.io/v1alpha2
+kind: ResourceClaimTemplate
+metadata:
+  namespace: spsc-gpu-test
+  name: gpu.nvidia.com
+spec:
+  spec:
+    resourceClassName: gpu.nvidia.com
+
+---
+apiVersion: v1
+kind: Pod
+metadata:
+  namespace: spsc-gpu-test
+  name: gpu-pod
+spec:
+  containers:
+  - name: ctr
+    image: nvcr.io/nvidia/k8s/cuda-sample:nbody-cuda11.7.1-ubuntu18.04
+    args: ["--benchmark", "--numbodies=2560000"]
+    resources:
+      claims:
+      - name: single-gpu
+  resourceClaims:
+  - name: single-gpu
+    source:
+      resourceClaimTemplateName: gpu.nvidia.com
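+
+# Usage (as shown in the README):
+#   kubectl apply -f single-pod-single-container-gpu.yaml
+#   kubectl get pods -n spsc-gpu-test
+#   kubectl delete -f single-pod-single-container-gpu.yaml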