Conversation

@syeda-anjum (Collaborator)

  • Updated Terraform for the GPU node pool
  • Added new custom compute classes (CCCs) for the G4 GPU node pool (see the sketch below)
  • Updated the G4 GPU vLLM deployment and Kustomize scripts
  • Updated the README
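
For context, a GKE custom compute class for this kind of node pool might look roughly like the sketch below. The class name, accelerator string, and priority rules are assumptions for illustration, not the definitions added in this PR.

```shell
# Sketch of a custom compute class (CCC) targeting G4 machines, applied
# via a heredoc. All names and values here are assumed, not from this PR.
kubectl apply -f - <<'EOF'
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: rtx-pro-6000
spec:
  priorities:
    # Prefer G4 nodes with one RTX PRO 6000 GPU each.
    - machineFamily: g4
      gpu:
        type: nvidia-rtx-pro-6000  # assumed accelerator string
        count: 1
  nodePoolAutoCreation:
    enabled: true
EOF
```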

@syeda-anjum changed the title from "Adding support for G4 nvidia-rtx-6000 GPUs with Qwen 32b model" to "Adding support for G4 nvidia-rtx-6000 GPUs for vLLM inference-ref-arch" on Jan 6, 2026
@syeda-anjum requested a review from arueth on January 6, 2026 at 21:56
@ferrarimarco (Member) left a comment


Just minor things to check.

```shell
export ACCELERATOR_TYPE="h200"
```
- **NVIDIA RTX 6000 96GB**:
Member: RTX Pro

- **NVIDIA RTX 6000 96GB**:
```shell
export ACCELERATOR_TYPE="rtx-pro-6000"
```
Member: This string is missing the `96gb` suffix that you put in the CCC definition. For simplicity, I suggest removing the `-96gb` string from CCC names and their directory names.
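
To illustrate why these strings must agree: workloads typically select a compute class through the `cloud.google.com/compute-class` node selector, so the value used in the deployment (and usually mirrored in the directory names) has to match the CCC's `metadata.name` exactly. A minimal sketch with assumed names:

```shell
# If the CCC is named "rtx-pro-6000-96gb" but the selector says
# "rtx-pro-6000", pods will never schedule onto the intended nodes.
# All names below are assumed for illustration.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: ccc-name-check
spec:
  nodeSelector:
    cloud.google.com/compute-class: rtx-pro-6000  # must equal the CCC name
  containers:
    - name: vllm
      image: vllm/vllm-openai:latest
      resources:
        limits:
          nvidia.com/gpu: "1"
EOF
```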

@@ -0,0 +1,6 @@
APP_LABEL=vllm-rtx-pro-6000-gpt-oss-20b
GPU_MEMORY_UTILIZATION=0.95
MAX_MODEL_LEN=131072
Member: This is exactly the same value as for Gemma. Did you check this value for the gpt-oss-20b model?
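
For reference, these values typically reach the server as vLLM flags; a hedged sketch of the wiring (model id and flag mapping assumed to mirror the other model directories, not taken from this PR):

```shell
# Sketch: how the env values are usually passed to the vLLM
# OpenAI-compatible server (the exact wiring in this repo may differ).
export GPU_MEMORY_UTILIZATION=0.95
export MAX_MODEL_LEN=131072
vllm serve openai/gpt-oss-20b \
  --gpu-memory-utilization "${GPU_MEMORY_UTILIZATION}" \
  --max-model-len "${MAX_MODEL_LEN}"
```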

@@ -0,0 +1,7 @@
APP_LABEL=vllm-rtx-pro-6000-llama-3-3-70b-instruct
GPU_MEMORY_UTILIZATION=0.95
MAX_MODEL_LEN=131072
Member: This is exactly the same value as for Gemma. Did you check this value for the llama3.3-70b model?

@@ -0,0 +1,6 @@
APP_LABEL=vllm-rtx-pro-6000-llama-4-scout-17b-16e-instruct
GPU_MEMORY_UTILIZATION=0.95
MAX_MODEL_LEN=131072
Member: This is exactly the same value as for Gemma. Did you check this value for the llama4 model?
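
One way to answer the "did you check" questions is to compare `MAX_MODEL_LEN` against each model's native context window from its config. A sketch (model ids are assumed from the APP_LABELs; the gated Llama models need an HF token and accepted licenses; some configs nest the value, hence the fallback):

```shell
# Print max_position_embeddings for each model to compare against
# MAX_MODEL_LEN (sketch; requires the transformers package).
for model in openai/gpt-oss-20b \
             meta-llama/Llama-3.3-70B-Instruct \
             meta-llama/Llama-4-Scout-17B-16E-Instruct; do
  python - "$model" <<'EOF'
import sys
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained(sys.argv[1])
# Some models (e.g. Llama 4) nest this under a text sub-config.
print(sys.argv[1], getattr(cfg, "max_position_embeddings", "n/a"))
EOF
done
```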
