Conversation

@syeda-anjum (Collaborator)

  • Updated Terraform for the GPU node pool
  • Added new custom compute classes (CCCs) for the G4 GPU node pool (see the sketch below)
  • Updated the G4 GPU vLLM deployment and Kustomize scripts
  • Updated the README
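
For context, a GKE custom compute class for this kind of node pool might look roughly like the sketch below. The class name, accelerator string, and priority rules are assumptions for illustration, not the definitions added in this PR.

```shell
# Sketch of a custom compute class (CCC) targeting G4 machines, applied
# via a heredoc. All names and values here are assumed, not from this PR.
kubectl apply -f - <<'EOF'
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: rtx-pro-6000
spec:
  priorities:
    # Prefer G4 nodes with one RTX PRO 6000 GPU each.
    - machineFamily: g4
      gpu:
        type: nvidia-rtx-pro-6000  # assumed accelerator string
        count: 1
  nodePoolAutoCreation:
    enabled: true
EOF
```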

@syeda-anjum changed the title from "Adding support for G4 nvidia-rtx-6000 GPUs with Qwen 32b model" to "Adding support for G4 nvidia-rtx-6000 GPUs for vLLM inference-ref-arch" on Jan 6, 2026
@syeda-anjum requested a review from arueth on January 6, 2026 at 21:56
@ferrarimarco (Member) left a comment


Just minor things to check.

```shell
export ACCELERATOR_TYPE="h200"
```
- **NVIDIA RTX 6000 96GB**:
Member: RTX Pro

- **NVIDIA RTX 6000 96GB**:
```shell
export ACCELERATOR_TYPE="rtx-pro-6000"
```
Member: This string is missing the `96gb` suffix that you put in the CCC definition. For simplicity, I suggest removing the `-96gb` string from CCC names and their directory names.
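
To illustrate why these strings must agree: workloads typically select a compute class through the `cloud.google.com/compute-class` node selector, so the value used in the deployment (and usually mirrored in the directory names) has to match the CCC's `metadata.name` exactly. A minimal sketch with assumed names:

```shell
# If the CCC is named "rtx-pro-6000-96gb" but the selector says
# "rtx-pro-6000", pods will never schedule onto the intended nodes.
# All names below are assumed for illustration.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: ccc-name-check
spec:
  nodeSelector:
    cloud.google.com/compute-class: rtx-pro-6000  # must equal the CCC name
  containers:
    - name: vllm
      image: vllm/vllm-openai:latest
      resources:
        limits:
          nvidia.com/gpu: "1"
EOF
```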

@@ -0,0 +1,6 @@
APP_LABEL=vllm-rtx-pro-6000-gpt-oss-20b
GPU_MEMORY_UTILIZATION=0.95
MAX_MODEL_LEN=131072
Member: This is exactly the same value as for Gemma. Did you check this value for the gpt-oss-20b model?
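
For reference, these values typically reach the server as vLLM flags; a hedged sketch of the wiring (model id and flag mapping assumed to mirror the other model directories, not taken from this PR):

```shell
# Sketch: how the env values are usually passed to the vLLM
# OpenAI-compatible server (the exact wiring in this repo may differ).
export GPU_MEMORY_UTILIZATION=0.95
export MAX_MODEL_LEN=131072
vllm serve openai/gpt-oss-20b \
  --gpu-memory-utilization "${GPU_MEMORY_UTILIZATION}" \
  --max-model-len "${MAX_MODEL_LEN}"
```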

@@ -0,0 +1,7 @@
APP_LABEL=vllm-rtx-pro-6000-llama-3-3-70b-instruct
GPU_MEMORY_UTILIZATION=0.95
MAX_MODEL_LEN=131072
Member: This is exactly the same value as for Gemma. Did you check this value for the llama3.3-70b model?

@@ -0,0 +1,6 @@
APP_LABEL=vllm-rtx-pro-6000-llama-4-scout-17b-16e-instruct
GPU_MEMORY_UTILIZATION=0.95
MAX_MODEL_LEN=131072
Member: This is exactly the same value as for Gemma. Did you check this value for the llama4 model?
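
One way to answer the "did you check" questions is to compare `MAX_MODEL_LEN` against each model's native context window from its config. A sketch (model ids are assumed from the APP_LABELs; the gated Llama models need an HF token and accepted licenses; some configs nest the value, hence the fallback):

```shell
# Print max_position_embeddings for each model to compare against
# MAX_MODEL_LEN (sketch; requires the transformers package).
for model in openai/gpt-oss-20b \
             meta-llama/Llama-3.3-70B-Instruct \
             meta-llama/Llama-4-Scout-17B-16E-Instruct; do
  python - "$model" <<'EOF'
import sys
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained(sys.argv[1])
# Some models (e.g. Llama 4) nest this under a text sub-config.
print(sys.argv[1], getattr(cfg, "max_position_embeddings", "n/a"))
EOF
done
```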
