Merged
Changes from 7 commits
1 change: 1 addition & 0 deletions .github/workflows/dictionary/accelerated-platforms.txt
@@ -54,6 +54,7 @@ prereqs
psutil
qwiklabs
rayutil
rtxpro
rueth
safetensors
scann
@@ -162,15 +162,15 @@ This example is built on top of the

- Select an accelerator.

| Model | l4 | h100 | h200 |
| ------------------------------ | --- | ---- | ---- |
| gemma-3-1b-it | ✅ | ❌ | ❌ |
| gemma-3-4b-it | ✅ | ❌ | ❌ |
| gemma-3-27b-it | ✅ | ✅ | ✅ |
| gpt-oss-20b | ✅ | ✅ | ✅ |
| llama-3.3-70b-instruct | ❌ | ✅ | ✅ |
| llama-4-scout-17b-16e-instruct | ❌ | ✅ | ✅ |
| qwen3-32b | ✅ | ✅ | ✅ |
| Model | l4 | h100 | h200 | RTX Pro 6000 |
| ------------------------------ | --- | ---- | ---- | ------------ |
| gemma-3-1b-it | ✅ | ❌ | ❌ | ❌ |
| gemma-3-4b-it | ✅ | ❌ | ❌ | ❌ |
| gemma-3-27b-it | ✅ | ✅ | ✅ | ✅ |
| gpt-oss-20b | ✅ | ✅ | ✅ | ✅ |
| llama-3.3-70b-instruct | ❌ | ✅ | ✅ | - |
| llama-4-scout-17b-16e-instruct | ❌ | ✅ | ✅ | - |
| qwen3-32b | ✅ | ✅ | ✅ | ✅ |

- **NVIDIA Tesla L4 24GB**:

@@ -190,6 +190,12 @@ This example is built on top of the
export ACCELERATOR_TYPE="h200"
```

- **NVIDIA RTX PRO 6000 96GB**:

```shell
export ACCELERATOR_TYPE="rtx-pro-6000"
```

Ensure that you have enough quota in your project to provision the selected
accelerator type. For more information about viewing GPU quotas, see
[Allocation quotas: GPU quota](https://cloud.google.com/compute/resource-usage#gpu_quota).
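
The accelerator options above amount to a small set of accepted `ACCELERATOR_TYPE` values. As a minimal sketch, assuming the four values implied by the table columns (`l4`, `h100`, `h200`, `rtx-pro-6000`) are the only ones accepted, a guard like the following could catch a typo before anything is provisioned. The function name `is_supported_accelerator` is hypothetical and not part of the repository; only the `h200` and `rtx-pro-6000` values appear verbatim in this diff.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): succeeds when the value
# is one of the accelerator types named in the table above.
# Assumption: "l4" and "h100" are spelled like their table column headers.
is_supported_accelerator() {
  case "$1" in
    l4 | h100 | h200 | rtx-pro-6000) return 0 ;;
    *) return 1 ;;
  esac
}

export ACCELERATOR_TYPE="rtx-pro-6000"

if is_supported_accelerator "${ACCELERATOR_TYPE}"; then
  echo "ACCELERATOR_TYPE=${ACCELERATOR_TYPE} is supported"
else
  echo "Unsupported ACCELERATOR_TYPE: ${ACCELERATOR_TYPE}" >&2
fi
```

The check only validates the spelling of the value; it does not confirm that the selected model runs on that accelerator or that the project has quota for it.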
@@ -0,0 +1,69 @@
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: gpu-rtx-pro-6000-96gb-s192-x1
spec:
  activeMigration:
    optimizeRulePriority: true
  nodePoolConfig:
    imageStreaming:
      enabled: true
  nodePoolAutoCreation:
    enabled: true
  priorities:
    # Use a specific reservation
    # - gpu:
    #     count: 4
    #     driverVersion: latest
    #     type: nvidia-rtx-pro-6000
    #   machineType: g4-standard-192
    #   maxPodsPerNode: 32
    #   reservations:
    #     affinity: Specific
    #     specific:
    #       - name: nvidia-rtx-pro-6000-specific
    #         reservationBlock:
    #           name: <RESERVATION_NAME>
    #   spot: false

    # Use any reservation
    - gpu:
        count: 4
        driverVersion: latest
        type: nvidia-rtx-pro-6000
      machineType: g4-standard-192
      maxPodsPerNode: 32
      reservations:
        affinity: AnyBestEffort
      spot: false

    # Use on-demand
    - gpu:
        count: 4
        driverVersion: latest
        type: nvidia-rtx-pro-6000
      machineType: g4-standard-192
      maxPodsPerNode: 32
      spot: false
    # Use spot
    - gpu:
        count: 4
        driverVersion: latest
        type: nvidia-rtx-pro-6000
      machineType: g4-standard-192
      maxPodsPerNode: 32
      spot: true
@@ -0,0 +1,69 @@
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: gpu-rtx-pro-6000-96gb-s384-x1
spec:
  activeMigration:
    optimizeRulePriority: true
  nodePoolConfig:
    imageStreaming:
      enabled: true
  nodePoolAutoCreation:
    enabled: true
  priorities:
    # Use a specific reservation
    # - gpu:
    #     count: 8
    #     driverVersion: latest
    #     type: nvidia-rtx-pro-6000
    #   machineType: g4-standard-384
    #   maxPodsPerNode: 32
    #   reservations:
    #     affinity: Specific
    #     specific:
    #       - name: nvidia-rtx-pro-6000-specific
    #         reservationBlock:
    #           name: <RESERVATION_NAME>
    #   spot: false

    # Use any reservation
    - gpu:
        count: 8
        driverVersion: latest
        type: nvidia-rtx-pro-6000
      machineType: g4-standard-384
      maxPodsPerNode: 32
      reservations:
        affinity: AnyBestEffort
      spot: false

    # Use on-demand
    - gpu:
        count: 8
        driverVersion: latest
        type: nvidia-rtx-pro-6000
      machineType: g4-standard-384
      maxPodsPerNode: 32
      spot: false
    # Use spot
    - gpu:
        count: 8
        driverVersion: latest
        type: nvidia-rtx-pro-6000
      machineType: g4-standard-384
      maxPodsPerNode: 32
      spot: true
@@ -0,0 +1,69 @@
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: gpu-rtx-pro-6000-96gb-s48-x1
spec:
  activeMigration:
    optimizeRulePriority: true
  nodePoolConfig:
    imageStreaming:
      enabled: true
  nodePoolAutoCreation:
    enabled: true
  priorities:
    # Use a specific reservation
    # - gpu:
    #     count: 1
    #     driverVersion: latest
    #     type: nvidia-rtx-pro-6000
    #   machineType: g4-standard-48
    #   maxPodsPerNode: 32
    #   reservations:
    #     affinity: Specific
    #     specific:
    #       - name: nvidia-rtx-pro-6000-specific
    #         reservationBlock:
    #           name: <RESERVATION_NAME>
    #   spot: false

    # Use any reservation
    - gpu:
        count: 1
        driverVersion: latest
        type: nvidia-rtx-pro-6000
      machineType: g4-standard-48
      maxPodsPerNode: 32
      reservations:
        affinity: AnyBestEffort
      spot: false

    # Use on-demand
    - gpu:
        count: 1
        driverVersion: latest
        type: nvidia-rtx-pro-6000
      machineType: g4-standard-48
      maxPodsPerNode: 32
      spot: false
    # Use spot
    - gpu:
        count: 1
        driverVersion: latest
        type: nvidia-rtx-pro-6000
      machineType: g4-standard-48
      maxPodsPerNode: 32
      spot: true
@@ -0,0 +1,69 @@
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: gpu-rtx-pro-6000-96gb-s96-x1
spec:
  activeMigration:
    optimizeRulePriority: true
  nodePoolConfig:
    imageStreaming:
      enabled: true
  nodePoolAutoCreation:
    enabled: true
  priorities:
    # Use a specific reservation
    # - gpu:
    #     count: 2
    #     driverVersion: latest
    #     type: nvidia-rtx-pro-6000
    #   machineType: g4-standard-96
    #   maxPodsPerNode: 32
    #   reservations:
    #     affinity: Specific
    #     specific:
    #       - name: nvidia-rtx-pro-6000-specific
    #         reservationBlock:
    #           name: <RESERVATION_NAME>
    #   spot: false

    # Use any reservation
    - gpu:
        count: 2
        driverVersion: latest
        type: nvidia-rtx-pro-6000
      machineType: g4-standard-96
      maxPodsPerNode: 32
      reservations:
        affinity: AnyBestEffort
      spot: false

    # Use on-demand
    - gpu:
        count: 2
        driverVersion: latest
        type: nvidia-rtx-pro-6000
      machineType: g4-standard-96
      maxPodsPerNode: 32
      spot: false
    # Use spot
    - gpu:
        count: 2
        driverVersion: latest
        type: nvidia-rtx-pro-6000
      machineType: g4-standard-96
      maxPodsPerNode: 32
      spot: true
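
The four ComputeClass manifests in this change differ only in GPU count and machine type: 1, 2, 4, and 8 GPUs map to `g4-standard-48`, `-96`, `-192`, and `-384`, and to the class names ending in `s48`, `s96`, `s192`, and `s384`. As a rough sketch of how a workload could pick one of them, the helper below maps a GPU count to the matching class name and prints a node selector for it. The function name `compute_class_for_gpu_count` is hypothetical and not part of this PR; the `cloud.google.com/compute-class` node selector key is the one GKE uses for custom compute classes, and how the repository's own manifests reference these classes is not shown in this diff.

```shell
#!/bin/sh
# Hypothetical helper (not part of this PR): maps a GPU count to the
# ComputeClass name defined in the manifests above.
compute_class_for_gpu_count() {
  case "$1" in
    1) echo "gpu-rtx-pro-6000-96gb-s48-x1" ;;
    2) echo "gpu-rtx-pro-6000-96gb-s96-x1" ;;
    4) echo "gpu-rtx-pro-6000-96gb-s192-x1" ;;
    8) echo "gpu-rtx-pro-6000-96gb-s384-x1" ;;
    *)
      echo "unsupported GPU count: $1" >&2
      return 1
      ;;
  esac
}

# Print a nodeSelector fragment for a workload that needs 4 GPUs.
GPU_COUNT=4
if COMPUTE_CLASS="$(compute_class_for_gpu_count "${GPU_COUNT}")"; then
  printf 'nodeSelector:\n  cloud.google.com/compute-class: %s\n' "${COMPUTE_CLASS}"
fi
```

The printed fragment is meant to be pasted under a Pod template's `spec`; the Pod would still need its own `nvidia.com/gpu` resource request matching the chosen count.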