GoogleCloudPlatform · syeda-anjum · Jan 15, 2026 · Dec 16, 2025 · Dec 17, 2025 · Dec 18, 2025
diff --git a/...ke/base/use-cases/inference-ref-arch/online-inference-gpu/vllm-with-hf-model.md b/...ke/base/use-cases/inference-ref-arch/online-inference-gpu/vllm-with-hf-model.md
@@ -162,15 +162,15 @@ This example is built on top of the
 
   - Select an accelerator.
 
-    | Model                          | l4  | h100 | h200 |
-    | ------------------------------ | --- | ---- | ---- |
-    | gemma-3-1b-it                  | ✅  | ❌   | ❌   |
-    | gemma-3-4b-it                  | ✅  | ❌   | ❌   |
-    | gemma-3-27b-it                 | ✅  | ✅   | ✅   |
-    | gpt-oss-20b                    | ✅  | ✅   | ✅   |
-    | llama-3.3-70b-instruct         | ❌  | ✅   | ✅   |
-    | llama-4-scout-17b-16e-instruct | ❌  | ✅   | ✅   |
-    | qwen3-32b                      | ✅  | ✅   | ✅   |
+    | Model                          | l4  | h100 | h200 | g4  |
+    | ------------------------------ | --- | ---- | ---- | --- |
+    | gemma-3-1b-it                  | ✅  | ❌   | ❌   |     |
+    | gemma-3-4b-it                  | ✅  | ❌   | ❌   |     |
+    | gemma-3-27b-it                 | ✅  | ✅   | ✅   |     |
+    | gpt-oss-20b                    | ✅  | ✅   | ✅   |     |
+    | llama-3.3-70b-instruct         | ❌  | ✅   | ✅   |     |
+    | llama-4-scout-17b-16e-instruct | ❌  | ✅   | ✅   |     |
+    | qwen3-32b                      | ✅  | ✅   | ✅   | ✅  |
 
     - **NVIDIA Tesla L4 24GB**:
 
@@ -190,6 +190,12 @@ This example is built on top of the
       export ACCELERATOR_TYPE="h200"
       ```
 
+    - **NVIDIA RTX PRO 6000 96GB**:
+
+      ```shell
+      export ACCELERATOR_TYPE="g4"
+      ```
+
     Ensure that you have enough quota in your project to provision the selected
     accelerator type. For more information about viewing GPU quotas, see
     [Allocation quotas: GPU quota](https://cloud.google.com/compute/resource-usage#gpu_quota).
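
Reviewer note: with a fourth accelerator value now accepted, a small guard can catch typos in `ACCELERATOR_TYPE` before any resources are provisioned. The sketch below is only an illustration. `validate_accelerator_type` is a hypothetical helper that does not exist in the repository, and the accepted values are taken solely from the table columns in this diff.

```shell
# Hypothetical helper (not part of the repository): accept only the
# accelerator values that appear as columns in the table above.
validate_accelerator_type() {
  case "${1:-}" in
    l4|h100|h200|g4)
      return 0
      ;;
    *)
      echo "Unsupported ACCELERATOR_TYPE: '${1:-}' (expected l4, h100, h200, or g4)" >&2
      return 1
      ;;
  esac
}

export ACCELERATOR_TYPE="g4"

# Report whether the exported value is one of the recognized types.
if validate_accelerator_type "${ACCELERATOR_TYPE}"; then
  echo "ACCELERATOR_TYPE=${ACCELERATOR_TYPE} is recognized"
fi
```

This checks only the spelling of the value. It does not confirm that the chosen model is supported on that accelerator or that the project has enough quota.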