Conversation
@mu8086 commented Jul 23, 2025

- Allow configuring a custom `runtimeClass.name` to avoid conflict with NVIDIA's default runtime
- Topology server namespace in `nvidia-smi` is now configurable via the `TOPOLOGY_CM_NAMESPACE` environment variable instead of being hardcoded to `gpu-operator`

### Example Helm upgrade command

```bash
helm upgrade --install fake-gpu-operator ~/git/fake-gpu-operator/deploy/fake-gpu-operator \
  --namespace runai --create-namespace \
  --set runtimeClass.name=fake-nvidia
```
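
The second change means the chart has to tell `nvidia-smi` which namespace the topology server lives in. A minimal sketch of how the variable might be injected, assuming the exporter template uses the Helm release namespace (field placement here is illustrative, not the merged template):

```yaml
# Hypothetical container env entry in the chart: point nvidia-smi at the
# topology server in the release namespace instead of a hardcoded one.
env:
  - name: TOPOLOGY_CM_NAMESPACE
    value: {{ .Release.Namespace | quote }}
```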

### Example: Verified Pod Spec

This Pod verifies that the custom runtimeClass and the dynamic topology namespace injection work correctly.

```yaml
# pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: "1"
spec:
  runtimeClassName: fake-nvidia
  containers:
  - name: ubuntu
    image: ubuntu:22.04
    command: ["/bin/bash", "-c"]
    args:
      - |
        sleep infinity;
    resources:
      limits:
        nvidia.com/gpu: 1
    env:
      - name: NODE_NAME
        valueFrom:
          fieldRef:
            fieldPath: spec.nodeName
```
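
One way to exercise this spec (the commands are a sketch; the pod keeps the name `1` from the metadata above):

```bash
# Apply the pod and confirm it was admitted under the custom RuntimeClass.
kubectl apply -f pod.yaml
kubectl get pod 1 -o jsonpath='{.spec.runtimeClassName}'

# If the operator injects its fake nvidia-smi into GPU pods, the topology
# lookup should now resolve against the configured namespace.
kubectl exec 1 -- nvidia-smi
```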

@mu8086 changed the title from feat(fake-gpu-operator): support custom runtimeClass and topology nam… to feat: support custom runtimeClass and topology nam… on Jul 23, 2025
@mu8086 changed the title from feat: support custom runtimeClass and topology nam… to feat: support custom runtimeClass and topology namespace on Jul 23, 2025
@gshaibi (Contributor) left a comment

Thank you very much @mu8086 for your contribution!

From what I understand from your comment, you wish to run both the Fake GPU Operator and the original one together on the same cluster.
Unfortunately this is not supported yet.
I'd love to hear more about this use case.

Regardless, configuring the RuntimeClass name and respecting the release namespace in `nvidia-smi` both seem reasonable - I left a couple of comments.


```diff
 // Send http request to topology-server to get the topology
-topologyUrl := "http://topology-server.gpu-operator/topology/nodes/" + nodeName
+topologyUrl := fmt.Sprintf("http://topology-server.%s/topology/nodes/%s",
```
@gshaibi (Contributor): Please inject and use a `FAKE_GPU_OPERATOR_NAMESPACE` environment variable instead.
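
A minimal sketch of that suggestion, assuming a small helper and a fallback to the previous hardcoded default (the function shape is illustrative, not the merged code):

```go
package main

import (
	"fmt"
	"os"
)

// topologyURL builds the topology-server endpoint for a node. The namespace
// comes from the FAKE_GPU_OPERATOR_NAMESPACE variable the reviewer suggests
// injecting; the fallback preserves the previously hardcoded "gpu-operator".
func topologyURL(nodeName string) string {
	namespace := os.Getenv("FAKE_GPU_OPERATOR_NAMESPACE")
	if namespace == "" {
		namespace = "gpu-operator"
	}
	return fmt.Sprintf("http://topology-server.%s/topology/nodes/%s", namespace, nodeName)
}

func main() {
	fmt.Println(topologyURL("node-1"))
}
```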

```diff
 kind: RuntimeClass
 metadata:
-  name: nvidia
+  name: {{ .Values.runtimeClass.name | default "fake-nvidia" }}
```
@gshaibi (Contributor): I suggest we keep the default `nvidia` to better fake the NVIDIA GPU Operator behavior.
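
Applied to the template above, that suggestion would look something like this (a sketch, not the merged change):

```yaml
name: {{ .Values.runtimeClass.name | default "nvidia" }}
```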

```diff
 COMPONENTS?=device-plugin status-updater kwok-gpu-device-plugin status-exporter topology-server mig-faker jupyter-notebook

-DOCKER_REPO_BASE=gcr.io/run-ai-lab/fake-gpu-operator
+DOCKER_REPO_BASE?=gcr.io/run-ai-lab/fake-gpu-operator
```
@gshaibi (Contributor): 💯
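
For context on the `?=` change: GNU Make assigns with `?=` only when the variable is not already set, so the image repo can now be overridden from the environment without editing the Makefile (the target name below is illustrative):

```bash
# The environment value wins over the ?= default in the Makefile.
DOCKER_REPO_BASE=ghcr.io/example/fake-gpu-operator make build
```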
