12 changes: 10 additions & 2 deletions README.md
@@ -49,13 +49,21 @@ resources:

Verify that it has been scheduled on one of the __CPU__ nodes.

You can also test by running the example deployment YAML under the [example](./example) folder
You can also test by running the example deployment YAML under the [example](./example) folder.

**Note:** If you want to execute `nvidia-smi` in the example deployment, you need to add the following snippet to the `deployment.yml` file. Replace `<node-name>` with the node name you labeled during installation:

@nayihz, Feb 28, 2025

What does `<node-name>` represent? What do you mean by "the node name you labeled during installation"?

`nvidia-smi` still cannot be executed after adding:

env:
- name: NODE_NAME
  value: default

Error message:

# k exec -it sleepy-deployment-6bddfbb7f4-s8mc4 -- sh
/ # nvidia-smi
sh: nvidia-smi: not found

Is my understanding incorrect?

Author

This appears to be a different issue. You might want to check if the device plugin is running correctly.

```yaml
env:
- name: NODE_NAME
  value: <node-name>
```
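If you are unsure which node name to substitute, listing nodes by label can help. The following is a minimal sketch, assuming `kubectl` is configured for the cluster. The helper name `nodes_with_label` and the selector `example.com/role=gpu` are hypothetical placeholders, because the label applied during installation is not shown in this section.

```sh
# Hypothetical helper: print the names of all nodes matching a label selector.
# The selector is passed as the first argument, e.g. "key=value".
nodes_with_label() {
  kubectl get nodes -l "$1" -o jsonpath='{.items[*].metadata.name}'
}

# Example usage (placeholder selector, not the project's real label):
#   nodes_with_label "example.com/role=gpu"
# To inspect the labels on every node instead:
#   kubectl get nodes --show-labels
```

Whatever name this prints for the node you labeled is the value that would replace `<node-name>`.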

## Troubleshooting

[Pod Security Admission](https://kubernetes.io/docs/concepts/security/pod-security-admission/) should be effectively disabled on the `gpu-operator` namespace by setting its enforcement level to `privileged`:

```sh
kubectl label ns gpu-operator pod-security.kubernetes.io/enforce=privileged
```
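To confirm that the label was applied, you can read it back from the namespace. This is a minimal sketch, assuming `kubectl` access to the cluster; the helper name `psa_enforce_level` is hypothetical.

```sh
# Hypothetical helper: print the Pod Security Admission "enforce" level
# set on a namespace. Empty output means the label is absent.
psa_enforce_level() {
  kubectl get ns "$1" \
    -o jsonpath='{.metadata.labels.pod-security\.kubernetes\.io/enforce}'
}

# Example usage:
#   psa_enforce_level gpu-operator
```

If the labeling command above succeeded, the helper should report `privileged` for the `gpu-operator` namespace.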
