DCGM is not getting loaded #422
Labels: bug
Comments
@Pryz, for troubleshooting, do the following:
This will help us see which GPU model is in use and why DCGM doesn't load the profiling module.
Sure thing. Thanks @nvvfedorov. Here is the info:
@nvvfedorov is there any other info I can provide to help?
What is the version?
3.3.9-3.6.1-ubuntu22.04
What happened?
Hi there,
We are deploying the exporter version 3.3.9-3.6.1-ubuntu22.04 in Docker on ECS. The task is configured with CAP_SYS_ADMIN and host PID mode, and has access to all GPUs.
The logs indicate that the DCGM module is not loaded even though, if I understand correctly, the exporter is supposed to use it in embedded mode. Here are the logs:
This is the last Docker configuration I tried:
Any recommendations on where to go from here?
What did you expect to happen?
I am expecting to collect all DCGM metrics.
What is the GPU model?
NVIDIA L4
What is the environment?
Running in AWS ECS.
How did you deploy the dcgm-exporter and what is the configuration?
Running in Docker via an ECS daemon.
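For readers comparing their own setup, the settings described in this report (CAP_SYS_ADMIN, host PID mode, access to all GPUs) could be sanity-checked against a task definition with a sketch like the one below. It is an illustration only, not the configuration from this deployment. The field names `pidMode`, `containerDefinitions`, `linuxParameters.capabilities.add` and `environment` follow the ECS task definition JSON format. The container name `dcgm-exporter`, the example values, and the use of `NVIDIA_VISIBLE_DEVICES=all` to expose the GPUs are assumptions.

```python
def check_task_definition(task_def: dict, container_name: str) -> list[str]:
    """Return a list of problems found; an empty list means every check passed."""
    problems = []

    # Host PID mode is set at the task level in an ECS task definition.
    if task_def.get("pidMode") != "host":
        problems.append("pidMode is not 'host'")

    containers = {c.get("name"): c for c in task_def.get("containerDefinitions", [])}
    container = containers.get(container_name)
    if container is None:
        problems.append(f"container {container_name!r} not found")
        return problems

    # ECS spells Linux capabilities without the CAP_ prefix, e.g. "SYS_ADMIN".
    added_caps = (
        container.get("linuxParameters", {}).get("capabilities", {}).get("add", [])
    )
    if "SYS_ADMIN" not in added_caps:
        problems.append("SYS_ADMIN capability is not added")

    # Assumption: GPUs are exposed through the NVIDIA container runtime variable.
    env = {e.get("name"): e.get("value") for e in container.get("environment", [])}
    if env.get("NVIDIA_VISIBLE_DEVICES") != "all":
        problems.append("NVIDIA_VISIBLE_DEVICES is not 'all'")

    return problems


# Hypothetical task definition fragment, used only to exercise the checks.
example_task_def = {
    "pidMode": "host",
    "containerDefinitions": [
        {
            "name": "dcgm-exporter",
            "linuxParameters": {"capabilities": {"add": ["SYS_ADMIN"]}},
            "environment": [{"name": "NVIDIA_VISIBLE_DEVICES", "value": "all"}],
        }
    ],
}

print(check_task_definition(example_task_def, "dcgm-exporter"))
```

Passing these checks only confirms that the container was asked for the right privileges. It does not explain why the profiling module fails to load.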
How to reproduce the issue?
No response
Anything else we need to know?
No response