What is the version?
3.1.3-3.1.2
What happened?
I'm tracking DCGM_FI_DEV_NVLINK_BANDWIDTH_TOTAL, which is marked as a counter, but I'm observing the value going down, e.g.:
timestamp t      DCGM_FI_DEV_NVLINK_BANDWIDTH_TOTAL{gpu="1"...} 57
timestamp t + 1  DCGM_FI_DEV_NVLINK_BANDWIDTH_TOTAL{gpu="1"...} 21
I'm also seeing this with other metrics that are reported as counters here. Is this expected behavior for counters?
What did you expect to happen?
I'd expect counters to only go up.
"Do not use a counter to expose a value that can decrease."
source: https://prometheus.io/docs/concepts/metric_types/#counter
What is the GPU model?
A100-SXM4-80GB
What is the environment?
bare metal
How did you deploy the dcgm-exporter and what is the configuration?
No response
How to reproduce the issue?
No response
Anything else we need to know?
No response
If this is expected behavior, should we change the type to gauge?
"A gauge is a metric that represents a single numerical value that can arbitrarily go up and down."
source: https://prometheus.io/docs/concepts/metric_types/#gauge
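To illustrate why the declared type matters, here is a minimal Python sketch of how a counter-style query would read the two samples from the report (57, then 21). It assumes the usual Prometheus convention that any drop in a counter is treated as a reset to zero. The helper names `is_monotonic` and `increase_with_resets` are hypothetical, and the second one is a simplification: the real PromQL `increase()` also extrapolates to the edges of the query window.

```python
from typing import Sequence


def is_monotonic(samples: Sequence[float]) -> bool:
    """Return True if the samples never decrease, as a counter's should."""
    return all(cur >= prev for prev, cur in zip(samples, samples[1:]))


def increase_with_resets(samples: Sequence[float]) -> float:
    """Sum the increase over the samples, treating any drop as a counter reset.

    Simplified model of counter semantics (an assumption for illustration):
    after a drop, the counter is presumed to have restarted from 0, so the
    whole new value counts as increase.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        if cur < prev:
            total += cur          # presumed reset: count from 0 up to cur
        else:
            total += cur - prev   # normal monotonic step
    return total


# The two samples from the report above.
samples = [57.0, 21.0]

print(is_monotonic(samples))          # False: the value went down
print(increase_with_resets(samples))  # 21.0: the drop is read as a reset
print(samples[-1] - samples[0])       # -36.0: the change a gauge would show
```

If the value is really an instantaneous reading that can fall, counter-style functions such as `rate()` or `increase()` would report a positive increase where the value actually dropped, which is why the type annotation is worth getting right.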