[metrics]: Exporter randomly detached from service-account #670
Comments
Same here; I don't know why it's not using the service account, and it prefers to use the Node role.
Same here, on a fresh deployment using the chart at https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus-cloudwatch-exporter. Restarting the Deployment does not fix it; it uses the node role instead.
Hmm, the exporter itself does not do anything special with the configuration you provided – as long as you don't configure AWS authentication in the exporter config, it uses the default authentication chain in the AWS SDK. The only explanation I have is that retrieving or exchanging the service account OIDC token for AWS credentials failed, and the SDK proceeded further in the chain, eventually finding the node credentials. Are there any earlier logs related to this, possibly when running with debug log verbosity?

Unfortunately I am very unfamiliar with and eternally confused by the Java logging ecosystem, so I'm not sure how exactly to achieve that.

Considering this both broke and recovered without changes to the exporter or configuration, I'm not sure what we can do about it tbh.
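To make the fallback described above concrete, here is a minimal, hypothetical Java sketch (not part of the exporter; it only uses AWS SDK for Java v2 classes and assumes the `sts` module is on the classpath) of how a chain like the default one behaves: if the web-identity (IRSA) provider fails, the chain silently moves on and can end up on the node's instance-profile credentials.

```java
import software.amazon.awssdk.auth.credentials.AwsCredentials;
import software.amazon.awssdk.auth.credentials.AwsCredentialsProviderChain;
import software.amazon.awssdk.auth.credentials.InstanceProfileCredentialsProvider;
import software.amazon.awssdk.auth.credentials.WebIdentityTokenFileCredentialsProvider;
import software.amazon.awssdk.core.exception.SdkClientException;

public class CredentialChainProbe {
    public static void main(String[] args) {
        // First, try the web-identity (IRSA) provider on its own so a failure is
        // visible instead of being swallowed by the chain.
        try {
            AwsCredentials creds =
                    WebIdentityTokenFileCredentialsProvider.create().resolveCredentials();
            // Only the access key id is printed, never the secret.
            System.out.println("IRSA credentials resolved, access key id: " + creds.accessKeyId());
        } catch (SdkClientException e) {
            System.out.println("IRSA provider failed: " + e.getMessage());
        }

        // A chain similar to the default one: if the web-identity provider throws,
        // the chain falls through to the instance profile (node role) without
        // surfacing the earlier failure to the caller.
        AwsCredentialsProviderChain chain = AwsCredentialsProviderChain.builder()
                .credentialsProviders(
                        WebIdentityTokenFileCredentialsProvider.create(),
                        InstanceProfileCredentialsProvider.create())
                .build();
        AwsCredentials effective = chain.resolveCredentials();
        System.out.println("Chain resolved access key id: " + effective.accessKeyId());
    }
}
```

If the token retrieval or STS exchange fails only intermittently, this is exactly the quiet fallback the issue describes: no hard error from the chain, just different credentials. The SDK logs which provider it settles on at DEBUG level via SLF4J, so raising the log level for `software.amazon.awssdk.auth.credentials` (with whatever logging backend the exporter image ships) should reveal which provider actually won.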
Context information
Exporter configuration
Service Account
IAM Role
Deployment
Exporter logs
What do you expect to happen?
I expected the cloudwatch-exporter to use the attached service-account with the permissions necessary to retrieve metric data.
What happened instead?
What actually happened was that the cloudwatch-exporter stopped using the service-account and tried to use the k8s node's IAM role. Nothing changed on our side; we simply stopped receiving the metrics in Prometheus and then found the logs.
Restarting the deployment fixed the problem and the exporter started using the service-account again, but if this had happened in a production environment, the Prometheus alerts we've set up to monitor these metrics wouldn't have met the threshold needed to fire.
Also, without looking at the logs, the pod appeared to be running normally.
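One way to narrow this down the next time it happens is to check, from inside the running pod, whether the IRSA wiring injected by the EKS pod identity webhook is still intact. The sketch below is a standalone, hypothetical helper, not part of the exporter; it only reads the well-known environment variables and the projected token file.

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class IrsaEnvCheck {
    public static void main(String[] args) throws Exception {
        // Injected by the EKS pod identity webhook when IRSA is configured.
        String roleArn = System.getenv("AWS_ROLE_ARN");
        String tokenFile = System.getenv("AWS_WEB_IDENTITY_TOKEN_FILE");

        System.out.println("AWS_ROLE_ARN=" + roleArn);
        System.out.println("AWS_WEB_IDENTITY_TOKEN_FILE=" + tokenFile);

        if (tokenFile != null) {
            Path path = Path.of(tokenFile);
            System.out.println("token file exists: " + Files.exists(path));
            System.out.println("token file readable: " + Files.isReadable(path));
            if (Files.isReadable(path)) {
                // The projected token is rotated periodically by the kubelet;
                // a stale or empty file would explain a failed STS exchange.
                System.out.println("token length: " + Files.readString(path).trim().length());
            }
        }
    }
}
```

If `AWS_ROLE_ARN` and the token file look healthy while the exporter is already using the node role, the failure is more likely in the STS exchange at that moment than in the pod spec.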