k8s-infra-prow-build: Add monitoring #6468
- apiURL:
    name: alertmanager-k8c-slack-token
    key: url
  channel: '#alerting-cncf-prod'
Probably drop the entire alerting config for the moment. We want metrics visualization first.
@ameukam I removed the alertmanager folder and disabled its configuration in prometheus.yaml.
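For illustration only, a Prometheus Operator `Prometheus` resource with alerting left out might look roughly like the sketch below. This is not the manifest from this PR: the resource name, namespace, service account, and Alertmanager Service name are all hypothetical, and it assumes the `monitoring.coreos.com/v1` CRDs are installed in the cluster.

```yaml
# Minimal sketch, not the PR's actual prometheus.yaml.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: main                      # hypothetical name
  namespace: monitoring           # hypothetical namespace
spec:
  replicas: 1
  serviceAccountName: prometheus  # hypothetical service account
  serviceMonitorSelector: {}      # empty selector: pick up all ServiceMonitors
  # Alerting stays disabled until the Slack secret is valid for this cluster.
  # Re-enabling it would mean restoring a block along these lines:
  # alerting:
  #   alertmanagers:
  #     - namespace: monitoring
  #       name: alertmanager-main # hypothetical Alertmanager Service name
  #       port: web
```

Keeping the block as a comment rather than deleting it is just one option; it makes the later re-enable a small diff.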
/assign
...-infra-prow-build/prow-build/resources/monitoring/grafana/dashboards/kube-state-metrics.yaml
...-infra-prow-build/prow-build/resources/monitoring/grafana/dashboards/node-exporter-full.yaml
.../gcp/terraform/k8s-infra-prow-build/prow-build/resources/monitoring/grafana/datasources.yaml
...a/gcp/terraform/k8s-infra-prow-build/prow-build/resources/monitoring/grafana/deployment.yaml
infra/gcp/terraform/k8s-infra-prow-build/prow-build/resources/monitoring/grafana/lb.yaml
infra/gcp/terraform/k8s-infra-prow-build/prow-build/resources/monitoring/grafana/secret.yaml
infra/gcp/terraform/k8s-infra-prow-build/prow-build/resources/monitoring/grafana/service.yaml
...form/k8s-infra-prow-build/prow-build/resources/monitoring/kube-state-metrics/deployment.yaml
...rraform/k8s-infra-prow-build/prow-build/resources/monitoring/prometheus-main/prometheus.yaml
Thank you for taking care of this, @koksay!
/lgtm
/approve
/hold
for other folks to take a look
Can we look into using Thanos to aggregate and store metrics for a period of time? Also, I would prefer it if the Prometheus Operator were deployed via Helm, as the diff in Git would be very small.
We plan to do both, however:
That said, I would prefer to proceed with this PR as it is and iterate (for both GKE and EKS clusters) in the near future.
Let's add Thanos after KubeCon.
/hold cancel
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: koksay, upodroid, xmudrii
Pretty sure this change is what is causing scheduling issues for some PR/release-blocking jobs. We have to be careful about increasing per-node resource utilization in the build clusters (as opposed to, say, a deployment, which we can scale around just fine).

I think we should consider reverting and then revisiting shortly after, given that kubernetes/kubernetes has code freeze pending and we're having trouble testing and merging PRs: kubernetes/test-infra#32157 (comment)

For the revisit, we can discuss in kubernetes/test-infra#32157 after confirming that CI is running smoothly again ...
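To make the per-node concern concrete: a DaemonSet runs one pod on every node, so whatever its containers request is subtracted from each node's allocatable capacity before any job pod is scheduled. The sketch below is a generic, hypothetical DaemonSet (names, namespace, and request values are assumptions, not taken from this PR) showing where that reservation is declared.

```yaml
# Hypothetical per-node agent; all values are illustrative only.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter      # hypothetical name
  namespace: monitoring    # hypothetical namespace
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      containers:
        - name: node-exporter
          image: quay.io/prometheus/node-exporter
          resources:
            requests:
              cpu: 100m      # reserved on every node, not once per cluster
              memory: 128Mi  # likewise counted against each node's allocatable
```

If a CI job pod requests nearly a whole node, even a small per-node reservation like this can leave too little headroom for it to fit, whereas a Deployment with the same requests only occupies the nodes its replicas land on.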
@BenTheElder What if we remove the
Adds monitoring components for the `k8s-infra-prow-build` cluster. Mainly copied from the `eks-prow-build-cluster` configuration.

Please note that the `alertmanager-slack-token.yaml` file has invalid data for this cluster. It will be updated later when the external secrets are in place.