Local volume health monitoring #10
To summarize, I think we can potentially split this up into these main areas:
- A common way to report PV health
- Daemonset controller that monitors local disks per node
- Cluster controller that monitors nodes that get deleted
- Workload controller that reacts to PV health

Link to the effort of @NickrenREN so far for a local storage monitor:

I agree with the first three, but I am unsure about the last one.

A common way to report PV health
A problem here is that in order to take action in the case of an unhealthy PV, you need to manually bind PVs and PVCs. If a PV is reported to be unhealthy and you want to prevent a Pod from using it, there isn't anything you can do other than manually binding the PVC. A mechanism like PV taints, as proposed in the local storage document, would be very helpful here. Please correct me if there's something I'm missing.

Daemonset controller that monitors local disks per node
At first, the DaemonSet can watch the mount points and collect SMART data, then make the above conditions available through annotations on PV objects. We should be careful to avoid scenarios like node repair on GKE, where a failed node comes back with the same name and disks mounted in the same places, but without any data. When that happens, the PVs will still work but now point to empty volumes. Instead, the PVs should stop working, because the underlying disks are essentially different; they are just mounted at the same points. To prevent this scenario, the DaemonSet could create a symlink for each discovered directory, include the filesystem UUID in the symlink name, and pass the path to the symlink in the PV object (see the first sketch after this comment).

Cluster controller that monitors nodes that get deleted
We should be careful to check all PVs on startup of the controller. If a Node is deleted before the controller is started, we need to clean up the PVs belonging to that Node. If a Node is deleted and this controller crashes at the same time, a subsequent list+watch will not return the deleted Node, and we won't be notified that something happened (see the second sketch after this comment).

Workload controller that reacts to PV health
As I mentioned, I don't see any real way of reacting. Deleting the PV doesn't work, because the local provisioner will recreate it. Recreating the PVC and the Pod using it will not work either, because the PVC could still end up bound to that PV. Is there anything we can do here? The local volume doc recommends introducing taints for PVs.
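Not code from this repo, just to make the DaemonSet idea above concrete: a minimal Go sketch that maps mounted devices to filesystem UUIDs (via /dev/disk/by-uuid and /proc/mounts) and publishes UUID-qualified symlinks for the provisioner to discover. The /mnt/local-storage-by-uuid directory and the `<mount>-<uuid>` naming are illustrative assumptions, not anything the provisioner defines.

```go
// Sketch of a DaemonSet agent that exposes each mounted local filesystem
// through a symlink whose name embeds the filesystem UUID, so that a
// replacement disk mounted at the same path produces a different PV path.
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"path/filepath"
	"strings"
)

// mountsByDevice parses /proc/mounts and returns device -> mount point.
func mountsByDevice() (map[string]string, error) {
	f, err := os.Open("/proc/mounts")
	if err != nil {
		return nil, err
	}
	defer f.Close()

	m := map[string]string{}
	s := bufio.NewScanner(f)
	for s.Scan() {
		fields := strings.Fields(s.Text())
		if len(fields) >= 2 && strings.HasPrefix(fields[0], "/dev/") {
			m[fields[0]] = fields[1]
		}
	}
	return m, s.Err()
}

func main() {
	// Hypothetical directory the provisioner would be pointed at.
	const linkDir = "/mnt/local-storage-by-uuid"
	if err := os.MkdirAll(linkDir, 0o755); err != nil {
		log.Fatal(err)
	}

	mounts, err := mountsByDevice()
	if err != nil {
		log.Fatal(err)
	}

	// /dev/disk/by-uuid has one symlink per filesystem, named after its UUID.
	entries, err := os.ReadDir("/dev/disk/by-uuid")
	if err != nil {
		log.Fatal(err)
	}
	for _, e := range entries {
		uuid := e.Name()
		dev, err := filepath.EvalSymlinks(filepath.Join("/dev/disk/by-uuid", uuid))
		if err != nil {
			continue
		}
		mountPoint, ok := mounts[dev]
		if !ok {
			continue // filesystem exists but is not mounted; nothing to expose
		}
		// The symlink name embeds the UUID, so a different disk mounted at the
		// same point yields a different link, and stale PVs stop resolving.
		link := filepath.Join(linkDir, fmt.Sprintf("%s-%s", filepath.Base(mountPoint), uuid))
		if err := os.Symlink(mountPoint, link); err != nil && !os.IsExist(err) {
			log.Printf("symlink %s: %v", link, err)
		}
	}
}
```

Similarly, a rough sketch of the startup check described for the cluster controller: list all Nodes and all local PVs with client-go and flag any PV whose required node affinity points at a Node that no longer exists. It assumes the PV node affinity uses kubernetes.io/hostname match expressions, and logging is a stand-in for whatever cleanup the controller would actually perform.

```go
// Sketch of the "check all PVs on controller startup" idea.
package main

import (
	"context"
	"log"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// affinityNodeNames extracts node names from the PV's required node affinity,
// assuming kubernetes.io/hostname "In" match expressions.
func affinityNodeNames(pv *v1.PersistentVolume) []string {
	var names []string
	if pv.Spec.NodeAffinity == nil || pv.Spec.NodeAffinity.Required == nil {
		return names
	}
	for _, term := range pv.Spec.NodeAffinity.Required.NodeSelectorTerms {
		for _, expr := range term.MatchExpressions {
			if expr.Key == v1.LabelHostname && expr.Operator == v1.NodeSelectorOpIn {
				names = append(names, expr.Values...)
			}
		}
	}
	return names
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	ctx := context.Background()

	nodes, err := client.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	existing := map[string]bool{}
	for _, n := range nodes.Items {
		existing[n.Name] = true
	}

	pvs, err := client.CoreV1().PersistentVolumes().List(ctx, metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	for _, pv := range pvs.Items {
		if pv.Spec.Local == nil {
			continue // only local PVs are pinned to a single node this way
		}
		for _, node := range affinityNodeNames(&pv) {
			if !existing[node] {
				// Placeholder reaction: a real controller would clean up the PV.
				log.Printf("PV %s references deleted node %s", pv.Name, node)
			}
		}
	}
}
```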
cc @gnufied
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
Thanks for removing the stale label, @cofyc, and thanks for your comment, @yanniszark. We only focus on the monitoring mechanism at the first stage; reaction is not in the scope of that doc. Will submit a new PR for the storage monitor later.
PR submitted, comments are welcome, thanks.
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle frozen
Now that the required logic in kubelet seems to have landed in 1.21, are there any concrete plans for adding this to the static provisioner? https://kubernetes.io/blog/2021/04/08/kubernetes-1-21-release-announcement/#persistentvolume-health-monitor
Migrating from kubernetes-retired/external-storage#817
/kind feature