StatefulSet becomes unavailable on worker node shutdown or crashes. #670

sirhopcount · 2023-05-08T07:23:45Z

Hi,

We're currently testing a Kubernetes cluster with an IBM backend as the storage solution, and we ran into an issue where nodes cannot be shut down properly due to an existing issue with the driver. The documentation has a page on Recovering from a crashed Kubernetes node, which states the following:

This section details a manual operation required to revive Kubernetes pods that reside on a crashed node due to an existing Kubernetes limitation.

When a worker node shuts down or crashes, all pods in a StatefulSet that reside on it become unavailable. In these scenarios, the node status is NotReady, and the pod status appears as Terminating.
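To make the state described above concrete, here is a minimal sketch that picks out the affected pods from the JSON produced by `kubectl get nodes -o json` and `kubectl get pods -A -o json`. It assumes that a pod shown as Terminating is one with `metadata.deletionTimestamp` set, and that a NotReady node is one whose `Ready` condition is anything other than `"True"`. The function names are hypothetical and are not part of the driver or its documentation.

```python
def not_ready_nodes(node_list):
    """Return the names of nodes whose Ready condition is not "True"."""
    names = set()
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        # A missing Ready condition is treated as NotReady (an assumption).
        if ready is None or ready.get("status") != "True":
            names.add(node["metadata"]["name"])
    return names


def stuck_pods(node_list, pod_list):
    """Return (namespace, name, node) for pods being deleted on NotReady nodes."""
    down = not_ready_nodes(node_list)
    stuck = []
    for pod in pod_list.get("items", []):
        meta = pod["metadata"]
        node_name = pod.get("spec", {}).get("nodeName")
        # deletionTimestamp is what kubectl renders as "Terminating".
        if meta.get("deletionTimestamp") and node_name in down:
            stuck.append((meta["namespace"], meta["name"], node_name))
    return stuck
```

Both functions take already-parsed dictionaries, so they can be fed with `json.load` on saved `kubectl` output without needing access to the cluster.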

Having to run a recovery procedure after a node crash is to be expected, but not after a node shutdown. Kubernetes runs in highly dynamic environments where nodes come and go, so this is a pretty big blocker for us.

I tried to find more information on this, but I can't find any in the open or closed issues, which is why I'm creating this one.

Could you please provide some more information regarding this issue:

A technical description of the problem.
Possible mitigations to prevent it from happening.
A timeline for if and when this issue will be resolved.
Any links or references to upstream discussion of this issue.

Thanks in advance.

timstoop · 2023-05-22T15:54:00Z

I'd like to point out that the procedure as described in the docs does not work as advertised. You also have to go in and remove the finalizer from the volumeattachments.
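As a rough illustration of that extra step, the sketch below builds one `kubectl patch` command per VolumeAttachment that is bound to the crashed node and still carries a finalizer. It assumes the input is the parsed output of `kubectl get volumeattachments -o json`. The function name is hypothetical, and clearing finalizers skips the attacher's normal detach handling, so it is only appropriate when the node is known to be down.

```python
import json


def finalizer_patch_commands(va_list, node_name):
    """Build kubectl commands that clear finalizers on a node's VolumeAttachments."""
    # A merge patch that sets finalizers to null removes the field.
    patch = json.dumps({"metadata": {"finalizers": None}})
    commands = []
    for va in va_list.get("items", []):
        meta = va.get("metadata", {})
        on_node = va.get("spec", {}).get("nodeName") == node_name
        # Skip attachments on other nodes and ones with no finalizer left.
        if on_node and meta.get("finalizers"):
            commands.append([
                "kubectl", "patch", "volumeattachment", meta["name"],
                "--type=merge", "-p", patch,
            ])
    return commands
```

Each command is returned as an argument list rather than a shell string, so it can be reviewed first and then passed to `subprocess.run` without quoting problems.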
