STOR-2962: Add SELinuxMount GA Upgrade Readiness for 5.0 #2010
What is *the actual API object* is currently open. Ideas:

* A ConfigMap in a shared namespace, such as `openshift-config/selinux-conflicts`. Does KCM have permissions to do so?
The carry patch would just need to add additional RBAC for the selinux-warning-controller: https://github.com/kubernetes/kubernetes/blob/b9b0ff440d5493764532348e0d80abdb7daf47b5/plugin/pkg/auth/authorizer/rbac/bootstrappolicy/controller_policy.go#L592
JoelSpeed left a comment
No major concerns from my perspective here.
I'd be interested, though, if you could add an example of how the end user is supposed to observe the warnings. What will the Upgradeable=False condition look like, and how will they know which pods need attention?
Will there be a KCS, linked from the condition, that explains what actions they need to take?
Who is the consumer of this object? Is it for end users or is this considered to be internal communication between openshift components?
The config map is consumed only by the cluster-storage-operator.
As the number of bad Pods can be large (we have a cluster with 6000 of them), users need to use metrics to list the namespaces and Pods. The upgradeable condition will try to point users to the metric. There will be an alert with a longer, human-friendly description and the name of the metric to check (and maybe a link to the console with the metric, if I find out how to make it).
The question is: should the upgradeable condition be generic, "there are Pods that could get broken during upgrade to 5.0 / 4.23, please see metric TBD", or should it be specific about the number of Pods, "there are 512 Pods that could get broken during upgrade to 5.0 / 4.23, please see metric TBD"? If we want the actual number, we need to choose how often KCM will update it. Frequent updates will load the cluster unnecessarily; less frequent updates may show the user a stale number.
I'd start with just a boolean flag instead of the actual number.
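A minimal sketch of the boolean-flag variant, written in Python for brevity (the real components are Go). It assumes the carry patch writes a single key, here called `conflictsDetected`, into the ConfigMap data, and that cluster-storage-operator maps it to an `Upgradeable` condition. The key name, the reason strings and the "no conflicts" message are hypothetical; the metric name is still TBD.

```python
from dataclasses import dataclass

# Hypothetical key that the KCM carry patch would write into the ConfigMap data.
FLAG_KEY = "conflictsDetected"


@dataclass
class Condition:
    type: str
    status: str
    reason: str
    message: str


def upgradeable_condition(configmap_data: dict) -> Condition:
    """Map ConfigMap data to an Upgradeable condition (illustrative only)."""
    # ConfigMap values are strings, so the flag is compared against "true".
    # A missing key is treated as "no conflicts known" here; whether that is
    # the right default is an open design choice.
    if configmap_data.get(FLAG_KEY, "false").lower() == "true":
        return Condition(
            type="Upgradeable",
            status="False",
            reason="SELinuxMountConflicts",  # hypothetical reason string
            message=(
                "There are Pods that could get broken during upgrade to "
                "5.0 / 4.23, please see metric TBD"
            ),
        )
    return Condition(
        type="Upgradeable",
        status="True",
        reason="AsExpected",  # hypothetical reason string
        message="No SELinux volume conflicts reported",
    )
```

With a flag like this, KCM only has to write the object when the answer flips between "some conflicts" and "none", rather than every time the count changes.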
> The config map is consumed only by the cluster-storage-operator.

Given this is completely internal, I think a configmap makes sense as a temporary way to co-ordinate between two components.

> The upgradeable condition will try to point the users to the metric. There will be an alert with a longer human friendly description and name of the metric to check (and maybe a link to the console with the metric, if I find how to make it).

In a cluster without metrics, will there be an alternative way for users to identify the pods? Is there some CLI command we could recommend via a KCS?

> should it be specific about the nr of Pods

Perhaps you could update the message based on a range? E.g. "there are approximately 500".
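One possible reading of the range suggestion, sketched in Python: round the count down to one significant digit, so the message only changes when the count crosses a bucket boundary. The rounding rule and the function names are assumptions for illustration, not something agreed in this thread.

```python
def approximate_count(n: int) -> int:
    """Round n down to one significant digit, e.g. 512 -> 500."""
    if n < 10:
        return n
    magnitude = 10 ** (len(str(n)) - 1)
    return (n // magnitude) * magnitude


def conflict_message(n: int) -> str:
    """Build a condition message that stays stable within a bucket."""
    return (
        f"There are approximately {approximate_count(n)} Pods that could get "
        "broken during upgrade to 5.0 / 4.23, please see metric TBD"
    )
```

Because 500 through 599 all produce the same text, the object would need rewriting far less often than with an exact count, at the cost of a coarser number for the user.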
- KCM-O is not a viable approach, it does not run in HyperShift.
- Add KCS about details, so we can link it from the alert(s).
This enhancement prepares OpenShift 5.0 for the `SELinuxMount` feature going GA in Kubernetes 1.37 / OpenShift 5.1. `SELinuxMount` introduces a breaking change, and we'll need to mark a 5.0 cluster un-upgradeable until the cluster admin fixes their workloads or opts out of `SELinuxMount`. This enhancement proposes how to detect such workloads and how to pass the information from the component that knows it (a `<carry>` patch in kube-controller-manager) to a component that marks the cluster un-upgradeable (cluster-storage-operator).

See metric `cluster:selinux_warning_controller_selinux_volume_conflict:count` in telemetry for the number of affected clusters. It's a very low number (not commenting publicly ;-)). Most clusters will upgrade just fine.

There are some open questions about the actual API used to pass the info. Just circulating the idea about a `<carry>` patch first before we dive into implementation details.

Proof of concept of the `<carry>` patch, using a ConfigMap in the `openshift-config` namespace as "the API object": openshift/kubernetes#2671 (the actual API object is for discussion).