STOR-2962: Add SELinuxMount GA Upgrade Readiness for 5.0 #2010

Open
jsafrane wants to merge 2 commits into openshift:master from jsafrane:selinux-block-upgrade

@jsafrane jsafrane commented May 14, 2026

This enhancement prepares OpenShift 5.0 for the SELinuxMount feature going GA in Kubernetes 1.37 / OpenShift 5.1.

SELinuxMount introduces a breaking change, so we'll need to mark a 5.0 cluster un-upgradeable until the cluster admin fixes their workloads or opts out of SELinuxMount. This enhancement proposes how to detect such workloads and how to pass the information from the component that knows it (a <carry> patch in kube-controller-manager) to the component that marks the cluster un-upgradeable (cluster-storage-operator).

See the metric cluster:selinux_warning_controller_selinux_volume_conflict:count in telemetry for the number of affected clusters. It's a very low number (not commenting publicly ;-)). Most clusters will upgrade just fine.

There are some open questions about the actual API used to pass the info. Just circulating the idea about a <carry> patch first before we dive into implementation details.

Proof of concept of the <carry> patch, using a ConfigMap in openshift-config namespace as "the API object": openshift/kubernetes#2671 (the actual API object is for discussion).
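
To make the proposed flow concrete, here is a minimal Go sketch of the cluster-storage-operator side, assuming the ConfigMap variant from the proof of concept. The data key `conflictsDetected`, the `Condition` struct, and the reason strings are hypothetical placeholders, not the agreed API; a real operator would use the OpenShift API types and a Kubernetes client instead of a plain map.

```go
package main

import "fmt"

// Condition is a simplified stand-in for an operator status condition.
// It is NOT the real OpenShift API type.
type Condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// conflictKey is a hypothetical ConfigMap data key that the <carry> patch
// in kube-controller-manager would set; the real key is still undecided.
const conflictKey = "conflictsDetected"

// upgradeableFromConfigMap turns the ConfigMap data into an Upgradeable
// condition. A missing ConfigMap (nil map) is treated as "no conflicts",
// which is an assumption of this sketch.
func upgradeableFromConfigMap(data map[string]string) Condition {
	if data[conflictKey] == "true" {
		return Condition{
			Type:    "Upgradeable",
			Status:  "False",
			Reason:  "SELinuxVolumeConflicts", // hypothetical reason string
			Message: "Some Pods could break once SELinuxMount is enabled; fix the workloads or opt out of SELinuxMount.",
		}
	}
	return Condition{Type: "Upgradeable", Status: "True", Reason: "AsExpected"}
}

func main() {
	withConflicts := upgradeableFromConfigMap(map[string]string{conflictKey: "true"})
	fmt.Println(withConflicts.Status, withConflicts.Reason)

	noConfigMap := upgradeableFromConfigMap(nil)
	fmt.Println(noConfigMap.Status, noConfigMap.Reason)
}
```

Whether a missing object should mean "upgradeable" or "unknown" is one of the details the API discussion would need to settle.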

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference label May 14, 2026
openshift-ci-robot commented May 14, 2026

@jsafrane: This pull request references STOR-2962 which is a valid jira issue.


@openshift-ci openshift-ci Bot requested review from 2uasimojo and enxebre May 14, 2026 12:50
openshift-ci Bot commented May 14, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign frobware for approval. For more information see the Code Review Process.


What is *the actual API object* is currently open. Ideas:

* A ConfigMap in a shared namespace, such as
`openshift-config/selinux-conflicts`. Does KCM have permissions to do so?

@JoelSpeed JoelSpeed left a comment

No major concerns from my perspective here

I would be interested, though, if you could add an example of how the end user is supposed to observe the warnings. What will the Upgradeable=False condition look like, and how will they know which pods need attention?

Will there be a KCS that explains to them what actions they need to take linked from the condition?

What is *the actual API object* is currently open. Ideas:

* A ConfigMap in a shared namespace, such as
`openshift-config/selinux-conflicts`. Does KCM have permissions to do so?

Who is the consumer of this object? Is it for end users or is this considered to be internal communication between openshift components?

Contributor Author

The config map is consumed only by the cluster-storage-operator.

Because the number of bad Pods can be large (we have a cluster with 6000 of them), users need to use metrics to list the affected namespaces and Pods. The upgradeable condition will try to point the users to the metric. There will be an alert with a longer human-friendly description and the name of the metric to check (and maybe a link to the console with the metric, if I find out how to do that).

The question is: should the upgradeable condition be generic ("there are Pods that could get broken during upgrade to 5.0 / 4.23, please see metric TBD"), or should it state the number of Pods ("there are 512 Pods that could get broken during upgrade to 5.0 / 4.23, please see metric TBD")? If we want the actual number, we need to choose how often KCM updates it. Frequent updates will load the cluster unnecessarily; less frequent updates may show the user a stale number.

I'd start with just a boolean flag instead of the actual number.
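
The load trade-off behind the boolean flag can be sketched in Go as follows. This is only an illustration under assumptions: `flagPublisher` and its `write` callback are hypothetical names, and the callback stands in for whatever API update KCM would actually perform. With a boolean, a write is needed only when the state flips, not every time the count changes.

```go
package main

import "fmt"

// flagPublisher remembers the last published value and calls write only
// when the boolean "conflicts exist" state changes.
type flagPublisher struct {
	published bool                 // true once any value has been written
	last      bool                 // last value written
	write     func(conflicts bool) // stand-in for the real API update
}

// observe is called with the current number of conflicting Pods.
func (p *flagPublisher) observe(conflictCount int) {
	conflicts := conflictCount > 0
	if p.published && conflicts == p.last {
		return // state unchanged, skip the write
	}
	p.write(conflicts)
	p.published, p.last = true, conflicts
}

func main() {
	writes := 0
	p := &flagPublisher{write: func(conflicts bool) {
		writes++
		fmt.Println("published conflicts =", conflicts)
	}}

	// The count changes five times, but the flag flips only three times.
	for _, count := range []int{0, 3, 6000, 512, 0} {
		p.observe(count)
	}
	fmt.Println("writes:", writes)
}
```

Publishing the exact count would instead require a write, or a rate limit, for every change in the number of affected Pods.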

Contributor

> The config map is consumed only by the cluster-storage-operator.

Given this is completely internal I think a configmap makes sense as a temporary way to co-ordinate between two components

> The upgradeable condition will try to point the users to the metric. There will be an alert with a longer human friendly description and name of the metric to check (and maybe a link to the console with the metric, if I find how to make it).

In a cluster without metrics, will there be an alternative way for users to identify the pods? Is there some CLI command we could recommend via a KCS?

> should it be specific about the nr of Pods

Perhaps you could update the message based on a range? E.g. there are approximately 500
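
One possible reading of the range idea, as a hedged Go sketch: round the count to one significant digit so the message changes only when the rounded value changes. The function name `approximate` and the message wording are hypothetical.

```go
package main

import "fmt"

// approximate rounds n to one significant digit (to the nearest value),
// so that small fluctuations in the count do not change the message.
func approximate(n int) int {
	if n < 10 {
		return n
	}
	mag := 1
	for n/mag >= 10 {
		mag *= 10
	}
	return (n + mag/2) / mag * mag
}

func main() {
	for _, n := range []int{7, 99, 512, 6000} {
		fmt.Printf("there are approximately %d Pods that could get broken\n", approximate(n))
	}
}
```

With this rounding, 512 is reported as approximately 500, and the condition message would stay the same for any count from 450 to 549.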

- KCM-O is not a viable approach, it does not run in HyperShift.
- Add KCS about details, so we can link it from the alert(s).
openshift-ci Bot commented May 25, 2026

@jsafrane: all tests passed!

