enhancement: dual-stream RHEL 9/10 NodePool support in HyperShift#2019
enhancement: dual-stream RHEL 9/10 NodePool support in HyperShift#2019enxebre wants to merge 2 commits into
Conversation
jparrill
left a comment
There was a problem hiding this comment.
Thanks for putting this together, Alberto, the enhancement looks really nice. I've left some inline comments worth looking at. The main areas I'd like to see addressed:
- Boot image availability: What happens when a user requests a stream whose boot image doesn't exist in their region/platform? (Comment on Risks section)
- InPlace NodePools: The enhancement doesn't address
osImageStreamchanges on InPlace upgrade-type NodePools, which has significant implications for baremetal and disconnected deployments. (Comment on Upgrade section) - Karpenter: The Karpenter path needs a brief subsection explaining how the default stream flows through the pipeline. (Comment on Non-Goals)
- Hash invariant: The separation between spec-value and resolved-value in the config hash is critical for avoiding unintended rollouts and deserves explicit documentation. (Comment on Phase 2)
Everything else is either minor (condition type, pipeline diagram) or informational (CI cost, graduation timeline, status data source).
|
|
||
| 3. **runc detection false negatives.** A user could configure runc via a mechanism other than CRI-O drop-in files. *Mitigation:* Match the MCO's detection logic exactly. If the MCO considers it sufficient for standalone, it is sufficient for HyperShift. | ||
|
|
||
| 4. **Mixed-stream clusters.** A HostedCluster with RHEL 9 and RHEL 10 NodePools is a new topology. Component compatibility across RHEL versions must be validated. *Mitigation:* This is the same topology supported by standalone clusters via per-MachineConfigPool stream selection. |
There was a problem hiding this comment.
The enhancement acknowledges that RHEL 10 AMIs exist in only 2 AWS regions and Azure has no marketplace images. However, it doesn't specify the user-facing behavior when someone requests osImageStream: rhel-10 in a region or platform where the boot image doesn't exist.
A few questions:
- Does the NodePool controller reject the request via
ValidPlatformImage=False? Does it fall back to rhel-9 silently? Or does it proceed and fail at machine provisioning time? - Which platforms are actually supported for rhel-10 in TechPreview? AWS (2 regions only)? Azure (VHD only, no marketplace)? GCP? KubeVirt? The enhancement should explicitly list which platform+region combinations are supported and what happens for unsupported ones.
- For disconnected environments: if dual-stream payloads carry additional images, customers need to mirror both streams. The enhancement should document what additional content needs to be mirrored (two sets of boot images + two sets of node OS container images) and whether IDMS handles this transparently.
There was a problem hiding this comment.
ValidPlatformImage=False
yes, this continues current behaviour and signaling. When a boot image doesn't exist for a stream we fail. @yuqi-zhang can you help hcp align with mco/installer expectations/ux on standalone? Should we instead fallback when a boot image is not available for a stream and assume that the appropriate os container image will be discovered and succeed to install via image-references?
| The MCO's `GetBuiltinDefaultStreamName()` already returns a version-based default. HyperShift could skip generating the OSImageStream CR and let the MCO choose. | ||
|
|
||
| This was rejected because: | ||
| - It removes per-NodePool stream control — all NodePools would get the same default stream |
There was a problem hiding this comment.
The test plan lists 5 E2E scenarios, several of which create HostedClusters with dual-stream NodePools. A couple of concerns on CI cost:
- Each test with two NodePools (rhel-9 + rhel-10) roughly doubles the node count per test. What's the estimated overhead in time and infrastructure per CI run?
- HyperShift doesn't have a dedicated TechPreview CI job — tests that need TechPreview features simply create a HostedCluster with
FeatureSet: TechPreviewNoUpgrade(as seen increate_cluster_test.go). These new tests would run alongside existing E2E tests in the same jobs. What's the impact on overall job duration? - Given RHEL 10 AMIs are limited to 2 AWS regions, the tests are geographically constrained. Do the current CI jobs run in those regions?
This doesn't block the enhancement, but the team should have visibility into the CI cost before approving.
There was a problem hiding this comment.
hypershift has a techpreview job and also can set any HC to be techpreview itself.
This validations would require about ~6 additional ec2 instances per run.
Update the section to clarify all this.
|
|
||
| 1. Administrator upgrades a HostedCluster from 4.19 to 5.0. An existing NodePool has no `spec.osImageStream` (would default to `rhel-10`) but has a MachineConfig setting `default_runtime = "runc"`. | ||
| 2. NodePool controller's `validMachineConfigCondition` calls `getRHELStream()` which detects runc and returns `"rhel-9"` (fallback). | ||
| 3. The condition is set to `NodePoolValidMachineConfigConditionType=True` with message "OS stream defaulted to rhel-9: RHEL 10 is incompatible with default_runtime=runc". The NodePool stays on RHEL 9. |
There was a problem hiding this comment.
The enhancement routes stream validation errors through NodePoolValidMachineConfigConditionType. This works technically — getRHELStream() errors propagate through NewConfigGenerator() and the existing condition handler catches them.
However, for SRE observability, it might be worth having a separate condition (e.g., ValidOSStream) so that monitoring dashboards can distinguish "malformed MachineConfig" from "incompatible OS stream selection" without parsing the condition message. This is especially relevant for ROSA HCP and ARO HCP where SRE teams monitor conditions at fleet scale.
I'd suggest validating this with the ROSA and ARO SRE teams to see if a dedicated condition would help their workflows.
| name: "rhel-10" | ||
| │ | ||
| ▼ | ||
| Resolve stream ────────▶ Token Secret |
There was a problem hiding this comment.
Small correction on the pipeline description: copyMCOOutputToMCC() is not a separate top-level step in GetPayload(). Looking at local_ignitionprovider.go, it's called inside runMCO() at line 454, not as an independent step between runMCO() and runMCC().
The actual sequence in GetPayload() is:
1. runFeatureGateRender() — line 735
2. runMCO() — line 740
└── copyMCOOutputToMCC() — line 454 (inside runMCO)
3. [NEW: write 99_osimagestream.yaml to mccDir]
4. runMCC() — line 745
5. runMCSAndFetchPayload() — line 750
The injection point between runMCO() and runMCC() (lines 743-745) is correct. Just the diagram that should reflect copyMCOOutputToMCC is internal to runMCO, not a standalone step.
|
|
||
| 7. **Pass stream through `IgnitionProvider` interface** — add `osStream string` parameter to `GetPayload()` in the `IgnitionProvider` interface (`tokensecret_controller.go`). The `TokenSecretReconciler` reads `os-stream` from the token secret data and passes it to `GetPayload()`. Update the mock `IgnitionProvider` in tests accordingly. | ||
|
|
||
| ```go |
There was a problem hiding this comment.
This is a critical implementation invariant that I think deserves more emphasis in the enhancement:
The hash must use the value from spec.osImageStream.name directly (empty string when the field is unset), never the resolved value from getRHELStream(). If someone accidentally puts the resolved default (e.g., "rhel-10" for implicit >=5.0) into the hash input, it would change the hash for every existing NodePool without explicit osImageStream set, triggering a mass rollout of all nodes across the fleet.
In Go, "abc" + "" == "abc", so appending an empty string to the hash input is safe — the hash stays identical. But appending a non-empty resolved default would break this property.
I'd suggest documenting this explicitly as a design invariant, something like: "The rhelStream field in rolloutConfig MUST be populated from spec.osImageStream.name, never from the return value of getRHELStream(). This ensures zero-impact on existing NodePools."
| - Boot image ConfigMap parsing supports multi-stream format. | ||
| - Stream plumbed through token secret to ignition server. | ||
| - OSImageStream CR generated in `GetPayload()`. | ||
| - MCO ExternalTopologyMode guard removed (coordinated with MCO team). |
There was a problem hiding this comment.
The graduation criteria describe TechPreview first, then GA, but don't specify a timeline. A couple of questions:
- Is TechPreview targeting 5.0? If so, when does GA land?
- If GA targets 5.0 as well (no TechPreview phase), then full E2E coverage is needed before that release. The current test plan only covers TechPreview CI jobs.
- The NodePool API field promotion must be coordinated with the MachineConfigPool
OSStreamsfeature gate promotion in openshift/api — is there alignment on timeline with the MCO team?
|
|
||
| ### Architecture Overview | ||
|
|
||
| The OS stream flows through three layers: |
There was a problem hiding this comment.
The enhancement mentions in Non-Goals that "Karpenter NodePools always use the version-derived default stream" but doesn't elaborate on how this works in practice.
Looking at the current code, OpenshiftEC2NodeClass has no field for OS stream selection. setKarpenterAMILabels() in token.go resolves a single AMI per architecture from StreamMetadata with no stream dimension. The in-memory NodePool created by KarpenterIgnitionReconciler.createInMemoryNodePool() doesn't carry spec.osImageStream.
Could you add a brief "Karpenter" subsection that clarifies:
- How the version-derived default stream flows through the Karpenter pipeline (from release image → in-memory NodePool →
setKarpenterAMILabels→ AMI labels → EC2NodeClass)? - Whether
OpenshiftEC2NodeClasswill gain a stream field in a future phase, or if this is explicitly a non-goal? - Whether the AMI label scheme (
hypershift.openshift.io/ami,hypershift.openshift.io/ami-arm64) needs any changes for dual-stream, given it currently assumes one AMI per architecture.
| - No manual action required. | ||
|
|
||
| **Downgrade to a version without this feature:** | ||
| - `spec.osImageStream` is ignored by older controllers. NodePools revert to the single-stream behavior (always using the `rhel-coreos` image from the release payload). |
There was a problem hiding this comment.
This section covers upgrade/downgrade for the feature itself but doesn't address what happens when spec.osImageStream changes on a NodePool with upgradeType: InPlace.
For Replace NodePools this is straightforward — a new hash triggers a MachineDeployment rollout, old nodes are replaced with new ones booting the correct stream. But for InPlace NodePools, changing the stream would mean the MCD on the running node attempts an rpm-ostree rebase from RHEL 9 to RHEL 10 (or vice versa) — which is a cross-major-version OS rebase on a live node.
Looking at the current codebase:
- The in-place upgrader (
inplaceupgrader.go) has no rollback mechanism. If a node entersMachineConfigDaemonStateDegraded, the upgrade stops but does not revert previously upgraded nodes. (Grep for "rollback"/"revert"/"undo" in the inplaceupgrader directory returns zero results.) - E2E tests for in-place upgrades (
nodepool_upgrade_test.go,nodepool_lifecycle_test.go) test forward version upgrades only. No test exercises cross-RHEL-version changes or failure/degradation scenarios. - The Agent platform has no restrictions on upgrade type — Agent NodePools can be InPlace. In self-managed baremetal deployments, InPlace is common because re-provisioning hardware is expensive.
This means a baremetal operator could change osImageStream on an InPlace NodePool, triggering an OS rebase across all metal nodes with no rollback path if something goes wrong.
I'd suggest the enhancement explicitly addresses InPlace NodePools. Options:
- Reject
osImageStreamchanges on InPlace NodePools via validation (safest for TechPreview) - Document it as supported-but-risky with a clear warning in the condition message
- Explicitly list it as a non-goal for the initial implementation
Also worth noting: for disconnected/air-gapped baremetal deployments, dual-stream means customers need to mirror both streams' container images and boot artifacts. The enhancement should document what additional content needs to be mirrored.
There was a problem hiding this comment.
version downgrades are not supported. Clarified. Day 2 changes to the streams are supported as they are also in the canonical machineConfigPool API.
There was a problem hiding this comment.
@yuqi-zhang does mco does dualstream UX really supports moving back and forward inplace between rhel 9 and 10?
|
|
||
| 1. Update `DeserializeImageMetadata` (`support/releaseinfo/deserialize.go`) — parse the `streams` key when present, fall back to `stream` for older payloads. Return stream-indexed metadata. | ||
| 2. Update `CoreOSStreamMetadata` (`support/releaseinfo/releaseinfo.go`) — support multiple streams (map of stream name to per-arch metadata). | ||
| 3. Wire default boot image resolution per platform — accept stream parameter and look up stream-specific boot images from the parsed metadata: |
There was a problem hiding this comment.
Small verification question: the enhancement says the NodePool controller infers the OS stream from CAPI Machine.Status.NodeInfo.OSImage. The current code in nodeVersionsFromMachines() (version.go) reads KubeletVersion from Machine.Status.NodeInfo but doesn't read OSImage today.
CAPI should copy NodeInfo (including OSImage) from the guest Node to the Machine status, so this should work. But has this been verified across all platforms (AWS, Azure, Agent, KubeVirt)? If any platform's CAPI provider doesn't populate Machine.Status.NodeInfo.OSImage, the status reporting would silently fail.
There was a problem hiding this comment.
this is not a provider specific thing, it's core capi bubbling up nodeinfo, that's a contractual boundary we can consider consumable.
|
/label tide/merge-method-squash |
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix pipeline diagram: copyMCOOutputToMCC() is internal to runMCO(), not a standalone step - Add boot image error behavior for missing stream+region (NodePoolValidPlatformImageType=False) - Add Karpenter step (phase 1) clarifying version-derived default flow - Add design invariant for rolloutConfig.rhelStream hash safety - Rewrite E2E section: 8 NodePools in TestNodePool suite covering explicit/implicit streams, validation, runc, and upgrade stream switches (Replace + InPlace) - Add disconnected mirroring risk (IDMS/ITMS handles both streams transparently) - Downgrade section: downgrades are not supported Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
c413be6 to
97df15f
Compare
|
@enxebre: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
|
||
| 4. **Mixed-stream clusters.** A HostedCluster with RHEL 9 and RHEL 10 NodePools is a new topology. Component compatibility across RHEL versions must be validated. *Mitigation:* This is the same topology supported by standalone clusters via per-MachineConfigPool stream selection. | ||
|
|
||
| 5. **Disconnected environments require mirroring both streams.** Dual-stream payloads carry two sets of node OS container images (one per RHEL stream). Disconnected customers using both streams must mirror both sets. Boot images are platform-specific and already handled outside the payload (e.g. pre-uploaded AMIs, VHDs). The OS container images are referenced via `ImageDigestMirrorSet` / `ImageTagMirrorSet` — existing IDMS/ITMS mirroring workflows handle this transparently as long as both stream images are included in the mirror list. No additional HyperShift-specific mirroring tooling is needed. |
There was a problem hiding this comment.
I can't supply hard values but if the size of two streams is critical in disconnected we should eventually double check with ROSA people for ECR size limit for their zero egress config.
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: sdminonne The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
csrwng
left a comment
There was a problem hiding this comment.
Looking good. Just a question after a brief look
|
|
||
| ## Open Questions | ||
|
|
||
| 1. Should `spec.osImageStream` be immutable after creation (requiring NodePool replacement to change streams), or mutable (triggering a rolling update)? The current proposal allows mutation with rollout, mirroring how other NodePool spec changes work. |
There was a problem hiding this comment.
This makes sense for replace strategy. For in-place, mutation would only make sense if you're moving forward (rhel9 -> rhel10), but not the other way, no?
Summary
spec.osImageStreamon NodePool API for per-NodePool RHEL stream selection (rhel-9 / rhel-10)streamskey)NodePoolValidMachineConfigConditionTypeTest plan
getRHELStream(), config hash behavior, runc detection, OSImageStream CR generation🤖 Generated with Claude Code