OCPEDGE-2280: mutable topology #2008

Open: jeff-roche wants to merge 2 commits into openshift:master from jeff-roche:mutable-topology

@jeff-roche jeff-roche commented May 11, 2026

Summary

Introduces the Mutable Topology enhancement proposal, which enables OpenShift clusters to transition between topology modes as a Day 2 operation. This replaces the previous Adaptable Topology proposal.

Key Design Decisions

  • Controller in cluster-config-operator (CCO) — A new topology transition controller in CCO watches spec.desiredTopology on the Infrastructure CR, validates preconditions, coordinates the transition across operators, and updates topology status fields when complete. CCO was chosen over CVO, CEO, and MCO (and over a standalone operator) because it owns the config.openshift.io API group and the Infrastructure CR lifecycle. See Alternatives in the proposal for the full placement analysis.
  • No new topology enum values — Transitions move between existing TopologyMode values (SingleReplica, HighlyAvailable, etc.). Operators continue reacting to fixed topology values they already understand. Transition complexity is concentrated in a single controller rather than distributed across 30+ operators.
  • Spec/status contract — Follows the standard Kubernetes pattern: spec.desiredTopology expresses administrator intent; status.controlPlaneTopology reflects observed state. Mirrors the oc adm upgrade pattern (patch spec, controller does the work).
  • Feature-gated: the MutableTopology gate progresses through DevPreview → TechPreview → GA. The controller is not registered when the gate is disabled (zero runtime overhead).

Scope

  • Initial transition: SNO → HA compact (3-node) on platform: none
  • CLI: oc adm transition topology HighlyAvailable
  • Admission control: CEL validation on desiredTopology; ValidatingAdmissionPolicy (fail-closed) protects topology status fields from direct edits outside CCO
  • etcd scaling: CEO handles sequential 1→2→3 member scaling via existing learner-to-voter promotion
  • Failure handling: Controller resets desiredTopology on failure (deliberate spec mutation to prevent infinite retry loops); CEO attempts etcd rollback
  • Upgrade safety: CCO sets Upgradeable=False while a transition is in progress
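
A minimal Python sketch of the control flow implied by the bullets above: trigger on a spec/status mismatch, block upgrades while a transition runs, and reset the spec on failure. The `Infrastructure` dataclass, `reconcile`, and the `run_transition` callback are hypothetical stand-ins for illustration, not the CCO controller's actual code. Clearing the upgrade block after a failed transition is also an assumption.

```python
from dataclasses import dataclass

@dataclass
class Infrastructure:
    desired_topology: str        # stands in for spec.desiredTopology
    control_plane_topology: str  # stands in for status.controlPlaneTopology
    upgradeable: bool = True     # stands in for the Upgradeable condition

# The only transition named in the Scope list.
SUPPORTED = {("SingleReplica", "HighlyAvailable")}

def reconcile(infra, run_transition):
    """One reconcile pass; run_transition is a hypothetical callback that raises on failure."""
    current, desired = infra.control_plane_topology, infra.desired_topology
    if desired == current:
        return "noop"
    if (current, desired) not in SUPPORTED:
        infra.desired_topology = current  # reset spec so the controller does not retry forever
        return "failed"
    infra.upgradeable = False             # block upgrades while the transition is in progress
    try:
        run_transition(infra)
    except Exception:
        infra.desired_topology = current  # deliberate spec mutation on failure
        infra.upgradeable = True          # assumption: unblock upgrades once the attempt ends
        return "failed"
    infra.control_plane_topology = desired  # status now reflects the observed topology
    infra.upgradeable = True
    return "completed"
```

For example, `reconcile(Infrastructure("HighlyAvailable", "SingleReplica"), lambda i: None)` walks the success path, while a callback that raises leaves `desired_topology` back at `SingleReplica`.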

What Changed (Revision History)

The proposal was revised to base the controller in CCO rather than proposing a dedicated standalone operator (OTTO). Key changes from the prior revision:

  • Controller placement moved from a standalone operator to CCO, with full alternatives analysis (CVO, CEO, MCO, standalone operator, CLI-only)
  • Added ValidatingAdmissionPolicy for topology status field protection (fail-closed)
  • Added detailed failure handling: controller resets desiredTopology on failure with rationale for the spec-mutation deviation
  • Expanded graduation criteria with per-operator topology dependency matrix requirement
  • Added monitoring/telemetry requirements (Prometheus metrics, alerts) for GA graduation
  • Added Support Procedures section with team ownership, detection, and recovery procedures
  • Clarified etcd scaling risks: the 2-voter intermediate state is unique to Day 2 transitions (does not occur during bootstrapping)
  • Added Upgradeable=False enforcement during transitions to prevent concurrent upgrades

Out of Scope

  • Bidirectional transitions (HA → SNO)
  • HyperShift / hosted control planes
  • MicroShift
  • Automatic node provisioning
  • Cloud platforms (AWS, Azure, GCP) — design does not preclude future support
  • platform: baremetal — pending keepalived resolution

🤖 Generated with Claude Code

@openshift-ci openshift-ci Bot requested review from bn222 and cooktheryan May 11, 2026 19:46
@jeff-roche jeff-roche changed the title enhancements/topologies: mutable topology enhancement proposal OCPEDGE-2280: mutable topology enhancement proposal May 11, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 11, 2026

openshift-ci-robot commented May 11, 2026

@jeff-roche: This pull request references OCPEDGE-2280 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target either version "5.0." or "openshift-5.0.", but it targets "openshift-4.22" instead.

In response to this:

Summary

  • Introduces the Mutable Topology enhancement, replacing the previous Adaptable Topology proposal
  • Proposes a new optional payload operator (OTTO) to orchestrate topology transitions between existing fixed topology modes, rather than adding a new topology enum
  • Initial scope: SNO to HA compact (3-node) on platform: none

Test plan

  • markdownlint passes (markdownlint-cli2)
  • Reviewer feedback from control plane, API, and architecture teams
  • Template structure validated against guidelines/enhancement_template.md

🤖 Generated with Claude Code

@jeff-roche jeff-roche changed the title OCPEDGE-2280: mutable topology enhancement proposal OCPEDGE-2280: mutable topology May 11, 2026

@brandisher brandisher left a comment

I'm missing a "why" statement covering why a day 2, out-of-payload operator is the right choice for this. The CVO section towards the bottom hints at the why a bit but more explicit detail is needed.

With that in mind, I haven't reviewed the EP fully because I don't understand why this is the approach we're taking. The assessment of CVO seems very light and not enough to exclude that as a potential option to meet the goals.

Comment thread enhancements/topologies/mutable-topology.md Outdated
- The CLI would need direct access to operator internals, violating separation of concerns
- Error recovery and retry logic is better suited to an operator's reconciliation loop

### Controller in CVO
Contributor

Is CVO the only option in the core operators where this might make sense?

Author

I expanded to include some other operators, none of which fit the bill in my opinion. This is an entirely new process and shoehorning it into another operator that wasn't designed for tackling this type of procedure seems irresponsible to me

Contributor

Which operator handles adding nodes to clusters?

openshift-ci Bot commented May 12, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from dgoodwin. For more information see the Code Review Process.

jeff-roche commented May 12, 2026

> I'm missing a "why" statement covering why a day 2, out-of-payload operator is the right choice for this. The CVO section towards the bottom hints at the why a bit but more explicit detail is needed.
>
> With that in mind, I haven't reviewed the EP fully because I don't understand why this is the approach we're taking. The assessment of CVO seems very light and not enough to exclude that as a potential option to meet the goals.

@brandisher I've added a new paragraph under the ## Proposal header that explains the why. If you're looking for something specifically beyond what I added, I'd be happy to add some more detail

@JoelSpeed JoelSpeed left a comment

🤖 Generated with Claude Code

There are significant portions of this proposal that assume behaviour of OpenShift that either doesn't exist, or doesn't work in the way proposed. I'm assuming here that this is hallucination of Claude?

The EP as it stands today doesn't actually make sense for implementation. It also doesn't align with what I thought we had agreed on the architecture call.

Has anyone tried to manually take a cluster and scale up and manually transition from a single replica to multiple replicas? IMO this is the most important next step for this project

What I thought we had agreed:

  • To scale from SNO to HA, the user must create two new control plane nodes and join them to the cluster
    • On HighlyAvailable topology - KAS, KCM, etcd, etc all get scheduled automatically as static pods on these nodes - I don't see anything that prevents this based on if it's a SNO cluster today, this needs to be checked (it probably should)
    • MCO still serves ignition for control plane nodes on SNO, so user needs to create the control plane nodes somehow to ignite from here
  • New fields are added to the infrastructure spec to allow the user to say "I intend for this cluster to be HA going forward"
  • A controller is added to cluster config operator
    • This checks that the precondition of having additional control plane nodes in the cluster is met
    • Once the precondition is met, it updates the status to reflect spec
  • Operators now react to the change in status and transition from single to HA
    • etcd operator promotes learners to full members, quorum goes from 1->3 (I don't know if this guard is in place today, we should add if not)
    • KAS/KCM - no change, it already scheduled new KAS/KCM pods
    • Others - Those that previously deploy a single replica of their operand now move to 2 replicas, other changes might be needed on a per operator basis, I was expecting those details in the EP but don't see them yet
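
The precondition step in this flow could look roughly like the following Python sketch. Nodes are modeled as plain dicts, the function names are hypothetical, and the required count of three is taken from the SNO to 3-node compact scope. The sketch also assumes control plane nodes carry the `node-role.kubernetes.io/control-plane` label.

```python
# Assumption: control plane nodes carry this role label.
CONTROL_PLANE_LABEL = "node-role.kubernetes.io/control-plane"

def ready_control_plane_nodes(nodes):
    """Names of nodes that are labeled as control plane and report Ready."""
    return [
        n["name"]
        for n in nodes
        if CONTROL_PLANE_LABEL in n.get("labels", {}) and n.get("ready", False)
    ]

def next_status_topology(desired, current, nodes, required=3):
    """Return the topology the status should carry after this check.

    Status only moves to the desired value once enough control plane
    nodes have joined; otherwise it stays where it is.
    """
    if desired == current:
        return current
    if len(ready_control_plane_nodes(nodes)) < required:
        return current  # precondition not met yet, operators see no change
    return desired      # precondition met, status now reflects spec

nodes = [
    {"name": "cp-0", "labels": {CONTROL_PLANE_LABEL: ""}, "ready": True},
    {"name": "cp-1", "labels": {CONTROL_PLANE_LABEL: ""}, "ready": True},
    {"name": "cp-2", "labels": {CONTROL_PLANE_LABEL: ""}, "ready": False},
]
print(next_status_topology("HighlyAvailable", "SingleReplica", nodes))
```

With `cp-2` not yet Ready, the status stays at `SingleReplica`; operators only see the change once all three nodes qualify.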

Comment thread enhancements/topologies/mutable-topology.md Outdated
- Installed either manually or via the `oc adm transition topology` command
- Owns the transition graph — the directed graph defining which topology transitions are supported
- Owns the validation criteria for each transition (required nodes, certificates, secrets, operator states)
- Orchestrates transitions by interacting with cluster operators via their existing APIs
Contributor

I don't think this actually exists

Contributor

This statement is just objectively incorrect. :( I can see why you'd be confused reading this.
This never happens. What we can do is look to update things like ingress and console to be more adaptable like etcd/api-server such that they update their replicas when more infrastructure nodes become available and do firmer pre-flight checks so that the "transition" piece becomes a no-op, but I think it's OK for some operators to continue to treat the topology field as the source of truth for desired behavior.

An alternative would be to key off the infrastructure topology's "desiredTopology" and update the hooks for ingress to try to update its replicas when it detects an update to that field. Then the pre-flight checks actually verify that it has the right number of replicas and we update the topology after it's already succeeded. I guess it depends on whether we're treating the topology field or the desired topology field as the answer to "what should the operator be doing right now".

Contributor

> it's OK for some operators to continue to treat the topology field as the source of truth for desired behavior.

Absolutely.

In an ideal world, most operators would not scale up their operands until the status topology fields were updated. We know that's not true today but I don't think we necessarily need to fix most of the controllers. The one controller that does concern me is etcd operator. Would be good to understand why it acts the way it does today (will just scale up and add the member to quorum on SNO) and whether there's a way we can change that behaviour so that it would treat new members as learners until the status topology transitions

@jeff-roche BTW can we get rid of the objectively incorrect statements at some point please

Comment thread enhancements/topologies/mutable-topology.md Outdated

#### Risk: Platform Bare Metal May Not Support Single-Node Clusters

**Risk**: If keepalived networking cannot be enabled, `platform: baremetal` will be limited to 2+ nodes, reducing the value of mutable topology for this platform.
Contributor

limited to 2+? Isn't that the success criteria?

Contributor

baremetal platform doesn't support SNO because having a load balancer for 1 node doesn't make sense.
In order for users to get the benefit of not having to manually deploy a load balancer (i.e. what they primarily save in terms of effort when deploying on platform: baremetal), we need to investigate whether we can allow baremetal as a platform for SNO first (with load balancing disabled), and change that operator so that load balancing can be introduced post-transition.

Otherwise we need to introduce a new, scarier feature: platform transitions.
That's a Pandora's box I don't want to look at.

Contributor

> That's a Pandora's box I don't want to look at.

You and me both

So are we tying this EP to not only supporting topology transitions, but also SNO on baremetal? I would have expected a SNO on baremetal project to be sufficiently large and warrant its own EP?

- Error recovery and retry logic is better suited to an operator's reconciliation loop than imperative CLI code
- The CLI would need direct access to operator internals, violating separation of concerns

### Extending an Existing Core Operator
Contributor

Or cluster config operator which would make a very natural home for this as long as we have commitment of ownership from folks writing the new controller

Contributor

At risk of going on a tangent, currently the installer has problems calculating topology when laying down manifests. We have a bug for this in the backlog, and I left #1905 (comment) on the previous enhancement.

I would like to see that calculation moved to the cluster config operator in bootkube during bootstrapping. That solution could co-exist with this one (and my team will push it forward as priorities allow); but it could potentially also tie into this solution.

Contributor

@jaypoulz jaypoulz May 13, 2026

I'm fine with us using CCO for this. I will take the blame for miscommunicating this to Jeff - it didn't strike me as obvious that a controller for this transition would belong there. My instincts were that new code in the core operators is expensive, especially for a controller that doesn't need to be running 99% of the time. That said, I think it's fine for this to be a controller that is installed with zero replicas and the replicas are scaled up during transition events. That fits the main sentiment of what Jeff and I were trying to solve - minimizing the tax on clusters that will never use this feature (i.e. the vast majority of them).

Member

IMHO the solution for the installer is to get the user to specify their intent in the install-config (this should be passed through to the cluster). This enhancement is a good opportunity to define what that input should look like.

Author

Resolving this thread as I've re-scoped this to be a new CCO controller

@patrickdillon patrickdillon left a comment

I know the scope is limited to baremetal/platform:none, but I know there is interest in mutable topologies on cloud platforms as well, so as much as appropriate I would like to ensure the design leaves a path forward for those cloud platforms.

Also, like the other enhancement I don't see any mention of mastersSchedulable which affects the calculation for infrastructureTopology. How is the mastersSchedulable field handled/taken into account for this solution?

@zaneb zaneb left a comment

This one looks directionally correct 👍


##### Pre-Transition

1. The cluster administrator prepares the additional control-plane nodes (hardware, network, OS)
Member

Does 'OS' here imply that the user joins the hosts to the cluster as control plane nodes at this stage? If not, at what stage is that expected to happen?

Not sure what this means? Does this mean just that the HW is in place? Or does this mean adding the node as a worker node to the cluster?
That would have the benefit that we can rely on all the existing docs and procedures on how to add a worker node to an existing cluster.

OTTO maintains a directed graph of supported transitions. For the initial implementation:

```text
SingleReplica (SNO, platform: none) → HighlyAvailable (3-node compact)
```
Member

I think it's a mistake to define the supported topologies in terms of the controlPlaneTopology field. There are at least 6 use cases I can think of that users have articulated:

  • single-node (1 schedulable control plane, 0+ workers, no load balancer)
  • compact (3 schedulable control plane, 0+ workers)
  • standby (3 non-schedulable control plane, 0 workers)
  • HA (3 non-schedulable control plane, 2+ workers)
  • TNA (2 non-schedulable control plane, 1 arbiter, 2+ workers)
  • TNF (2 schedulable control plane w/ STONITH, 0 workers)
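
To make the point concrete, here is a small Python sketch that encodes the six use cases listed above as data and groups them by control plane size. The `UseCase` type and its field names are illustrative only; the values are copied from the list.

```python
from collections import defaultdict
from typing import NamedTuple, Optional

class UseCase(NamedTuple):
    name: str
    control_plane_nodes: int
    control_plane_schedulable: bool
    min_workers: int
    max_workers: Optional[int]  # None means no upper bound ("0+" or "2+")
    extra: str = ""

USE_CASES = [
    UseCase("single-node", 1, True, 0, None, "no load balancer"),
    UseCase("compact", 3, True, 0, None),
    UseCase("standby", 3, False, 0, 0),
    UseCase("HA", 3, False, 2, None),
    UseCase("TNA", 2, False, 2, None, "1 arbiter"),
    UseCase("TNF", 2, True, 0, 0, "STONITH"),
]

def group_by_control_plane_size(cases):
    """Map control plane node count to the use cases that share it."""
    groups = defaultdict(list)
    for case in cases:
        groups[case.control_plane_nodes].append(case.name)
    return dict(groups)

print(group_by_control_plane_size(USE_CASES))
```

Three of the six use cases share a 3-node control plane and differ only in schedulability and worker count, so a single control plane field cannot tell them apart.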

Author

I've expanded the detail around CP and infra topology, as well as some validation rules around number of workers. For the first pass, we will report an error prior to transitioning if there are any worker nodes.

Contributor

Can you point me to this expansion? I have the same question as Zane still having re-read the EP. This IMO needs more expansion unless I missed a section


The initial implementation targets `platform: none` clusters. On `platform: none`, the administrator is responsible for managing their own load balancing configuration (VIPs, DNS) when scaling beyond a single node.

`platform: baremetal` support is planned for a subsequent phase. Bare metal networking uses keepalived for ingress load balancing, which is not useful and creates a point of failure for SNO deployments. The Bare Metal Networking team will be consulted to determine if this networking setup can be enabled for single-node clusters transitioning to HA.
Member

I find it weird that we are going to add single-node support to platform:baremetal just so that we can say we are not preventing it from later transitioning to HA.
Who is asking for this?

I would prefer that any effort from the on-prem networking team were instead directed toward adding optional on-prem networking to platform:external.

As said in a previous comment, it's crucial to get support for "platform:baremetal" and keepalived load balancing (for ingress AND API) in the medium term. We should validate that there is no technical obstacle and that this can be added in the next release. Rationale: at the edge, there is hardly ever an external load balancer available.

Comment thread enhancements/topologies/mutable-topology.md
10. OTTO updates the Infrastructure status fields:
- `controlPlaneTopology` transitions from `SingleReplica` to `HighlyAvailable`
- `infrastructureTopology` transitions from `SingleReplica` to `HighlyAvailable`
11. Operators reconcile against the new topology values and adjust their deployment strategies, replica counts, and placement policies
Member

Are we going to try to e.g. restart OLM operators (which previously have treated the topology as fixed)?

Contributor

Do you have a view into how many/which olm operators are reading this value? Are they reading it at startup, or watching the resource? The expected pattern would be that the operator sees the change, and then reacts by updating the operand (e.g. scaling from 1 to 2 replicas now that it's been told the cluster is HA)

| cluster-etcd-operator | Coordinate with OTTO for sequential etcd scaling during transitions |
| Ingress, networking, monitoring operators | Respond to OTTO coordination signals during transitions; reconcile on Infrastructure config changes |

#### Platform Support Constraints
Member

We need to mention that IBI clusters cannot be converted from SNO (and have some mechanism for preventing that).

Contributor

What's the technical blocker there?

jeff-roche commented May 15, 2026

Big update coming next week to realign this with CCO instead of a dedicated operator, add some more technical detail around the flow, and address masters schedulable. Thank you everyone for the quick and thorough reviews, I believe we are rapidly converging on a solid solution!

Comment thread enhancements/topologies/mutable-topology.md Outdated
4. CEO promotes the learner to a voting member — the cluster now has 2 voting members (quorum=2)
5. CEO adds an etcd learner on the third control-plane node
6. The learner syncs data from an existing voter
7. CEO promotes the learner to a voting member — the cluster now has 3 voting members (quorum=2)

Suggested change:
- 7. CEO promotes the learner to a voting member — the cluster now has 3 voting members (quorum=2)
+ 7. CEO promotes the learner to a voting member — the cluster now has 3 voting members (quorum=3)
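
For reference when checking the quorum figures in these steps: etcd commits on a majority of voting members, so quorum is floor(n/2) + 1 and learners do not count. The Python sketch below works through that arithmetic for the 1→2→3 scale-up; the function names are illustrative.

```python
def quorum(voters: int) -> int:
    """Majority of voting members; learners are excluded from the count."""
    return voters // 2 + 1

def fault_tolerance(voters: int) -> int:
    """How many voting members can fail while a majority remains."""
    return voters - quorum(voters)

for n in (1, 2, 3):
    print(f"voters={n} quorum={quorum(n)} tolerated_failures={fault_tolerance(n)}")
```

By this rule three voting members have a quorum of 2 and tolerate one failure, while the intermediate 2-voter state also has a quorum of 2 and tolerates none.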


- The 2-member state is transient and follows the same sequential pattern used during cluster bootstrapping — a well-exercised code path
- Learner instances are used before promoting members to minimize the promotion window
- No availability guarantee during transitions; administrators should treat scaling operations as a maintenance window
- CEO will attempt rollback if scaling fails (e.g., rollback to 1 member if the 1→2→3 scale-up fails partway through)

What happens if the loss of quorum=2 was caused by a split-brain situation? Will both etcd instances attempt rollback to 1? This could lead to two individual clusters of one. I would be fine with specifying a simple heuristic to resolve this situation, e.g. dropping the younger etcd instance in favour of the older one or something like that. Or maybe a special command for the admin to resolve this situation.

- Learner instances are used before promoting members to minimize the promotion window
- No availability guarantee during transitions; administrators should treat scaling operations as a maintenance window
- CEO will attempt rollback if scaling fails (e.g., rollback to 1 member if the 1→2→3 scale-up fails partway through)
- Future iterations may explore admitting two learners simultaneously and promoting only when both are ready, eliminating the 2-member voting window entirely but that is out of scope for this enhancement

Maybe worth addressing this directly, instead of dealing with the potential split-brain situation from my previous comment?

jeff-roche and others added 2 commits May 18, 2026 12:15
Introduce the Mutable Topology enhancement, which replaces the
previous Adaptable Topology proposal. Instead of a new topology
enum that all operators must interpret, this approach uses a
dedicated operator (OTTO) to orchestrate transitions between
existing fixed topology modes. Initial scope: SNO to HA compact
on platform: none.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move the topology transition controller from a standalone operator
(OTTO) into cluster-config-operator. CCO owns the config.openshift.io
API group and infrastructure CR lifecycle, making it the natural home.

Key design decisions:
- desiredTopology initialized by installer to match controlPlaneTopology
  (no kubebuilder default — value is cluster-specific)
- Controller triggers on desiredTopology != status.controlPlaneTopology
- On failure, controller resets desiredTopology to current topology
- Upgrade blocked via Upgradeable=False during transitions
- Condition types: TopologyTransitionProgressing, Completed, Failed
- Per-operator topology audit required for Dev Preview entry

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
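
The design decisions listed in the second commit message can be read as a small state machine. The following is a rough sketch of that reading, not the controller's actual code: the `Infrastructure` class, `reconcile`, and `run_transition` are hypothetical names, the condition names `TopologyTransitionCompleted` and `TopologyTransitionFailed` are an assumed expansion of "Completed, Failed", and `Upgradeable` is modelled as a plain flag for simplicity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Infrastructure:
    """Hypothetical stand-in for the Infrastructure CR fields discussed here."""
    desired_topology: str        # spec.desiredTopology
    control_plane_topology: str  # status.controlPlaneTopology
    conditions: Dict[str, bool] = field(default_factory=dict)


def reconcile(infra: Infrastructure,
              run_transition: Callable[[str, str], bool]) -> Infrastructure:
    """One synchronous pass; a real controller would requeue and observe progress."""
    current = infra.control_plane_topology
    desired = infra.desired_topology
    if desired == current:
        return infra  # spec matches status: nothing to do

    # Block upgrades while the transition is in flight.
    infra.conditions["Upgradeable"] = False
    infra.conditions["TopologyTransitionProgressing"] = True

    succeeded = run_transition(current, desired)

    infra.conditions["TopologyTransitionProgressing"] = False
    infra.conditions["Upgradeable"] = True
    infra.conditions["TopologyTransitionCompleted"] = succeeded
    infra.conditions["TopologyTransitionFailed"] = not succeeded
    if succeeded:
        infra.control_plane_topology = desired  # status catches up with spec
    else:
        infra.desired_topology = current        # reset intent to current topology
    return infra
```

Resetting `desired_topology` on failure means the trigger condition (`desired != current`) no longer holds, so in this sketch the controller would not retry until an administrator sets the field again.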
@openshift-ci
Contributor

openshift-ci Bot commented May 18, 2026

@jeff-roche: all tests passed!


- Resolution: CEO should attempt automatic rollback. If rollback fails, follow standard etcd disaster recovery procedures.

### Recovery Procedures

Contributor

As part of this transition, will backups scale?

If I take a backup on SNO, will it work on TNA, or do I need to take a fresh backup?

Author

I haven't thought through backups. @jaypoulz have you given this any thought? My initial thought is that you would need to take a new backup, as I'm not sure how we would scale the backup.

@dhensel-rh
Contributor

Are there limitations for an SNO to TNF transition? TNF requires BMC/Redfish, so if the SNO bare metal hardware does not have it, does that block the transition? I could see this being a problem when trying to match hardware in general (BMC firmware versions, vendor types, etc.).

Contributor

@JoelSpeed left a comment

This is much better than the previous iteration. I still feel like there's some disconnect between the new and old stuff; some of it may be left over from the previous iteration and doesn't quite make sense now. PTAL at my comments.


This enhancement enables OpenShift clusters to transition between topology modes as a Day 2 operation. This changes the existing OpenShift assumption that topologies are immutable after installation.

A new `desiredTopology` field in the infrastructure spec expresses the administrator's intent to transition. A topology transition controller in cluster-config-operator watches for changes to this field, validates preconditions, coordinates the transition, and updates the existing topology status fields when the cluster is ready.
Contributor

Is this for infrastructure or control plane, or both?

Author

This is a fair question. In my head this entire process is about control plane scaling. I think we already have the necessary mechanisms in place to scale workers, right?

A new `oc adm transition topology` CLI command provides an interface for cluster administrators to initiate transitions.
Contributor

Is this a common addition to the CLI? I have nothing against extending the CLI, but do question if it is strictly required

Author

I think it is not strictly required; this is more of a usability thing. In theory a cluster admin could update the desired topology directly and monitor progress manually, but that might feel disconnected. Through the CLI we could give some structure to the process.
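
To illustrate the manual path mentioned above, here is a small sketch that builds the JSON merge patch an administrator might apply without the new CLI command. It assumes the proposed `spec.desiredTopology` field exists on the singleton Infrastructure CR named `cluster`; the helper names are hypothetical, and the printed `oc patch` invocation is an illustration, not a command taken from the proposal.

```python
import json
import shlex


def desired_topology_patch(target: str) -> str:
    """JSON merge patch that sets only the proposed spec.desiredTopology field."""
    return json.dumps({"spec": {"desiredTopology": target}}, separators=(",", ":"))


def oc_patch_command(target: str) -> str:
    """Shell-quoted `oc patch` command line carrying the merge patch."""
    args = [
        "oc", "patch", "infrastructure.config.openshift.io", "cluster",
        "--type=merge", "-p", desired_topology_patch(target),
    ]
    return shlex.join(args)


if __name__ == "__main__":
    print(oc_patch_command("HighlyAvailable"))
```

A raw patch like this skips any client-side precondition checks and progress reporting, which is the usability gap the dedicated command is meant to close.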


The initial implementation supports transitioning Single Node OpenShift (SNO) clusters to HA compact (3-node) on `platform: none`.
Contributor

Hoping to see somewhere a documented reason for why we are only considering platform none

Author

ack, I think this is covered a couple times in this doc but I can find a more explicit place to mention the reasoning


This enhancement introduces a new infrastructure API field and a topology transition controller in cluster-config-operator (CCO; not to be confused with cloud-credential-operator) to enable topology transitions as Day 2 operations.

The approach follows the standard Kubernetes spec/status contract and mirrors the pattern used by `oc adm upgrade`:
Contributor

This pattern is more of an OpenShift thing than a Kube thing.

Author

ack, will update wording


3. **`oc adm transition topology` CLI command** — A command that validates preconditions before patching `spec.desiredTopology` on the infrastructure CR, then monitors transition progress.

The transition controller is proposed to live in cluster-config-operator because CCO is the canonical location for config.openshift.io CRD manifests and bootstrap CR rendering, and the topology transition logic is tightly coupled to the Infrastructure CR schema it ships. This is a deliberate expansion of CCO's scope since historically the repo has been limited to CRD manifests and bootstrap rendering. The controller is feature-gated using the standard library-go FeatureGateAccess pattern: when the gate is disabled the controller is not registered with the manager and incurs negligible runtime overhead; a gate change triggers an operator restart via ForceExit so the new state is picked up cleanly.
Contributor

> the repo has been limited to CRD manifests and bootstrap rendering

This is not really true, but also doesn't materially affect what you're trying to say in this EP

TBH, this whole paragraph is fluff IMO

Author

Are you good with me dropping it? chai-bot recommended adding it and I figured it didn't hurt, but I agree it's fluff.


#### Risk: Platform Bare Metal May Not Support Single-Node Clusters

**Risk**: If keepalived networking cannot be enabled, `platform: baremetal` will be limited to 2+ nodes, reducing the value of mutable topology for this platform.
Contributor

> That's a Pandora's box I don't want to look at.

You and me both

So are we tying this EP to not only supporting topology transitions, but also SNO on baremetal? I would have expected a SNO on baremetal project to be sufficiently large and warrant its own EP?


#### Risk: Cannot Validate External Requirements

**Risk**: On `platform: none`, the topology transition controller cannot validate external requirements such as correct load balancer configuration or DNS setup. An administrator may initiate a transition with misconfigured networking, leading to a partially functional cluster.
Contributor

This is the first time load balancers are mentioned. Is this still something we expect the CCO to validate? Feels like that's up to the admin to set up before they initiate the transition, and not something we should be caring about IMO

**Why it was rejected**:
- The scope does not warrant a new operator — cluster-config-operator is the natural home for this logic since it already owns the `config.openshift.io` API group and infrastructure CR lifecycle
- A standalone operator adds payload size, requires its own upgrade/lifecycle management, and introduces another component to monitor
- The transition controller can live in CCO with zero overhead when not in use, gated by the `MutableTopology` feature gate
Contributor

Nit: near zero


## Open Questions

1. **HyperShift considerations**: Since the scope has broadened from edge-specific deployments to changing the topology assumption for OpenShift as a whole, do we need to consider HyperShift support? Initial answer is no — this would be future work and require its own enhancement.
Contributor

This doesn't feel like an open question if it has an answer

| ---- | ----------- |
| Precondition validation | Verify controller rejects transitions with missing nodes, invalid platforms, or unsupported source topologies |
| CLI interaction | Verify `oc adm transition topology` correctly patches `spec.desiredTopology` and monitors progress |
| Feature gate gating | Verify the controller is inactive when `MutableTopology` feature gate is disabled |
Contributor

The API won't exist when the gate is disabled, so you won't be able to drive the controller even if it were running. I think this test is probably impossible if not superfluous
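
Separately from the feature-gate row, the precondition validation row in the table above lends itself to a table-driven check. The sketch below is only an assumption about what such validation might look like for the initial scope (SNO to HA compact on `platform: none`). The function name, the error strings, the 3-node requirement as a lookup table, and the use of `"None"` as the platform value are illustrative, not taken from the proposal.

```python
from typing import List

# Assumed initial scope: SingleReplica -> HighlyAvailable on platform none.
SUPPORTED_TRANSITIONS = {("SingleReplica", "HighlyAvailable")}
SUPPORTED_PLATFORMS = {"None"}
REQUIRED_CONTROL_PLANE_NODES = {"HighlyAvailable": 3}


def precondition_errors(source: str, target: str, platform: str,
                        control_plane_nodes: int) -> List[str]:
    """Return every reason the transition should be rejected (empty means allowed)."""
    errors: List[str] = []
    if (source, target) not in SUPPORTED_TRANSITIONS:
        errors.append(f"unsupported transition {source} -> {target}")
    if platform not in SUPPORTED_PLATFORMS:
        errors.append(f"unsupported platform {platform}")
    required = REQUIRED_CONTROL_PLANE_NODES.get(target, 0)
    if control_plane_nodes < required:
        errors.append(
            f"need {required} control plane nodes, found {control_plane_nodes}")
    return errors


if __name__ == "__main__":
    cases = [
        ("SingleReplica", "HighlyAvailable", "None", 3),       # allowed
        ("SingleReplica", "HighlyAvailable", "None", 1),       # missing nodes
        ("SingleReplica", "HighlyAvailable", "BareMetal", 3),  # invalid platform
        ("HighlyAvailable", "SingleReplica", "None", 3),       # unsupported source
    ]
    for case in cases:
        print(case, "->", precondition_errors(*case) or "ok")
```

Collecting all errors rather than stopping at the first would let the CLI report every unmet precondition in one pass.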


Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
