[upgrade] Phase 3: HA replica-failover rolling upgrade (RPO=0, clustered)

Part of the upgrade epic, Phase 3. The RPO=0, zero-downtime upgrade for CLUSTERED deployments, using the built raft + replication.

Rolling node-by-node upgrade (the etcd/Consul/ElastiCache/Redis pattern): upgrade an in-sync REPLICA first (it catches up from the primary), then PROMOTE it (ownership moves only via a committed raft log + monotonic epoch - this fence IS synchronous promotion, so no acknowledged write is lost), then upgrade the old primary as a replica. Clients redirect on the failover; the dataset is never down.

Scope: an `ironcache upgrade` mode/flag that orchestrates the cluster-aware rolling upgrade (drive the replica upgrade via the spine, wait for in-sync, trigger the committed PromoteReplica, verify, upgrade the old primary); guardrails (refuse to promote a non-in-sync replica - the lag gate; require quorum). NOT applicable to the single-node prod box without first standing up a replica (raft mode is a boot-only cluster decision). Async replication has a small loss window bounded by the in-sync gate - document it.

Acceptance: in a raft cluster, an `ironcache upgrade` rolls the whole cluster to a new version with zero downtime and zero acknowledged-write loss, primary upgraded last. Depends on: the self-updater spine; prod/test running clustered.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[upgrade] Phase 3: HA replica-failover rolling upgrade (RPO=0, clustered) #392

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

[upgrade] Phase 3: HA replica-failover rolling upgrade (RPO=0, clustered) #392

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions