
Conversation

@davidporter-id-au davidporter-id-au commented Nov 1, 2025

What changed?

  • Unified the FailoverDomain Handler code with the UpdateDomain Failover path
  • Flattened out the failover path for normal and active/active domains a bit
  • Added a few missing bits to the FailoverDomain handler, in both the CLI and the mappers, to support active/active

This is not a perfect refactor; I changed a few behaviours while I was here, since I considered them low risk:

  • Stopped publishing active/active failover events to the Domain-data field, since this will be replaced by the FailoverHistory endpoint
  • Rewrote one of the unit tests for FailoverHandler for clarity

Why?

How did you test it?

  • Unit tests
  • Manual testing:
    • UpdateDomain
      • Active/Passive domain failover
      • Active/Active domain failover
    • FailoverDomain
      • Active/Passive domain
      • Active/Active domain

Potential risks

Release notes

Documentation Changes

// as a historical backwards compatibility measure.
// Going forward, any history use-cases should rely on the FailoverHistory endpoint which
// supports all failover types and more than a handful of entries.
if !wasActiveActive && !isActiveActive {
Member Author

behaviour change: not going to put active/active updates in the domain-data since they will likely get too hard to read and be too large.
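For illustration, a minimal Go sketch of the guard shown above, using hypothetical names (FailoverEvent, maybeRecordLegacyFailoverEvent) rather than the actual Cadence code: the legacy domain-data history only receives events for active/passive failovers, while active/active failovers are left to the FailoverHistory endpoint.

package domain

// FailoverEvent is a hypothetical stand-in for the entry stored in domain-data.
type FailoverEvent struct {
    EventTimeNanos int64
    FromCluster    string
    ToCluster      string
}

// maybeRecordLegacyFailoverEvent appends to the legacy domain-data history only
// when neither the previous nor the new configuration is active/active.
func maybeRecordLegacyFailoverEvent(
    history []FailoverEvent,
    wasActiveActive bool,
    isActiveActive bool,
    event FailoverEvent,
) []FailoverEvent {
    if !wasActiveActive && !isActiveActive {
        return append(history, event)
    }
    return history
}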

Member

In the long term, are we removing the failover history from the data blob for active-passive domains?

Member Author

I don't really have an opinion. Probably yes, but some parts of the system have taken a dependency on it, so I'm not racing to do so.

lastUpdatedTime,
response, err := d.handleFailoverRequest(
ctx,
failoverRequest.ToUpdateDomainRequest(),
Member Author

chose to map to the UpdateDomain request struct here rather than the reverse, because the FailoverDomain API is actually a subset of the UpdateDomain failover capabilities; we didn't include any graceful failover fields for it, so it has to be this way
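For illustration, a minimal Go sketch of this mapping direction with hypothetical request types (the real Cadence structs carry more fields); the point is that the narrower FailoverDomain request converts cleanly into the wider UpdateDomain request, while the reverse mapping would have to drop fields such as a graceful-failover timeout.

package domain

// FailoverDomainRequest is a hypothetical stand-in for the narrower failover request.
type FailoverDomainRequest struct {
    Name              string
    ActiveClusterName *string
}

// UpdateDomainRequest is a hypothetical stand-in for the wider update request.
type UpdateDomainRequest struct {
    Name                     string
    ActiveClusterName        *string
    FailoverTimeoutInSeconds *int32 // graceful failover field; FailoverDomain never sets it
}

// ToUpdateDomainRequest maps the subset onto the superset so both APIs can share
// one failover handler path.
func (r *FailoverDomainRequest) ToUpdateDomainRequest() *UpdateDomainRequest {
    return &UpdateDomainRequest{
        Name:              r.Name,
        ActiveClusterName: r.ActiveClusterName,
        // FailoverTimeoutInSeconds stays nil: FailoverDomain has no graceful path.
    }
}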

@davidporter-id-au davidporter-id-au force-pushed the refactoring/domain-handler-update-iii branch from 0375f5d to 5628316 on November 1, 2025 22:59
@davidporter-id-au davidporter-id-au changed the title from "Refactoring/domain handler update iii" to "chore: Refactoring/domain handler update iii" on Nov 3, 2025
@davidporter-id-au davidporter-id-au changed the title from "chore: Refactoring/domain handler update iii" to "chore: Refactoring/domain handler updates" on Nov 3, 2025
FailoverType string `json:"failoverType,omitempty"`

// active-active domain failover
FromActiveClusters types.ActiveClusters `json:"fromActiveClusters,omitempty"`
Member Author

Intentional removal; these are to be replaced by the FailoverHistory endpoint

}{
{
name: "Success case - global domain force failover via replication config",
name: "Success case - active/passive domain - global domain force failover - failing over from cluster A to cluster B",
Member Author

rewrote the test with different data to be less confusing

DomainName: t.DomainName,
DomainActiveClusterName: *t.DomainActiveClusterName,
DomainActiveClusterName: t.GetDomainActiveClusterName(),
ActiveClusters: FromActiveClusters(t.ActiveClusters),
Member Author

Bug: these mappers were missing
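For illustration, a minimal Go sketch of the mapper fix with hypothetical types: a nil-safe getter avoids the panic that dereferencing an unset optional field would cause, and the active/active cluster config gets its own mapping instead of being dropped.

package thrift

// hypothetical wire and internal types, for illustration only
type activeClusters struct {
    RegionToCluster map[string]string
}

type failoverRequest struct {
    DomainName              string
    DomainActiveClusterName *string
    ActiveClusters          *activeClusters
}

type internalActiveClusters struct {
    RegionToCluster map[string]string
}

// GetDomainActiveClusterName is the nil-safe accessor: unlike
// *t.DomainActiveClusterName it cannot panic when the optional field is unset.
func (t *failoverRequest) GetDomainActiveClusterName() string {
    if t != nil && t.DomainActiveClusterName != nil {
        return *t.DomainActiveClusterName
    }
    return ""
}

// fromActiveClusters maps the optional active/active config instead of dropping it.
func fromActiveClusters(in *activeClusters) *internalActiveClusters {
    if in == nil {
        return nil
    }
    return &internalActiveClusters{RegionToCluster: in.RegionToCluster}
}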

// Any parts of the system (domain-callbacks, domain cache etc) that watch for
// failover events and may trigger some action or cache invalidation will be watching
// the domain-level failover counter for changes. Therefore, by bumping it even for
// cases where it isn't changing, we ensure all these other subprocesses will
Member

I still think we should find the subprocesses depending on this and extend them to support cluster-attribute-level failover, instead of increasing the domain-level failover version for all failovers.
But I'm ok with the current state.
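
For illustration, a minimal Go sketch, with hypothetical names, of the behaviour the diff comment above describes: the domain-level failover version is bumped for every failover, including cluster-attribute-level (active/active) ones, so that watchers keyed on that counter still observe a change.

package domain

// replicationState is a hypothetical stand-in for the persisted domain record.
type replicationState struct {
    // FailoverVersion is the domain-level counter that callbacks and caches watch.
    FailoverVersion int64
}

// recordFailover bumps the domain-level failover version even when only a
// cluster-attribute-level failover happened, so every watcher keyed on this
// counter still sees a change.
func recordFailover(state *replicationState, versionIncrement int64) {
    state.FailoverVersion += versionIncrement
}

// hasFailedOverSince is the kind of check a subprocess (domain callbacks,
// domain cache invalidation) might perform: it only fires when the counter moves.
func hasFailedOverSince(state *replicationState, lastSeenVersion int64) bool {
    return state.FailoverVersion > lastSeenVersion
}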

@davidporter-id-au davidporter-id-au merged commit 3958763 into cadence-workflow:master Nov 3, 2025
44 checks passed