Fix invalid first appointments in Raft #2330
Conversation
Pull Request Overview
This PR fixes incorrect initial leader appointments during Raft failover when a replicaset has fewer than three instances, and adds a test to verify leader selection history under Raft mode.
- Introduce `get_leaders_history` in the test service to capture applied leader snapshots.
- Enhance the Raft failover logic to appoint a leader in small clusters on initial configuration.
- Replace disabled-mode appointment logic with the unified Raft `get_appointments` and add integration tests.
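As a rough illustration of the first bullet, the test entrypoint could snapshot the active leaders each time a new config is applied and expose the accumulated history through `get_leaders_history`. This is only a sketch: the `record_leaders` hook and the use of `cartridge.failover.get_active_leaders()` are assumptions, not necessarily what the PR does.

```lua
-- Hypothetical sketch of the test-service side, not the PR's actual code.
local leaders_history = {}

-- Intended to be called from the test role's apply_config callback
-- (an assumption for illustration).
local function record_leaders()
    local failover = require('cartridge.failover')
    -- get_active_leaders() is assumed to return replicaset_uuid -> leader_uuid.
    table.insert(leaders_history, failover.get_active_leaders())
end

local function get_leaders_history()
    return leaders_history
end

return {
    record_leaders = record_leaders,
    get_leaders_history = get_leaders_history,
}
```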
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| test/integration/raft_test.lua | Add raft_leaders_history test group and a new test |
| test/entrypoint/srv_basic.lua | Track and expose leaders_history in the test module |
| cartridge/failover/raft.lua | Insert fallback leader appointment for small clusters |
| cartridge/failover.lua | Use raft_failover.get_appointments in disabled mode |
| CHANGELOG.rst | Document “Invalid leader appointment in Raft failover” |
Comments suppressed due to low confidence (3)
cartridge/failover.lua:961
- Ensure that `raft_failover` is properly imported (e.g. `local raft_failover = require('cartridge.failover.raft')`) at the top of this file; otherwise this call will result in a nil-reference error.

      first_appointments = raft_failover.get_appointments(topology_cfg)
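A minimal sketch of the import the comment asks for, placed among the requires at the top of `cartridge/failover.lua`; the surrounding layout is an assumption, only the require line itself comes from the comment.

```lua
-- At the top of cartridge/failover.lua, next to the other requires
-- (the missing import the comment points out):
local raft_failover = require('cartridge.failover.raft')

-- ...so that, later in the file, this call resolves:
-- first_appointments = raft_failover.get_appointments(topology_cfg)
```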
test/integration/raft_test.lua:345
- Verify that the `Cluster` helper defines `wait_until_healthy()`. If not, calling this method will raise an undefined-method error in the test.

      g.cluster:wait_until_healthy()
test/entrypoint/srv_basic.lua:35
- [nitpick] The `leaders_history` table persists across service restarts and tests. Consider clearing it in a `stop` or `init` hook to avoid cross-test contamination.

      local leaders_history = {}
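If that nitpick were addressed, the table could be reset in the role's lifecycle callbacks. A minimal sketch, assuming standard cartridge role `init`/`stop` hooks (the PR may handle this differently):

```lua
local leaders_history = {}

-- Resetting in init() and stop() keeps one test run from leaking
-- leader history into the next one.
local function init(opts) -- luacheck: no unused args
    leaders_history = {}
    return true
end

local function stop()
    leaders_history = {}
end
```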
    appointments[vars.replicaset_uuid] = vars.leader_uuid
    -- If there is still no leader in the replicaset, appoint the first
    -- electable instance as a leader.
    for replicaset_uuid, _ in pairs(replicasets) do
Copilot AI (Jun 6, 2025)
The threshold ‘3’ for small clusters is a magic number—consider extracting it into a named constant (e.g. MIN_RAFT_REPLICASET_SIZE) or config parameter for clarity and easier adjustments.
cartridge/failover/raft.lua (outdated)

    if appointments[replicaset_uuid] == nil then
        local leaders = topology.get_leaders_order(
            topology_cfg, replicaset_uuid, nil, {only_electable = false, only_enabled = true}
        )
        if vars.disable_raft_on_small_clusters and #leaders < 3 then
            appointments[replicaset_uuid] = leaders[1]
        end
    end
    end
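A possible shape for Copilot's suggestion above; the constant name `MIN_RAFT_REPLICASET_SIZE` is the one the comment proposes and is not part of the PR.

```lua
-- Hypothetical refactoring of the threshold; name and placement are illustrative.
local MIN_RAFT_REPLICASET_SIZE = 3

if vars.disable_raft_on_small_clusters and #leaders < MIN_RAFT_REPLICASET_SIZE then
    appointments[replicaset_uuid] = leaders[1]
end
```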
we need another test for this case
cartridge/failover/raft.lua (outdated)

    -- If there is still no leader in the replicaset, appoint the first
    -- electable instance as a leader.
    for replicaset_uuid, _ in pairs(replicasets) do
        if appointments[replicaset_uuid] == nil then
`failover.disable` doesn't change the membership payload: if we disable an instance, its `raft_term` and `leader_uuid` remain the same.
If the replicaset had instances running Raft, they keep the same `raft_term` and `leader_uuid`, and the old leader remains the one Raft elected.
Therefore, it's better not to check for nil.
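For context on this point: the Raft state a router acts on is advertised through the instances' membership payloads, which is why disabling an instance does not clear it. A rough sketch of reading those payload fields follows; the field names are the ones mentioned in the comment, and this is not the PR's `get_appointments` implementation.

```lua
-- Hedged sketch: inspect the Raft state a member advertises via membership.
local membership = require('membership')

local function advertised_raft_state(uri)
    local member = membership.members()[uri]
    if member == nil or member.payload == nil then
        return nil
    end
    -- A disabled instance keeps announcing its last known values here.
    return {
        raft_term = member.payload.raft_term,
        leader_uuid = member.payload.leader_uuid,
    }
end
```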
Force-pushed from 6d2137e to 5027c9d
Force-pushed from 4599156 to 887c75c
Reproducer:

1. Start a cluster with one router and a storage replicaset of three instances (storage-1 is the initial leader).
2. Call `box.ctl.promote()` on storage-2.
3. Wait until the router learns the new leader via membership.
4. Patch the cluster-wide config to trigger `failover.cfg`.
5. The router reverts to the old master because it is lexicographically first in the topology, mimicking 'disabled' mode.

Fix: always build `first_appointments` via `raft_failover.get_appointments(topology_cfg)` so routers do not fall back to stale Raft information when the replicaset has fewer than three instances.
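A schematic luatest-style version of the reproducer, in the spirit of this repo's integration tests. Server aliases, the patched config section, and the `get_active_leaders()` probe are illustrative assumptions, not the PR's actual test.

```lua
local t = require('luatest')

local function reproduce(g)
    -- 1. Assumes g.cluster was already started with one router and a
    --    three-instance storage replicaset (storage-1 is the leader).

    -- 2. Promote the second storage by hand.
    g.cluster:server('storage-2'):exec(function()
        box.ctl.promote()
    end)

    -- 3. Wait until the router learns the new leader via membership.
    local new_leader_uuid = g.cluster:server('storage-2').instance_uuid
    t.helpers.retrying({timeout = 10}, function()
        local leaders = g.cluster:server('router'):exec(function()
            return require('cartridge.failover').get_active_leaders()
        end)
        local found = false
        for _, leader_uuid in pairs(leaders) do
            if leader_uuid == new_leader_uuid then
                found = true
            end
        end
        t.assert(found, 'router has not learned about the new leader yet')
    end)

    -- 4. Patch the cluster-wide config so failover.cfg() runs again.
    g.cluster:server('router'):exec(function()
        require('cartridge').config_patch_clusterwide({custom_section = 'trigger'})
    end)

    -- 5. Before the fix, the router would now report the old master again.
end
```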
…ments in Raft mode

If `disable_raft_on_small_clusters` is enabled and a replicaset contains fewer than three instances, treat it like 'disabled' failover and pick the first enabled instance (even a non-electable one) as leader.
Force-pushed from 887c75c to 4532fb9
When Raft failover is enabled on replicasets with fewer than 3 instances (typically routers), a bug occurs in leader selection:

1. The leader is changed in the replicaset.
2. The router learns about the new leader via membership.
3. A config update triggers `failover.cfg`.
4. The router incorrectly reverts to the old leader (the first lexicographical instance), as if in 'disabled' mode.

This causes vshard to return NON_MASTER errors, since the wrong storage is elected as master.
Closes https://jira.vk.team/browse/TNT-1415