[v25.3.x] Decom can cancel simple node add raft0 reconfigurations #30661
Open
vbotbuildovich wants to merge 3 commits into
joe-redpanda
approved these changes
Jun 1, 2026
Retry command for Build#85191: please wait until all jobs are finished before running the slash command
Allows a decommission / node removal request to cancel the in-progress addition of a node to raft0. Previously, the request waited for the node to transition from learner to voter, and only then could the decommission succeed. This could deadlock if the learner died before finishing recovery. (cherry picked from commit 1f884b8)
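The behavior change described in this commit can be sketched with a toy model. This is only an illustration under stated assumptions, not the Redpanda implementation: `Raft0Model`, its methods, the returned strings, and the `cancel_learner_add` flag are all hypothetical names invented here to contrast the prior and new behavior.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class Role(Enum):
    LEARNER = "learner"
    VOTER = "voter"


@dataclass
class Raft0Model:
    """Toy stand-in for raft0 membership; hypothetical, not Redpanda code."""

    members: Dict[int, Role] = field(default_factory=dict)
    pending_add: Optional[int] = None  # node whose addition is still in progress

    def begin_add(self, node_id: int) -> None:
        # A newly added node joins as a learner and must recover first.
        self.members[node_id] = Role.LEARNER
        self.pending_add = node_id

    def finish_recovery(self, node_id: int) -> None:
        # The learner caught up: promote it and complete the reconfiguration.
        self.members[node_id] = Role.VOTER
        if self.pending_add == node_id:
            self.pending_add = None

    def decommission(self, node_id: int, cancel_learner_add: bool = True) -> str:
        role = self.members.get(node_id)
        if role is None:
            return "unknown_node"
        if role is Role.LEARNER and self.pending_add == node_id:
            if not cancel_learner_add:
                # Prior behavior: wait for promotion, which never happens
                # if the learner is dead.
                return "waiting_for_learner"
            # New behavior: cancel the addition instead of waiting.
            del self.members[node_id]
            self.pending_add = None
            return "add_cancelled"
        del self.members[node_id]
        return "removed"
```

In this model, calling `decommission` with `cancel_learner_add=False` on a learner that never finishes recovery leaves `pending_add` set indefinitely, which corresponds to the deadlock the commit message describes.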
Adds a configuration option that determines whether raft0 recovery respects the learner recovery rate. Using it is ill-advised in production but extremely helpful in tests, where it widens race-condition windows on controller operations. Used in a subsequent commit. (cherry picked from commit 6ad1fe9)
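To see why throttling raft0 recovery widens a race window, a rough back-of-the-envelope model may help. Everything here is an assumption made for illustration: the function name, the parameters, and the simplification that unthrottled recovery takes no time. The actual option name and its default are not stated in this PR.

```python
def recovery_window_seconds(backlog_bytes: int,
                            rate_bytes_per_sec: int,
                            respect_rate: bool) -> float:
    """Estimate how long a learner stays a learner (hypothetical model).

    When recovery ignores the rate limit, it is modeled as instantaneous.
    When it respects the limit, the learner phase lasts backlog / rate.
    """
    if not respect_rate:
        return 0.0
    if rate_bytes_per_sec <= 0:
        raise ValueError("rate_bytes_per_sec must be positive")
    return backlog_bytes / rate_bytes_per_sec
```

Under this model, a 100 MiB backlog recovered at 1 MiB/s keeps the node in the learner state for about 100 seconds, which gives a test ample time to kill the node or issue a decommission while the reconfiguration is still in flight.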
819098a to 8c9abbb
Retry command for Build#86182: please wait until all jobs are finished before running the slash command
... test
Adds a regression test for a deadlocked members_backend. When a node is added to the cluster, joins raft0 as a learner, and dies before it can transition to a voter, raft0 reconfigurations are blocked until that learner recovers. Previously, unblocking raft0 reconfiguration required either recovering the dead learner or applying a node UUID override. This test validates the fix: a decommission of a node that has not finished recovering (is still a learner) cancels the raft0 reconfiguration as its 'node removal' step. This allows a decommission to serve as the escape hatch when a raft0 learner has been irrecoverably lost. (cherry picked from commit 8edf586)
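The scenario this test exercises can be illustrated with a small ordering model. This is a sketch only, assuming that raft0 handles one reconfiguration at a time: `ControllerModel` and its methods are hypothetical and do not correspond to the real members_backend or to the regression test's code.

```python
class ControllerModel:
    """Toy model of raft0 reconfiguration ordering; not Redpanda code."""

    def __init__(self):
        self.pending_learner = None  # node id whose add has not completed
        self.queued = []             # reconfigurations blocked behind it
        self.applied = []            # reconfigurations that went through

    def add_node(self, node_id):
        # The node joins raft0 as a learner; the add stays in flight
        # until the learner recovers (never, if the node is dead).
        self.pending_learner = node_id

    def reconfigure(self, name):
        # Any further reconfiguration is blocked while an add is in flight.
        if self.pending_learner is not None:
            self.queued.append(name)
            return False
        self.applied.append(name)
        return True

    def decommission(self, node_id):
        # Escape hatch: decommissioning the stuck learner cancels its add
        # and lets the queued reconfigurations proceed in order.
        if self.pending_learner != node_id:
            return False
        self.pending_learner = None
        self.applied.append(f"cancel_add({node_id})")
        self.applied.extend(self.queued)
        self.queued.clear()
        return True
```

A test in this spirit would add a node, confirm that a later reconfiguration is queued, decommission the stuck learner, and then check that the queued work is applied.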
8c9abbb to adbfe60
Backport of PR #30377
Conflict details
operator<<(std::ostream&, partition_reallocation&) formatter while the source branch had migrated to a format_to member; placed the new cancel_raft0_add definition before the existing operator<< and kept v25.3.x's formatter style.
#include "security/acl.h" directly after raft/fwd.h, the source commit inserts #include "raft/types.h"; kept both in alphabetical order.