
Commit 8e38c32

craig[bot] and irfansharif committed
50329: kvserver,cli,roachtest,sql: introduce a fully decommissioned bit r=tbg a=irfansharif

Ignore first commit, cherry-picked from @andreimatei's cockroachdb#49135.

---

This commit introduces a fully decommissioned bit to Cockroach. Previously our Liveness schema only contained a decommissioning bool, with consequently no ability to disambiguate between a node currently undergoing decommissioning and a node that was fully decommissioned. We used some combination of the store dead threshold to surface "fully decommissioned" nodes in our UI, but it was never quite accurate. We need this specificity for the Connect RPC.

We wire up the new CommissionStatus enum that's now part of the liveness record. In doing so it elides usage of the decommissioning bool used in v20.1. We're careful to maintain an on-the-wire representation of the Liveness record that will be understood by v20.1 nodes (see Liveness.EnsureCompatible for details).

We repurpose the AdminServer.Decommission RPC (which is now a misnomer; it should be thought of as being named Commission instead) to persist CommissionStatus states to KV through the lifetime of a node decommissioning/recommissioning. See cli/node.go for where that's done. For recommissioning a node, it suffices to simply persist a COMMISSIONED state. When decommissioning a node, since it's a longer-running process, we first persist an in-progress DECOMMISSIONING state, and once we've moved off all the replicas on the node, we finalize the decommissioning process by persisting the DECOMMISSIONED state.

When transitioning between CommissionStatus states, we CPut against what's already there, disallowing illegal state transitions. The appropriate error codes are surfaced back to the user. An example would be attempting to recommission a fully decommissioned node, in which case we'd error out with the following:

> command failed: illegal commission status transition: can only
> recommission a decommissioning node; n4 found to be decommissioned

Note that this is a behavioral change for `cockroach node recommission`. Previously it was able to recommission any "decommissioned" node, regardless of how long ago it was removed from the cluster. Now recommission serves only to cancel an accidental decommissioning process that wasn't finalized.

The 'decommissioning' column in crdb_internal.gossip_liveness is now powered by this new CommissionStatus, and we introduce a new commission_status column to it. We also introduce the same column to the output generated by `cockroach node status --decommission`. The is_decommissioning column still exists, but is also powered by the CommissionStatus now. We also iron out the events plumbed into system.eventlog: it now has a dedicated event for "node decommissioning".

Release note (general change): `cockroach node recommission` has new semantics. Previously it was able to recommission any decommissioned node, regardless of how long ago it was fully removed from the cluster. Now recommission serves only to cancel an accidental decommissioning process that wasn't finalized.

Release note (cli change): We introduce a commission_status column to the output generated by `cockroach node status --decommission`. It should be used in favor of the is_decommissioning column going forward.

Co-authored-by: irfan sharif <[email protected]>
2 parents 12cc4cb + c045ad8 commit 8e38c32

39 files changed: +2122 −1104 lines
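The commission-status transitions described in the commit message are small enough to sketch. The Go snippet below is illustrative only and is not the code added by this commit: the real check lives in ValidateLivenessTransition in kvserverpb/liveness.go (referenced in the pkg/cli/node.go diff further down), and the enum, function, and error strings here are made up for the example. It encodes only the two rules spelled out above: a fully decommissioned node cannot be recommissioned, and a node must be decommissioning before it can be marked decommissioned.

package main

import "fmt"

// MembershipStatus mirrors the three liveness membership states this
// commit works with: active, decommissioning, and decommissioned.
type MembershipStatus int

const (
	Active MembershipStatus = iota
	Decommissioning
	Decommissioned
)

// validateTransition is an illustrative stand-in for the transition
// check this commit performs before CPut-ing the updated liveness
// record over the old one, so that illegal jumps fail loudly.
func validateTransition(old, target MembershipStatus) error {
	switch target {
	case Active:
		// Recommissioning only cancels an in-flight decommissioning.
		if old == Decommissioned {
			return fmt.Errorf("can only recommission a decommissioning node")
		}
	case Decommissioning:
		// Left permissive in this sketch.
	case Decommissioned:
		// A node has to be marked as decommissioning before it can be
		// marked as fully decommissioned.
		if old != Decommissioning && old != Decommissioned {
			return fmt.Errorf("can only decommission a decommissioning node")
		}
	}
	return nil
}

func main() {
	// Mirrors the CLI error quoted in the commit message.
	fmt.Println(validateTransition(Decommissioned, Active))
	// The legal two-step path: decommissioning, then decommissioned.
	fmt.Println(validateTransition(Decommissioning, Decommissioned))
}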

docs/generated/settings/settings.html

Lines changed: 1 addition & 1 deletion
@@ -72,6 +72,6 @@
 <tr><td><code>trace.debug.enable</code></td><td>boolean</td><td><code>false</code></td><td>if set, traces for recent requests can be seen in the /debug page</td></tr>
 <tr><td><code>trace.lightstep.token</code></td><td>string</td><td><code></code></td><td>if set, traces go to Lightstep using this token</td></tr>
 <tr><td><code>trace.zipkin.collector</code></td><td>string</td><td><code></code></td><td>if set, traces go to the given Zipkin instance (example: '127.0.0.1:9411'); ignored if trace.lightstep.token is set</td></tr>
-<tr><td><code>version</code></td><td>custom validation</td><td><code>20.1-10</code></td><td>set the active cluster version in the format '<major>.<minor>'</td></tr>
+<tr><td><code>version</code></td><td>custom validation</td><td><code>20.1-11</code></td><td>set the active cluster version in the format '<major>.<minor>'</td></tr>
 </tbody>
 </table>

pkg/cli/cliflags/flags.go

Lines changed: 7 additions & 5 deletions
@@ -939,14 +939,16 @@ drain all active client connections and migrate away range leases.`,
 	Wait = FlagInfo{
 		Name: "wait",
 		Description: `
-Specifies when to return after having marked the targets as decommissioning.
+Specifies when to return during the decommissioning process.
 Takes any of the following values:
 <PRE>
 
-  - all: waits until all target nodes' replica counts have dropped to zero.
-    This is the default.
-  - none: marks the targets as decommissioning, but does not wait for the
-    process to complete. Use when polling manually from an external system.
+  - all   waits until all target nodes' replica counts have dropped to zero and
+          marks the nodes as fully decommissioned. This is the default.
+  - none  marks the targets as decommissioning, but does not wait for the
+          replica counts to drop to zero before returning. If the replica counts
+          are found to be zero, nodes are marked as fully decommissioned. Use
+          when polling manually from an external system.
 </PRE>`,
 	}
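The --wait=none wording above targets callers that poll decommissioning progress from an external system. Below is a rough sketch of such a poller, reusing the Decommission RPC types that appear throughout this commit; the package name, helper name, polling interval, and the serverpb import path are assumptions of this example, not part of the diff.

package externalpoller

import (
	"context"
	"time"

	"github.com/cockroachdb/cockroach/pkg/kv/kvserver/kvserverpb"
	"github.com/cockroachdb/cockroach/pkg/roachpb"
	"github.com/cockroachdb/cockroach/pkg/server/serverpb"
)

// pollDecommission is a hypothetical external poller: it re-issues the
// DECOMMISSIONING request until the reported replica counts reach zero,
// then asks for the nodes to be marked fully DECOMMISSIONED. It mirrors
// the flow in runDecommissionNodeImpl from this commit, but is a sketch.
func pollDecommission(
	ctx context.Context, c serverpb.AdminClient, nodeIDs []roachpb.NodeID,
) error {
	for {
		if err := ctx.Err(); err != nil {
			return err
		}
		resp, err := c.Decommission(ctx, &serverpb.DecommissionRequest{
			NodeIDs:          nodeIDs,
			TargetMembership: kvserverpb.MembershipStatus_DECOMMISSIONING,
		})
		if err != nil {
			return err
		}

		var replicaCount int64
		for _, status := range resp.Status {
			replicaCount += status.ReplicaCount
		}
		if replicaCount == 0 {
			// All replicas have moved off; finalize by marking the
			// nodes as fully decommissioned.
			_, err := c.Decommission(ctx, &serverpb.DecommissionRequest{
				NodeIDs:          nodeIDs,
				TargetMembership: kvserverpb.MembershipStatus_DECOMMISSIONED,
			})
			return err
		}
		time.Sleep(5 * time.Second) // arbitrary polling interval
	}
}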

pkg/cli/demo_cluster.go

Lines changed: 51 additions & 4 deletions
@@ -24,6 +24,7 @@ import (
 
 	"github.com/cockroachdb/cockroach/pkg/base"
 	"github.com/cockroachdb/cockroach/pkg/cli/cliflags"
+	"github.com/cockroachdb/cockroach/pkg/kv/kvserver/kvserverpb"
 	"github.com/cockroachdb/cockroach/pkg/roachpb"
 	"github.com/cockroachdb/cockroach/pkg/rpc"
 	"github.com/cockroachdb/cockroach/pkg/security"
@@ -361,17 +362,17 @@ func (c *transientCluster) DrainAndShutdown(nodeID roachpb.NodeID) error {
 	return nil
 }
 
-// CallDecommission calls the Decommission RPC on a node.
-func (c *transientCluster) CallDecommission(nodeID roachpb.NodeID, decommissioning bool) error {
+// Recommission recommissions a given node.
+func (c *transientCluster) Recommission(nodeID roachpb.NodeID) error {
 	nodeIndex := int(nodeID - 1)
 
 	if nodeIndex < 0 || nodeIndex >= len(c.servers) {
 		return errors.Errorf("node %d does not exist", nodeID)
 	}
 
 	req := &serverpb.DecommissionRequest{
-		NodeIDs:         []roachpb.NodeID{nodeID},
-		Decommissioning: decommissioning,
+		NodeIDs:          []roachpb.NodeID{nodeID},
+		TargetMembership: kvserverpb.MembershipStatus_ACTIVE,
 	}
 
 	ctx, cancel := context.WithCancel(context.Background())
@@ -387,6 +388,52 @@ func (c *transientCluster) CallDecommission(nodeID roachpb.NodeID, decommissioni
 	if err != nil {
 		return errors.Wrap(err, "while trying to mark as decommissioning")
 	}
+
+	return nil
+}
+
+// Decommission decommissions a given node.
+func (c *transientCluster) Decommission(nodeID roachpb.NodeID) error {
+	nodeIndex := int(nodeID - 1)
+
+	if nodeIndex < 0 || nodeIndex >= len(c.servers) {
+		return errors.Errorf("node %d does not exist", nodeID)
+	}
+
+	ctx, cancel := context.WithCancel(context.Background())
+	defer cancel()
+
+	adminClient, finish, err := getAdminClient(ctx, *(c.s.Cfg))
+	if err != nil {
+		return err
+	}
+	defer finish()
+
+	// This (cumbersome) two step process is due to the allowed state
+	// transitions for membership status. To mark a node as fully
+	// decommissioned, it has to be marked as decommissioning first.
+	{
+		req := &serverpb.DecommissionRequest{
+			NodeIDs:          []roachpb.NodeID{nodeID},
+			TargetMembership: kvserverpb.MembershipStatus_DECOMMISSIONING,
+		}
+		_, err = adminClient.Decommission(ctx, req)
+		if err != nil {
+			return errors.Wrap(err, "while trying to mark as decommissioning")
+		}
+	}
+
+	{
+		req := &serverpb.DecommissionRequest{
+			NodeIDs:          []roachpb.NodeID{nodeID},
+			TargetMembership: kvserverpb.MembershipStatus_DECOMMISSIONED,
+		}
+		_, err = adminClient.Decommission(ctx, req)
+		if err != nil {
+			return errors.Wrap(err, "while trying to mark as decommissioned")
+		}
+	}
+
 	return nil
 }

pkg/cli/error.go

Lines changed: 8 additions & 0 deletions
@@ -268,6 +268,14 @@ func MaybeDecorateGRPCError(
 		if pgcode.MakeCode(string(wErr.Code)) == pgcode.ProtocolViolation {
 			return connSecurityHint()
 		}
+
+		// Are we running a v20.2 binary against a v20.1 server?
+		if strings.Contains(wErr.Message, "column \"membership\" does not exist") {
+			// The v20.2 binary makes use of columns not present in v20.1,
+			// so this is a disallowed operation. Surface a better error
+			// code here.
+			return fmt.Errorf("cannot use a v20.2 cli against servers running v20.1")
+		}
 		// Otherwise, there was a regular SQL error. Just report
 		// that.
 		return err

pkg/cli/interactive_tests/test_demo_node_cmds.tcl

Lines changed: 19 additions & 26 deletions
@@ -35,12 +35,12 @@ eexpect "node 2 is already running"
 send "\\demo shutdown 3\r"
 eexpect "node 3 has been shutdown"
 
-send "select node_id, draining, decommissioning from crdb_internal.gossip_liveness ORDER BY node_id;\r"
-eexpect "1 | false | false"
-eexpect "2 | false | false"
-eexpect "3 | true | false"
-eexpect "4 | false | false"
-eexpect "5 | false | false"
+send "select node_id, draining, decommissioning, membership from crdb_internal.gossip_liveness ORDER BY node_id;\r"
+eexpect "1 | false | false | active"
+eexpect "2 | false | false | active"
+eexpect "3 | true | false | active"
+eexpect "4 | false | false | active"
+eexpect "5 | false | false | active"
 
 # Cannot shut it down again.
 send "\\demo shutdown 3\r"
@@ -55,33 +55,26 @@ eexpect "movr>"
 send "\\demo restart 3\r"
 eexpect "node 3 has been restarted"
 
-send "select node_id, draining, decommissioning from crdb_internal.gossip_liveness ORDER BY node_id;\r"
-eexpect "1 | false | false"
-eexpect "2 | false | false"
-eexpect "3 | false | false"
-eexpect "4 | false | false"
-eexpect "5 | false | false"
+send "select node_id, draining, decommissioning, membership from crdb_internal.gossip_liveness ORDER BY node_id;\r"
+eexpect "1 | false | false | active"
+eexpect "2 | false | false | active"
+eexpect "3 | false | false | active"
+eexpect "4 | false | false | active"
+eexpect "5 | false | false | active"
 
 # Try commissioning commands
 send "\\demo decommission 4\r"
 eexpect "node 4 has been decommissioned"
 
-send "select node_id, draining, decommissioning from crdb_internal.gossip_liveness ORDER BY node_id;\r"
-eexpect "1 | false | false"
-eexpect "2 | false | false"
-eexpect "3 | false | false"
-eexpect "4 | false | true"
-eexpect "5 | false | false"
+send "select node_id, draining, decommissioning, membership from crdb_internal.gossip_liveness ORDER BY node_id;\r"
+eexpect "1 | false | false | active"
+eexpect "2 | false | false | active"
+eexpect "3 | false | false | active"
+eexpect "4 | false | true | decommissioned"
+eexpect "5 | false | false | active"
 
 send "\\demo recommission 4\r"
-eexpect "node 4 has been recommissioned"
-
-send "select node_id, draining, decommissioning from crdb_internal.gossip_liveness ORDER BY node_id;\r"
-eexpect "1 | false | false"
-eexpect "2 | false | false"
-eexpect "3 | false | false"
-eexpect "4 | false | false"
-eexpect "5 | false | false"
+eexpect "can only recommission a decommissioning node"
 
 interrupt
 eexpect eof

pkg/cli/interactive_tests/test_multiple_nodes.tcl

Lines changed: 4 additions & 7 deletions
@@ -20,6 +20,10 @@ eexpect eof
 
 end_test
 
+start_test "Check that a recommissioning an active node prints out a warning"
+spawn $argv node recommission 2
+eexpect "warning: node 2 is not decommissioned"
+eexpect eof
 
 start_test "Check that a double decommission prints out a warning"
 spawn $argv node decommission 2 --wait none
@@ -30,13 +34,6 @@ eexpect "warning: node 2 is already decommissioning or decommissioned"
 eexpect eof
 end_test
 
-start_test "Check that a double recommission prints out a warning"
-spawn $argv node recommission 2
-eexpect eof
-
-spawn $argv node recommission 2
-eexpect "warning: node 2 is not decommissioned"
-eexpect eof
 end_test

pkg/cli/node.go

Lines changed: 58 additions & 14 deletions
@@ -27,6 +27,8 @@ import (
 	"github.com/cockroachdb/cockroach/pkg/util/retry"
 	"github.com/cockroachdb/errors"
 	"github.com/spf13/cobra"
+	"google.golang.org/grpc/codes"
+	"google.golang.org/grpc/status"
 )
 
 const (
@@ -64,9 +66,10 @@ func runLsNodes(cmd *cobra.Command, args []string) error {
 	_, rows, err := runQuery(
 		conn,
 		makeQuery(`SELECT node_id FROM crdb_internal.gossip_liveness
-               WHERE decommissioning = false OR split_part(expiration,',',1)::decimal > now()::decimal`),
+               WHERE membership = 'active' OR split_part(expiration,',',1)::decimal > now()::decimal`),
 		false,
 	)
+
 	if err != nil {
 		return err
 	}
@@ -105,6 +108,7 @@ var statusNodesColumnHeadersForStats = []string{
 var statusNodesColumnHeadersForDecommission = []string{
 	"gossiped_replicas",
 	"is_decommissioning",
+	"membership",
 	"is_draining",
 }
 
@@ -143,7 +147,7 @@ func runStatusNodeInner(showDecommissioned bool, args []string) ([]string, [][]s
 
 	maybeAddActiveNodesFilter := func(query string) string {
 		if !showDecommissioned {
-			query += " WHERE decommissioning = false OR split_part(expiration,',',1)::decimal > now()::decimal"
+			query += " WHERE membership = 'active' OR split_part(expiration,',',1)::decimal > now()::decimal"
 		}
 		return query
 	}
@@ -184,10 +188,12 @@ SELECT node_id AS id,
   FROM crdb_internal.kv_store_status
  GROUP BY node_id`
 
+// TODO(irfansharif): Remove the `is_decommissioning` column in v20.2.
 const decommissionQuery = `
 SELECT node_id AS id,
        ranges AS gossiped_replicas,
-       decommissioning AS is_decommissioning,
+       membership != 'active' as is_decommissioning,
+       membership AS membership,
        draining AS is_draining
   FROM crdb_internal.gossip_liveness LEFT JOIN crdb_internal.gossip_nodes USING (node_id)`
 
@@ -274,6 +280,7 @@ var decommissionNodesColumnHeaders = []string{
 	"is_live",
 	"replicas",
 	"is_decommissioning",
+	"membership",
 	"is_draining",
 }
 
@@ -330,7 +337,6 @@ func runDecommissionNode(cmd *cobra.Command, args []string) error {
 	}
 
 	c := serverpb.NewAdminClient(conn)
-
 	return runDecommissionNodeImpl(ctx, c, nodeCtx.nodeDecommissionWait, nodeIDs)
 }
 
@@ -409,11 +415,17 @@ func runDecommissionNodeImpl(
 		MaxBackoff: 20 * time.Second,
 	}
 
+	// Marking a node as fully decommissioned is driven by a two-step process.
+	// We start off by marking each node as 'decommissioning'. In doing so,
+	// replicas are slowly moved off of these nodes. It's only after when we're
+	// made aware that the replica counts have all hit zero, and that all nodes
+	// have been successfully marked as 'decommissioning', that we then go and
+	// mark each node as 'decommissioned'.
 	prevResponse := serverpb.DecommissionStatusResponse{}
 	for r := retry.StartWithCtx(ctx, opts); r.Next(); {
 		req := &serverpb.DecommissionRequest{
-			NodeIDs:         nodeIDs,
-			Decommissioning: true,
+			NodeIDs:          nodeIDs,
+			TargetMembership: kvserverpb.MembershipStatus_DECOMMISSIONING,
 		}
 		resp, err := c.Decommission(ctx, req)
 		if err != nil {
@@ -430,18 +442,43 @@ func runDecommissionNodeImpl(
 		} else {
 			fmt.Fprintf(stderr, ".")
 		}
+
+		anyActive := false
 		var replicaCount int64
-		allDecommissioning := true
 		for _, status := range resp.Status {
+			anyActive = anyActive || status.Membership.Active()
 			replicaCount += status.ReplicaCount
-			allDecommissioning = allDecommissioning && status.Decommissioning
 		}
-		if replicaCount == 0 && allDecommissioning {
+
+		if !anyActive && replicaCount == 0 {
+			// We now mark the nodes as fully decommissioned.
+			req := &serverpb.DecommissionRequest{
+				NodeIDs:          nodeIDs,
+				TargetMembership: kvserverpb.MembershipStatus_DECOMMISSIONED,
+			}
+			resp, err := c.Decommission(ctx, req)
+			if err != nil {
+				fmt.Fprintln(stderr)
+				return errors.Wrap(err, "while trying to mark as decommissioned")
+			}
+			if !reflect.DeepEqual(&prevResponse, resp) {
+				fmt.Fprintln(stderr)
+				if err := printDecommissionStatus(*resp); err != nil {
+					return err
+				}
+				prevResponse = *resp
+			}
+
 			fmt.Fprintln(os.Stdout, "\nNo more data reported on target nodes. "+
 				"Please verify cluster health before removing the nodes.")
 			return nil
 		}
+
 		if wait == nodeDecommissionWaitNone {
+			// The intent behind --wait=none is for it to be used when polling
+			// manually from an external system. We'll only mark nodes as
+			// fully decommissioned once the replica count hits zero and they're
+			// all marked as decommissioning.
 			return nil
 		}
 		if replicaCount < minReplicaCount {
@@ -453,7 +490,7 @@ func runDecommissionNodeImpl(
 }
 
 func decommissionResponseAlignment() string {
-	return "rcrcc"
+	return "rcrccc"
 }
 
 // decommissionResponseValueToRows converts DecommissionStatusResponse_Status to
@@ -468,7 +505,8 @@ func decommissionResponseValueToRows(
 			strconv.FormatInt(int64(node.NodeID), 10),
 			strconv.FormatBool(node.IsLive),
 			strconv.FormatInt(node.ReplicaCount, 10),
-			strconv.FormatBool(node.Decommissioning),
+			strconv.FormatBool(!node.Membership.Active()),
+			node.Membership.String(),
 			strconv.FormatBool(node.Draining),
 		})
 	}
@@ -522,13 +560,19 @@ func runRecommissionNode(cmd *cobra.Command, args []string) error {
 	}
 
 	c := serverpb.NewAdminClient(conn)
-
 	req := &serverpb.DecommissionRequest{
-		NodeIDs:         nodeIDs,
-		Decommissioning: false,
+		NodeIDs:          nodeIDs,
+		TargetMembership: kvserverpb.MembershipStatus_ACTIVE,
 	}
 	resp, err := c.Decommission(ctx, req)
 	if err != nil {
+		// If it's a specific illegal membership transition error, we try to
+		// surface a more readable message to the user. See
+		// ValidateLivenessTransition in kvserverpb/liveness.go for where this
+		// error is generated.
+		if s, ok := status.FromError(err); ok && s.Code() == codes.FailedPrecondition {
+			return errors.Newf("%s", s.Message())
+		}
 		return err
 	}
 	return printDecommissionStatus(*resp)
