Node stuck in `DECOMMISSIONING` state when upscale interrupts an ongoing downscale #1138

### Description

A CockroachDB node can become permanently stuck in the `DECOMMISSIONING` membership state if the user scales up the cluster (`cockroach_cr.nodes`) while a previous downscale operation is still in progress.

The operator's `ReconcileDecommssion` logic relies on the condition `currentReplicas > cr.nodes`. If a user upscales while a node is in the intermediate `DECOMMISSIONING` state, that condition no longer holds (`cr.nodes >= currentReplicas`), so the operator skips decommission reconciliation. Consequently, the node is neither fully decommissioned nor returned to `ACTIVE` status.

### Root Cause Analysis

The issue lies in the entry conditions for the `ReconcileDecommssion` action. The operator triggers this action only if:

1. The Cluster is initialized.
2. `stsStatus.replicas == stsStatus.currentReplicas`
3. `stsStatus.currentReplicas > cockroach_cr.nodes` (Intent to downscale).

**The Failure Scenario:**

1. User initiates downscale (e.g., 5 -> 3). Condition (3) is met.
2. Operator calls `Decommission(node)`. The node enters the `DECOMMISSIONING` state.
3. `Decommission(node)` returns an error (e.g., network timeout, data moving too slowly). The operator returns early to retry later.
4. User initiates upscale (e.g., 3 -> 5).
5. On the next reconcile, Condition (3) evaluates to `false` because `stsStatus.currentReplicas` is no longer greater than `cockroach_cr.nodes`.
6. The operator skips the `ReconcileDecommssion` block entirely. The node remains `DECOMMISSIONING` indefinitely.
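
The scenario above can be sketched as a small predicate. This is a minimal illustration, not the operator's code: the `stsStatus` struct, its field names, and `shouldReconcileDecommission` are hypothetical stand-ins for the three entry conditions, and the sketch assumes the StatefulSet still reports 5 replicas because the failed decommission never reached the scale-down step.

```go
package main

import "fmt"

// stsStatus mirrors the two StatefulSet status fields named in the entry
// conditions. The type and field names are illustrative assumptions, not
// the operator's actual types.
type stsStatus struct {
	Replicas        int32
	CurrentReplicas int32
}

// shouldReconcileDecommission models entry conditions (1)-(3).
func shouldReconcileDecommission(initialized bool, sts stsStatus, crNodes int32) bool {
	return initialized &&
		sts.Replicas == sts.CurrentReplicas &&
		sts.CurrentReplicas > crNodes
}

func main() {
	// Assumed: the StatefulSet was never scaled down, so it still has 5 replicas.
	sts := stsStatus{Replicas: 5, CurrentReplicas: 5}

	// Steps 1-3: the CR asks for 3 nodes, so the decommission path is entered.
	fmt.Println(shouldReconcileDecommission(true, sts, 3)) // true

	// Steps 4-6: the CR is set back to 5, so the decommission path is skipped
	// even though one node is still DECOMMISSIONING.
	fmt.Println(shouldReconcileDecommission(true, sts, 5)) // false
}
```

Note that the predicate has no input describing node membership, which is why the half-decommissioned node goes unnoticed once the replica counts match again.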

### Steps to Reproduce

**Prerequisites:**

* A running CockroachDB cluster managed by the operator (5 nodes).
* (Optional) **ChaosMesh** installed to inject network faults.

1. **Initialize Workload:**
Run the `movr` workload to generate sufficient data:
```bash
cockroach workload init movr --num-histories 1000000 --num-rides 100000 --num-users 100000 --num-vehicles 100000
```

2. **Inject Fault:**
Use ChaosMesh to limit the Pod bandwidth to 1 kbps. This ensures the decommissioning process stalls or errors out due to slow data replication.

3. **Trigger Downscale:**
Update the `CockroachDB` CR to reduce the replica count (e.g., 5 -> 3).

4. **Verify State:**
Wait until the target node enters the `DECOMMISSIONING` state:
```bash
cockroach node status --insecure --decommission
```

5. **Trigger Upscale:**
Immediately update the `CockroachDB` CR to increase the replica count (e.g., back to 5 or higher).

### Observed Behavior

The node previously targeted for removal remains in the `DECOMMISSIONING` state while new nodes are added. It does not revert to `ACTIVE`.

**Log Output:**

```text
bash-5.1$ cockroach node status --insecure --decommission
  id |                          address                          |                        sql_address                        |  build  |              started_at              |              updated_at              | locality | attrs | is_available | is_live | gossiped_replicas | is_decommissioning |   membership    | is_draining
-----+-----------------------------------------------------------+-----------------------------------------------------------+---------+--------------------------------------+--------------------------------------+----------+-------+--------------+---------+-------------------+--------------------+-----------------+--------------
   1 | cockroachdb-0.cockroachdb.cockroach-operator-system:26258 | cockroachdb-0.cockroachdb.cockroach-operator-system:26257 | v25.4.2 | 2026-01-12 02:54:10.174444 +0000 UTC | 2026-01-12 03:07:17.954084 +0000 UTC |          | []    | true         | true    |               104 | false              | active          | false
   2 | cockroachdb-1.cockroachdb.cockroach-operator-system:26258 | cockroachdb-1.cockroachdb.cockroach-operator-system:26257 | v25.4.2 | 2026-01-12 02:55:32.643908 +0000 UTC | 2026-01-12 03:07:17.797383 +0000 UTC |          | []    | true         | true    |                98 | false              | active          | false
   3 | cockroachdb-2.cockroachdb.cockroach-operator-system:26258 | cockroachdb-2.cockroachdb.cockroach-operator-system:26257 | v25.4.2 | 2026-01-12 02:55:34.215002 +0000 UTC | 2026-01-12 03:07:16.41831 +0000 UTC  |          | []    | true         | true    |               100 | false              | active          | false
   4 | cockroachdb-4.cockroachdb.cockroach-operator-system:26258 | cockroachdb-4.cockroachdb.cockroach-operator-system:26257 | v25.4.2 | 2026-01-12 02:39:19.128573 +0000 UTC | 2026-01-12 03:07:16.193822 +0000 UTC |          | []    | true         | true    |                40 | true               | decommissioning | false
   5 | cockroachdb-3.cockroachdb.cockroach-operator-system:26258 | cockroachdb-3.cockroachdb.cockroach-operator-system:26257 | v25.4.2 | 2026-01-12 02:56:23.412081 +0000 UTC | 2026-01-12 03:07:18.188583 +0000 UTC |          | []    | true         | true    |               104 | false              | active          | false
(5 rows)
```

### Expected Behavior

If the operator detects that the desired replica count has increased (upscale) while a node is currently `DECOMMISSIONING`:

1. The operator should detect the intermediate state.
2. It should explicitly recommission the node (cancel the decommission) to return it to `ACTIVE` status before proceeding with the upscale.
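
One way to express this expectation is a decision function that also looks at node membership. The sketch below is hypothetical: `membership`, `action`, and `nextAction` are names invented for illustration, and the membership strings are taken from the `membership` column of `cockroach node status --decommission`. How the operator would actually issue the recommission (for example, the equivalent of `cockroach node recommission <node id>`) is left out.

```go
package main

import "fmt"

// membership holds values as shown in the `membership` column of
// `cockroach node status --decommission`.
type membership string

const (
	active          membership = "active"
	decommissioning membership = "decommissioning"
)

// action is a hypothetical label for what the reconciler should do next.
type action string

const (
	actionNone         action = "none"
	actionDecommission action = "decommission"
	actionRecommission action = "recommission"
)

// nextAction keeps the existing downscale condition, but when no downscale
// is requested it checks for nodes left in the intermediate state.
func nextAction(currentReplicas, crNodes int32, nodes []membership) action {
	if currentReplicas > crNodes {
		return actionDecommission
	}
	for _, m := range nodes {
		if m == decommissioning {
			// Downscale was cancelled: return the node to active first.
			return actionRecommission
		}
	}
	return actionNone
}

func main() {
	nodes := []membership{active, active, active, decommissioning, active}
	fmt.Println(nextAction(5, 3, nodes)) // decommission
	fmt.Println(nextAction(5, 5, nodes)) // recommission
}
```

With a check like this, the upscale would proceed only after `nextAction` returns `none`, meaning every node is back to `active`.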

### Severity
Major, though this was not encountered as a production failure.
