
Commit 8c7ec0d

Add rollback process to migration guides
1 parent 656a6de commit 8c7ec0d

2 files changed: +177 -1 lines changed

src/current/v25.2/migrate-cockroachdb-kubernetes-helm.md

Lines changed: 78 additions & 1 deletion
@@ -130,6 +130,10 @@ For each pod in the StatefulSet, perform the following steps:
Repeat these steps until the StatefulSet has zero replicas.

{{site.data.alerts.callout_danger}}
If there are issues with the migration and you need to revert to the previous deployment, follow the [rollback process](#roll-back-a-migration-in-progress).
{{site.data.alerts.end}}

## Step 4. Update the public service

The Helm chart creates a public Service that exposes both SQL and gRPC connections over a single port. However, the operator uses a different port for gRPC communication. To ensure compatibility, update the public Service to reflect the correct gRPC port used by the operator.
@@ -162,4 +166,77 @@ Apply the crdbcluster manifest using Helm:
{% include_cached copy-clipboard.html %}
~~~ shell
helm upgrade $RELEASE_NAME ./cockroachdb-parent/charts/cockroachdb -f manifests/values.yaml
~~~

## Roll back a migration in progress
If the migration to the {{ site.data.products.cockroachdb-operator }} fails while you are applying the generated `crdbnode` manifests, follow the steps below to safely restore the original state using the previously backed-up resources and preserved volumes. This procedure assumes that the original StatefulSet and its PVCs have not been deleted.
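To confirm that those resources still exist before you begin, you can list them. This is an optional sanity check rather than part of the rollback itself; it assumes the `$CRDBCLUSTER` environment variable is still set to the StatefulSet name used during the migration steps:

{% include_cached copy-clipboard.html %}
~~~ shell
# Verify that the original StatefulSet and its PersistentVolumeClaims are still present.
kubectl get statefulset $CRDBCLUSTER
kubectl get pvc
~~~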
1. Delete the applied `crdbnode` resources and scale the StatefulSet back up, one node at a time.
Delete the individual `crdbnode` manifests in the reverse order of their creation (starting with the last one created, e.g., `crdbnode-2.yaml`) and scale the StatefulSet back to its original replica count (e.g., 3). For example, assuming you have applied two `crdbnode` YAML files (`crdbnode-2.yaml` and `crdbnode-1.yaml`):
1. Delete the most recently applied `crdbnode` manifest, starting with `crdbnode-2.yaml`.
1. Scale the StatefulSet replica count up by one (to 2).
1. Verify that data has propagated by waiting for there to be zero under-replicated ranges:

1. Set up port forwarding to access the CockroachDB node's HTTP interface, replacing `cockroachdb-X` with the node name:
{% include_cached copy-clipboard.html %}
~~~ shell
kubectl port-forward pod/cockroachdb-X 8080:8080
~~~

The DB Console runs on port 8080 by default.

1. Check the `ranges_underreplicated` metric:

{% include_cached copy-clipboard.html %}
~~~ shell
curl --insecure -s https://localhost:8080/_status/vars | grep "ranges_underreplicated{" | awk '{print $2}'
~~~

This command outputs the number of under-replicated ranges on the node, which should be zero before proceeding to the next node. This may take some time depending on the deployment, but it is necessary to ensure that there is no interruption in data availability.
1. Repeat steps a through c for each node: delete `crdbnode-1.yaml`, scale the replica count to 3, and so on.

{% include_cached copy-clipboard.html %}
~~~ shell
kubectl delete -f manifests/crdbnode-2.yaml
kubectl scale statefulset $CRDBCLUSTER --replicas=2
~~~

Repeat the `kubectl delete -f ...` command (and the corresponding `kubectl scale` command) for each `crdbnode` manifest you applied during migration. Make sure to verify that there are no under-replicated ranges after rolling back each node; a polling sketch appears at the end of this section.
1. Delete the PriorityClass and RBAC resources created for the CockroachDB operator:
{% include_cached copy-clipboard.html %}
~~~ shell
kubectl delete priorityclass crdb-critical
kubectl delete -f manifests/rbac.yaml
~~~

1. Uninstall the {{ site.data.products.cockroachdb-operator }}:
{% include_cached copy-clipboard.html %}
~~~ shell
helm uninstall crdb-operator
~~~

1. Clean up {{ site.data.products.cockroachdb-operator }} resources and custom resource definitions:
{% include_cached copy-clipboard.html %}
~~~ shell
kubectl delete crds crdbnodes.crdb.cockroachlabs.com
kubectl delete crds crdbtenants.crdb.cockroachlabs.com
kubectl delete serviceaccount cockroachdb-sa
kubectl delete service cockroach-webhook-service
kubectl delete validatingwebhookconfiguration cockroach-webhook-config
~~~

1. Confirm that all CockroachDB pods are "Running" and "Ready" using the following command:

{% include_cached copy-clipboard.html %}
~~~ shell
kubectl get pods
~~~
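As noted in step 1, wait until the `ranges_underreplicated` metric reports zero before rolling back the next node. The following loop is an illustrative sketch rather than part of the official procedure; it assumes the port-forward from step 1c is still active on `localhost:8080` and that the node has a single store:

{% include_cached copy-clipboard.html %}
~~~ shell
# Poll the node's ranges_underreplicated metric every 10 seconds until it reaches 0.
while true; do
  UNDER_REPLICATED=$(curl --insecure -s https://localhost:8080/_status/vars | grep "ranges_underreplicated{" | awk '{print $2}')
  if [ "$UNDER_REPLICATED" = "0" ]; then
    echo "No under-replicated ranges; safe to roll back the next node."
    break
  fi
  echo "Under-replicated ranges: ${UNDER_REPLICATED:-unknown}; waiting..."
  sleep 10
done
~~~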

src/current/v25.2/migrate-cockroachdb-kubernetes-operator.md

Lines changed: 99 additions & 0 deletions
@@ -180,6 +180,10 @@ For each pod in the StatefulSet, perform the following steps:
Repeat these steps until the StatefulSet has zero replicas.

{{site.data.alerts.callout_danger}}
If there are issues with the migration and you need to revert to the previous deployment, follow the [rollback process](#roll-back-a-migration-in-progress).
{{site.data.alerts.end}}

## Step 5. Update the crdbcluster manifest

The {{ site.data.products.public-operator }} creates a pod disruption budget that conflicts with a pod disruption budget managed by the {{ site.data.products.cockroachdb-operator }}. Before applying the crdbcluster manifest, delete the existing pod disruption budget:
@@ -211,3 +215,98 @@ Once the migration is successful, delete the StatefulSet that was created by the
~~~ shell
kubectl delete poddisruptionbudget $STS_NAME-budget
~~~

## Roll back a migration in progress
If the migration to the {{ site.data.products.cockroachdb-operator }} fails while you are applying the generated `crdbnode` manifests, follow the steps below to safely restore the original state using the previously backed-up resources and preserved volumes. This procedure assumes that the original StatefulSet and its PVCs have not been deleted.
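To confirm that those resources still exist before you begin, you can list them. This is an optional sanity check rather than part of the rollback itself; it assumes the `$CRDBCLUSTER` environment variable is still set to the StatefulSet name used during the migration steps:

{% include_cached copy-clipboard.html %}
~~~ shell
# Verify that the original StatefulSet and its PersistentVolumeClaims are still present.
kubectl get statefulset $CRDBCLUSTER
kubectl get pvc
~~~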
1. Delete the applied `crdbnode` resources and scale the StatefulSet back up, one node at a time.
Delete the individual `crdbnode` manifests in the reverse order of their creation (starting with the last one created, e.g., `crdbnode-2.yaml`) and scale the StatefulSet back to its original replica count (e.g., 3). For example, assuming you have applied two `crdbnode` YAML files (`crdbnode-2.yaml` and `crdbnode-1.yaml`):
1. Delete the most recently applied `crdbnode` manifest, starting with `crdbnode-2.yaml`.
1. Scale the StatefulSet replica count up by one (to 2).
1. Verify that data has propagated by waiting for there to be zero under-replicated ranges:

1. Set up port forwarding to access the CockroachDB node's HTTP interface, replacing `cockroachdb-X` with the node name:
{% include_cached copy-clipboard.html %}
~~~ shell
kubectl port-forward pod/cockroachdb-X 8080:8080
~~~

The DB Console runs on port 8080 by default.

1. Check the `ranges_underreplicated` metric:

{% include_cached copy-clipboard.html %}
~~~ shell
curl --insecure -s https://localhost:8080/_status/vars | grep "ranges_underreplicated{" | awk '{print $2}'
~~~

This command outputs the number of under-replicated ranges on the node, which should be zero before proceeding to the next node. This may take some time depending on the deployment, but it is necessary to ensure that there is no interruption in data availability.
1. Repeat steps a through c for each node: delete `crdbnode-1.yaml`, scale the replica count to 3, and so on.

{% include_cached copy-clipboard.html %}
~~~ shell
kubectl delete -f manifests/crdbnode-2.yaml
kubectl scale statefulset $CRDBCLUSTER --replicas=2
~~~

Repeat the `kubectl delete -f ...` command (and the corresponding `kubectl scale` command) for each `crdbnode` manifest you applied during migration. Make sure to verify that there are no under-replicated ranges after rolling back each node; a polling sketch appears at the end of this section.
1. Delete the PriorityClass and RBAC resources created for the CockroachDB operator:
{% include_cached copy-clipboard.html %}
~~~ shell
kubectl delete priorityclass crdb-critical
kubectl delete -f manifests/rbac.yaml
~~~

1. Uninstall the {{ site.data.products.cockroachdb-operator }}:
{% include_cached copy-clipboard.html %}
~~~ shell
helm uninstall crdb-operator
~~~

1. Clean up {{ site.data.products.cockroachdb-operator }} resources and custom resource definitions:
{% include_cached copy-clipboard.html %}
~~~ shell
kubectl delete crds crdbnodes.crdb.cockroachlabs.com
kubectl delete crds crdbtenants.crdb.cockroachlabs.com
kubectl delete serviceaccount cockroachdb-sa
kubectl delete service cockroach-webhook-service
kubectl delete validatingwebhookconfiguration cockroach-webhook-config
~~~

1. Restore the {{ site.data.products.public-operator }}:
{% include_cached copy-clipboard.html %}
~~~ shell
kubectl apply -f https://raw.githubusercontent.com/cockroachdb/cockroach-operator/v2.17.0/install/crds.yaml
kubectl apply -f https://raw.githubusercontent.com/cockroachdb/cockroach-operator/v2.17.0/install/operator.yaml
~~~

Wait for the operator pod to be "Running", which you can check with the following command:

{% include_cached copy-clipboard.html %}
~~~ shell
kubectl get pods -n cockroach-operator-system
~~~

1. Restore the original `crdbcluster` custom resource:
{% include_cached copy-clipboard.html %}
~~~ shell
kubectl apply -f backup/crdbcluster-$CRDBCLUSTER.yaml
~~~

1. Confirm that all CockroachDB pods are "Running" and "Ready" using the following command:

{% include_cached copy-clipboard.html %}
~~~ shell
kubectl get pods
~~~
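As noted in step 1, wait until the `ranges_underreplicated` metric reports zero before rolling back the next node. The following loop is an illustrative sketch rather than part of the official procedure; it assumes the port-forward from step 1c is still active on `localhost:8080` and that the node has a single store:

{% include_cached copy-clipboard.html %}
~~~ shell
# Poll the node's ranges_underreplicated metric every 10 seconds until it reaches 0.
while true; do
  UNDER_REPLICATED=$(curl --insecure -s https://localhost:8080/_status/vars | grep "ranges_underreplicated{" | awk '{print $2}')
  if [ "$UNDER_REPLICATED" = "0" ]; then
    echo "No under-replicated ranges; safe to roll back the next node."
    break
  fi
  echo "Under-replicated ranges: ${UNDER_REPLICATED:-unknown}; waiting..."
  sleep 10
done
~~~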
