Commit afae11e

checkpointing in k3s
1 parent 15e78b1 commit afae11e

1 file changed, +16 -5 lines changed
content/posts/checkpointing.md

Lines changed: 16 additions & 5 deletions
@@ -6,18 +6,26 @@ authors: ["Kapil Agrawal"]
66
comments: false
77
---
88

9+
A checkpoint involves taking a snapshot of a running process or a set of processes and saving their entire state to disk as a collection of files, known as image files. This state includes memory contents, open file descriptors, network connections, CPU registers, and other process-related information. Kubernetes v1.25 introduced the concept of creating stateful container checkpoints for forensic analysis without stopping a pod. In this blog post I am going to cover the steps involved in checkpointing a pod running on a k3s cluster.
10+
911
## Identify our Pod of interest
1012

11-
Find the node where the pod is currently running
13+
Let's say we want to checkpoint our `netshoot` pod. First, we need to locate the node where the pod is currently running:
1214

1315
```sh
1416
kubectl get pod -o wide
17+
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
18+
hello-world-5566b798fc-47gtp 1/1 Running 1 (41h ago) 3d3h fd00::1d1 x86-dev <none> <none>
19+
netshoot 1/1 Running 1 (39h ago) 39h fd00::1ea x86-dev <none> <none>
20+
1521
```
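
If only the node name is needed, a JSONPath query avoids reading it out of the wide table. This is a minimal sketch, assuming the pod lives in the current namespace; `pod_node` is a hypothetical helper name, not something kubectl provides.

```sh
# Hypothetical helper: print the node a pod is scheduled on.
# Assumes the pod is in the current namespace.
# Usage: pod_node netshoot
pod_node() {
  kubectl get pod "$1" -o jsonpath='{.spec.nodeName}'
}
```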
1622

1723
Locate the container ID of the pod:
1824

1925
```sh
20-
kubectl desribe pod PODNAME | grep -i "Container ID"
26+
❯ kubectl describe pod netshoot | grep -i "Container ID"
27+
Container ID: containerd://c371c62bf021a0cf05f0382b101f3694a46eaf373621c2cf94990a0b0926a133
28+
2129
```
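
The same ID can also be read from the pod status without `grep`. The sketch below assumes a single-container pod in the current namespace and strips the runtime prefix (`containerd://` here); `pod_container_id` is a hypothetical helper name.

```sh
# Hypothetical helper: print the first container's ID without the
# runtime prefix (for example "containerd://").
# Assumes a single-container pod in the current namespace.
# Usage: pod_container_id netshoot
pod_container_id() {
  id=$(kubectl get pod "$1" -o jsonpath='{.status.containerStatuses[0].containerID}')
  # Drop everything up to and including the first "://".
  printf '%s\n' "${id#*://}"
}
```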
2230

2331
## Requirements
@@ -69,6 +77,8 @@ lrwxrwxrwx 1 root root 7 Sep 6 19:27 tar -> busybox*
6977

7078
## Checkpoint a running pod on a K3s node
7179

80+
Note the filenames and paths of the cert, key, and CA cert. These will likely differ on other Kubernetes distributions. When in doubt, RTFM :-)
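
For orientation, the upstream kubelet checkpoint API is a `POST` to `/checkpoint/{namespace}/{pod}/{container}`. The sketch below only builds that URL. It assumes the kubelet is reachable on `localhost` at its default port `10250`, and `checkpoint_url` is a hypothetical helper name.

```sh
# Hypothetical helper: build the kubelet checkpoint endpoint URL.
# Assumes the kubelet listens on localhost:10250 (the default port).
# $1 = namespace, $2 = pod name, $3 = container name
checkpoint_url() {
  printf 'https://localhost:10250/checkpoint/%s/%s/%s\n' "$1" "$2" "$3"
}
```

For the pod used in this post, `checkpoint_url default netshoot netshoot` yields the path segment `/checkpoint/default/netshoot/netshoot`.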
81+
7282
```sh
7383
curl -q -s --insecure \
7484
--cert /var/lib/rancher/k3s/agent/client-kubelet.crt \
@@ -79,7 +89,7 @@ curl -q -s --insecure \
7989

8090
## Example
8191

82-
Try checkpointing a running netshoot pod
92+
Let's try checkpointing our netshoot pod
8393

8494
```sh
8595
[root@x86-dev:~] curl -q -s --insecure \
@@ -111,7 +121,7 @@ default netshoot netshoot containerd 13 Sep 25 18:09 CDT checkpoint
111121
112122
```
113123

114-
Inspecting a checkpoint image
124+
### Inspecting a checkpoint image
115125

116126
```sh
117127
[root@x86-dev:~] checkpointctl inspect \
@@ -125,7 +135,7 @@ Inspecting a checkpoint image
125135
checkpoint-netshoot_default-netshoot-2025-09-13T18:09:15-05:00.tar
126136
```
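
When several checkpoints accumulate, it helps to pick the newest archive before inspecting it. A minimal sketch, assuming the archives sit under `/var/lib/kubelet/checkpoints` and follow the `checkpoint-*.tar` naming seen above; `latest_checkpoint` is a hypothetical helper name.

```sh
# Hypothetical helper: print the most recently modified checkpoint archive.
# $1 = directory to search (assumed default: /var/lib/kubelet/checkpoints)
# Usage: latest_checkpoint
latest_checkpoint() {
  dir=${1:-/var/lib/kubelet/checkpoints}
  # ls -t sorts newest first; head keeps only the first entry.
  ls -t "$dir"/checkpoint-*.tar 2>/dev/null | head -n 1
}
```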
127137

128-
Show memory dump
138+
### Show memory dump
129139

130140
```sh
131141
# kubelet stores checkpoint under /var/lib/kubelet/checkpoints/
@@ -140,5 +150,6 @@ Show memory dump
140150
--
141151

142152
- https://kubernetes.io/docs/reference/node/kubelet-checkpoint-api/
153+
- https://kubernetes.io/blog/2022/12/05/forensic-container-checkpointing-alpha/
143154
- https://criu.org/Containerd
144155
- https://github.com/checkpoint-restore
