Commit 15e78b1

committed
new post - checkpointing in K8s
1 parent 7d01a53 commit 15e78b1

2 files changed: +145 -1 lines changed

content/posts/checkpointing.md

Lines changed: 144 additions & 0 deletions
@@ -0,0 +1,144 @@
---
title: Forensic container checkpointing in Kubernetes
date: 2025-09-14
tags: ["security", "kubernetes"]
authors: ["Kapil Agrawal"]
comments: false
---

## Identify our Pod of interest

Find the node where the pod is currently running:

```sh
kubectl get pod -o wide
```

Locate the container ID of the pod:

```sh
kubectl describe pod PODNAME | grep -i "Container ID"
```
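
The two lookups above can also be scripted with `kubectl`'s JSONPath output instead of reading a table or grepping. This is only a sketch: the function names `pod_node`, `pod_container_ids`, and `strip_runtime_prefix` are hypothetical, and it assumes container IDs are reported in the usual `<runtime>://<id>` form.

```sh
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not part of kubectl.

# Print the node a pod is scheduled on. $1 = pod, $2 = namespace (optional).
pod_node() {
    kubectl get pod "$1" -n "${2:-default}" -o jsonpath='{.spec.nodeName}'
}

# Print one container ID per line, as reported in the pod status
# (for example containerd://<id>).
pod_container_ids() {
    kubectl get pod "$1" -n "${2:-default}" \
        -o jsonpath='{range .status.containerStatuses[*]}{.containerID}{"\n"}{end}'
}

# Strip the "<runtime>://" prefix so only the bare ID remains.
strip_runtime_prefix() {
    printf '%s\n' "${1#*://}"
}
```

For example, `pod_node PODNAME` prints the node name and `strip_runtime_prefix "$(pod_container_ids PODNAME | head -n 1)"` prints the first container's bare ID.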

## Requirements

1. Download and install CRIU on the node:
   https://criu.org/Packages

2. You may need to explicitly allow access to the kubelet checkpoint API on the node:

```yaml
# kubectl apply -f node-checkpoint-rbac.yaml
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-checkpoint-access
rules:
  - apiGroups: [""]
    resources: ["nodes/checkpoint"]
    verbs: ["create"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: node-checkpoint-access
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: node-checkpoint-access
subjects:
  - kind: Group
    name: system:nodes
    apiGroup: rbac.authorization.k8s.io
```

⚠️ Checkpointing writes a `.tar` archive, so it needs a working `tar` binary. By default, K3s uses the BusyBox implementation of `tar`, which is incompatible with CRIU. For checkpointing to work, you may need to replace it with the system's full `tar` binary.

```sh
# /var/lib/rancher/k3s/data/ is k3s’s runtime dependency store, containing unpacked, versioned
# bundles of the k3s binary, containerd, and supporting tools

[root@x86-dev:~] ls -l /var/lib/rancher/k3s/data/current/bin/tar
lrwxrwxrwx 1 root root 7 Sep 6 19:27 tar -> busybox*

[root@x86-dev:~] rm /var/lib/rancher/k3s/data/current/bin/tar
[root@x86-dev:~] ln -s $(which tar) /var/lib/rancher/k3s/data/current/bin/tar
```
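
The manual `rm` and `ln -s` steps can be wrapped in a guard that only touches the link while it still points at BusyBox. This is a sketch under assumptions: `points_to_busybox` and `use_system_tar` are hypothetical names, and the check simply looks for "busybox" in the symlink target.

```sh
#!/bin/sh
# Hypothetical helper: succeed only if the given path is a symlink whose
# target name contains "busybox".
points_to_busybox() {
    [ -L "$1" ] || return 1
    case "$(readlink "$1")" in
        *busybox*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: replace the link only while it still points at
# BusyBox, so running it again after the fix changes nothing.
# $1 = path of the symlink, $2 = path of the full tar binary.
use_system_tar() {
    link=$1
    system_tar=$2
    if points_to_busybox "$link"; then
        ln -sf "$system_tar" "$link"
    fi
}
```

On the node this could be called as `use_system_tar /var/lib/rancher/k3s/data/current/bin/tar "$(command -v tar)"`. Because the bundles under `/var/lib/rancher/k3s/data/` are versioned, a K3s upgrade may bring back the BusyBox link, in which case the helper can be run again.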

## Checkpoint a running pod on a K3s node

```sh
curl -q -s --insecure \
  --cert /var/lib/rancher/k3s/agent/client-kubelet.crt \
  --key /var/lib/rancher/k3s/agent/client-kubelet.key \
  --cacert /var/lib/rancher/k3s/agent/client-ca.crt \
  -X POST "https://$(hostname -i):10250/checkpoint/NAMESPACE/PODNAME/CONTAINERNAME"
```
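
The request above follows the pattern `/checkpoint/<namespace>/<pod>/<container>` on the kubelet port. A small wrapper can make the three placeholders explicit. This is a sketch, not a tested tool: `checkpoint_url` and `checkpoint_container` are hypothetical names, and the certificate paths and port 10250 are taken from the command above.

```sh
#!/bin/sh
# Hypothetical helper: build the kubelet checkpoint URL.
# $1 = node address, $2 = namespace, $3 = pod, $4 = container.
checkpoint_url() {
    printf 'https://%s:10250/checkpoint/%s/%s/%s\n' "$1" "$2" "$3" "$4"
}

# Hypothetical wrapper around the curl call, reusing the K3s certificate
# paths from the post. Run it as root on the node that hosts the pod.
# $1 = namespace, $2 = pod, $3 = container.
checkpoint_container() {
    certs=/var/lib/rancher/k3s/agent
    curl -q -s --insecure \
        --cert "$certs/client-kubelet.crt" \
        --key "$certs/client-kubelet.key" \
        --cacert "$certs/client-ca.crt" \
        -X POST "$(checkpoint_url "$(hostname -i)" "$1" "$2" "$3")"
}
```

Keeping the URL builder separate means the node address can be passed explicitly on hosts where `hostname -i` prints more than one address.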

## Example

Checkpoint a running netshoot pod:

```sh
[root@x86-dev:~] curl -q -s --insecure \
  --cert /var/lib/rancher/k3s/agent/client-kubelet.crt \
  --key /var/lib/rancher/k3s/agent/client-kubelet.key \
  --cacert /var/lib/rancher/k3s/agent/client-ca.crt \
  -X POST "https://$(hostname -i):10250/checkpoint/default/netshoot/netshoot"
```

Output:

```sh
{"items":["/var/lib/kubelet/checkpoints/checkpoint-netshoot_default-netshoot-2025-09-13T18:09:15-05:00.tar"]}

[root@x86-dev:~] ls /var/lib/kubelet/checkpoints/
checkpoint-netshoot_default-netshoot-2025-09-13T18:09:15-05:00.tar
```
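
The kubelet's reply is a small JSON object whose `items` array holds the archive path, so the path can be captured for later commands. The sketch below assumes the single-item reply shown in the output; `checkpoint_path` is a hypothetical name, and a real JSON parser such as `jq` would be the more robust choice.

```sh
#!/bin/sh
# Hypothetical helper: read the kubelet's JSON reply on stdin and print the
# first archive path found in "items". A sed pattern is enough for a
# single-item reply; it is not a general JSON parser.
checkpoint_path() {
    sed -n 's/.*"items":\["\([^"]*\)".*/\1/p'
}
```

Piping the `curl` output through `checkpoint_path` prints only the `.tar` path, which can then be handed to `checkpointctl`.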

## Restoring checkpoint image for analysis

Download [checkpointctl](https://github.com/checkpoint-restore/checkpointctl)

```
[root@x86-dev:~] checkpointctl list
Listing checkpoints in path: /var/lib/kubelet/checkpoints/
NAMESPACE  POD       CONTAINER  ENGINE      TIME CHECKPOINTED    CHECKPOINT NAME
---------  ---       ---------  ------      -----------------    ---------------
default    netshoot  netshoot   containerd  13 Sep 25 18:09 CDT  checkpoint-netshoot_default-netshoot-2025-09-13T18:09:15-05:00.tar

```

Inspect a checkpoint image:

```sh
[root@x86-dev:~] checkpointctl inspect \
  --files \
  --metadata \
  --mounts \
  --ps-tree \
  --ps-tree-cmd \
  --ps-tree-env \
  --sockets \
  checkpoint-netshoot_default-netshoot-2025-09-13T18:09:15-05:00.tar
```
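
Since the checkpoint is an ordinary `.tar` archive, its layout can also be viewed with plain `tar` before reaching for `checkpointctl`. The helper below is a sketch; `archive_top_level` is a hypothetical name, and it makes no assumption about which files a checkpoint contains.

```sh
#!/bin/sh
# Hypothetical helper: print the distinct top-level entries of a tar archive.
# $1 = path to the archive.
archive_top_level() {
    tar -tf "$1" | sed 's|^\./||' | cut -d/ -f1 | sort -u
}
```

Running `archive_top_level <PATH-TO-CHECKPOINT-TAR>` gives a quick overview of what was captured without extracting anything.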

Show a memory dump:

```sh
# kubelet stores checkpoints under /var/lib/kubelet/checkpoints/
[root@x86-dev:~] checkpointctl memparse <PATH-TO-CHECKPOINT-TAR>

# show the full memory dump of a single process
[root@x86-dev:~] checkpointctl memparse --pid PID <PATH-TO-CHECKPOINT-TAR>
```

### Reference

- https://kubernetes.io/docs/reference/node/kubelet-checkpoint-api/
- https://criu.org/Containerd
- https://github.com/checkpoint-restore

content/posts/cilium-nat64.md

Lines changed: 1 addition & 1 deletion
@@ -55,7 +55,7 @@ docker run --name cilium-lb -itd \
 --privileged=true \
 --restart=always \
 --network=host \
-"quay.io/cilium/cilium:stable" cilium-agent --enable-ipv4=true --enable-ipv6=true --devices=eth0 --datapath-mode=lb-only --enable-k8s=false --bpf-lb-mode=snat --enable-nat46x64-gateway=true
+"quay.io/cilium/cilium:v.17.7" cilium-agent --enable-ipv4=true --enable-ipv6=true --devices=eth0 --datapath-mode=lb-only --enable-k8s=false --bpf-lb-mode=snat --enable-nat46x64-gateway=true
 ```
 
 To check the status of our standalone cilium install with NAT64 enabled