
Commit 1c41f9b

Merge pull request moby#5765 from AkihiroSuda/rootless
rootless: update docs and examples
2 parents f7999fe + 3a91b50 commit 1c41f9b

11 files changed: +317 -74 lines changed

docs/rootless.md

+72-39
Original file line number | Diff line number | Diff line change
@@ -12,34 +12,48 @@ Rootless mode allows running BuildKit daemon as a non-root user.
1212

1313
[RootlessKit](https://github.com/rootless-containers/rootlesskit/) needs to be installed.
1414

15-
```console
16-
$ rootlesskit buildkitd
15+
```bash
16+
rootlesskit buildkitd
1717
```
1818

19-
```console
20-
$ buildctl --addr unix:///run/user/$UID/buildkit/buildkitd.sock build ...
19+
```bash
20+
buildctl --addr unix:///run/user/$UID/buildkit/buildkitd.sock build ...
2121
```
2222

23-
To isolate BuildKit daemon's network namespace from the host (recommended):
24-
```console
25-
$ rootlesskit --net=slirp4netns --copy-up=/etc --disable-host-loopback buildkitd
26-
```
23+
> [!TIP]
24+
> To isolate BuildKit daemon's network namespace from the host (recommended):
25+
> ```bash
26+
> rootlesskit --net=slirp4netns --copy-up=/etc --disable-host-loopback buildkitd
27+
> ```
2728
2829
## Running BuildKit in Rootless mode (containerd worker)
2930
3031
[RootlessKit](https://github.com/rootless-containers/rootlesskit/) needs to be installed.
3132
3233
Run containerd in rootless mode using rootlesskit following [containerd's document](https://github.com/containerd/containerd/blob/main/docs/rootless.md).
3334
35+
```bash
36+
containerd-rootless.sh
37+
38+
CONTAINERD_NAMESPACE=default containerd-rootless-setuptool.sh install-buildkit-containerd
3439
```
35-
$ containerd-rootless.sh
36-
```
3740
38-
Then let buildkitd join the same namespace as containerd.
41+
<details>
42+
<summary>Advanced guide</summary>
43+
44+
<p>
45+
3946
47+
Alternatively, you can specify the full command line flags as follows:
48+
```bash
49+
containerd-rootless.sh --config /path/to/config.toml
50+
51+
containerd-rootless-setuptool.sh nsenter -- buildkitd --oci-worker=false --containerd-worker=true
4052
```
41-
$ containerd-rootless-setuptool.sh nsenter -- buildkitd --oci-worker=false --containerd-worker=true --containerd-worker-snapshotter=native
42-
```
53+
54+
</p>
55+
56+
</details>
4357
4458
## Containerized deployment
4559
@@ -48,36 +62,45 @@ See [`../examples/kubernetes`](../examples/kubernetes).
4862
4963
### Docker
5064
51-
```console
52-
$ docker run \
65+
```bash
66+
docker run \
5367
--name buildkitd \
5468
-d \
5569
--security-opt seccomp=unconfined \
5670
--security-opt apparmor=unconfined \
57-
--device /dev/fuse \
58-
moby/buildkit:rootless --oci-worker-no-process-sandbox
59-
$ buildctl --addr docker-container://buildkitd build ...
60-
```
71+
--security-opt systempaths=unconfined \
72+
moby/buildkit:rootless
6173
62-
If you don't mind using `--privileged` (almost safe for rootless), the `docker run` flags can be shorten as follows:
63-
64-
```console
65-
$ docker run --name buildkitd -d --privileged moby/buildkit:rootless
74+
buildctl --addr docker-container://buildkitd build ...
6675
```
6776
68-
#### About `--device /dev/fuse`
69-
Adding `--device /dev/fuse` to the `docker run` arguments is required only if you want to use `fuse-overlayfs` snapshotter.
77+
> [!TIP]
78+
> If you don't mind using `--privileged` (almost safe for rootless), the `docker run` flags can be shortened as follows:
79+
>
80+
> ```bash
81+
> docker run --name buildkitd -d --privileged moby/buildkit:rootless
82+
> ```
7083
71-
#### About `--oci-worker-no-process-sandbox`
84+
Justification of the `--security-opt` flags:
7285
73-
By adding `--oci-worker-no-process-sandbox` to the `buildkitd` arguments, BuildKit can be executed in a container without adding `--privileged` to `docker run` arguments.
74-
However, you still need to pass `--security-opt seccomp=unconfined --security-opt apparmor=unconfined` to `docker run`.
86+
* `seccomp=unconfined`: For allowing several syscalls such as `unshare` (used by runc) and `mount` (used by snapshotters, etc).
7587
76-
Note that `--oci-worker-no-process-sandbox` allows build executor containers to `kill` (and potentially `ptrace` depending on the seccomp configuration) an arbitrary process in the BuildKit daemon container.
88+
* `apparmor=unconfined`: For allowing mounting filesystems, etc.
89+
This flag is not needed when the host operating system does not use AppArmor.
7790
78-
To allow running rootless `buildkitd` without `--oci-worker-no-process-sandbox`, run `docker run` with `--security-opt systempaths=unconfined`. (For Kubernetes, set `securityContext.procMount` to `Unmasked`.)
91+
* `systempaths=unconfined`: For disabling the masks for the `/proc` mount in the container, so that each of `ExecOp`
92+
(corresponds to a `RUN` instruction in Dockerfile) can have a dedicated `/proc` filesystem.
93+
`systempaths=unconfined` potentially allows reading and writing dangerous kernel files from a container, but it is safe when you are running `buildkitd` as non-root.
7994
80-
The `--security-opt systempaths=unconfined` flag disables the masks for the `/proc` mount in the container and potentially allows reading and writing dangerous kernel files, but it is safe when you are running `buildkitd` as non-root.
95+
> [!TIP]
96+
> Instead of `--security-opt systempaths=unconfined`, `buildkitd` can also be executed with `--oci-worker-no-process-sandbox` (flag of `buildkitd`, not `docker`)
97+
> to avoid creating a new PID namespace and mounting a new `/proc` for it.
98+
>
99+
> Using `--oci-worker-no-process-sandbox` is discouraged, as it cannot terminate processes that did not exit during an `ExecOp`.
100+
> Also, `--oci-worker-no-process-sandbox` allows `ExecOp` containers to `kill` (and potentially `ptrace` depending on the seccomp configuration) an arbitrary process in the BuildKit daemon container.
101+
>
102+
> Despite these caveats, the [Kubernetes examples](../examples/kubernetes) use `--oci-worker-no-process-sandbox`, as Kubernetes lacks the equivalent of `systempaths=unconfined`.
103+
> (`securityContext.procMount=Unmasked` is similar, but different in the sense that it depends on `hostUsers: false`)
81104
82105
### Change UID/GID
83106
@@ -90,7 +113,7 @@ Actual ID (shown in the host and the BuildKit daemon container)| Mapped ID (show
90113
... | ...
91114
165535 | 65536
92115
93-
```
116+
```console
94117
$ docker exec buildkitd id
95118
uid=1000(user) gid=1000(user)
96119
$ docker exec buildkitd ps aux
@@ -99,15 +122,16 @@ PID USER TIME COMMAND
99122
13 user 0:00 /proc/self/exe buildkitd --addr tcp://0.0.0.0:1234
100123
21 user 0:00 buildkitd --addr tcp://0.0.0.0:1234
101124
29 user 0:00 ps aux
125+
102126
$ docker exec buildkitd cat /etc/subuid
103127
user:100000:65536
104128
```
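
The relationship between the two ID columns can be sketched as simple arithmetic. This is only an illustration, assuming the `/etc/subuid` entry `user:100000:65536` shown above and assuming that mapped ID 0 corresponds to the daemon user (UID 1000); the function name and its optional parameters are hypothetical.

```bash
# Sketch: translate an ID seen inside a build container ("mapped ID")
# into the ID seen on the host and in the BuildKit daemon container ("actual ID").
# Assumed defaults: daemon user UID 1000, subuid range starting at 100000 with 65536 IDs.
mapped_to_actual() {
  mapped=$1
  user_id=${2:-1000}
  sub_start=${3:-100000}
  sub_count=${4:-65536}
  if [ "$mapped" -eq 0 ]; then
    # Mapped ID 0 is the unprivileged daemon user itself.
    echo "$user_id"
  elif [ "$mapped" -ge 1 ] && [ "$mapped" -le "$sub_count" ]; then
    # Mapped IDs 1..sub_count come from the subordinate ID range.
    echo $(( sub_start + mapped - 1 ))
  else
    echo "unmapped" >&2
    return 1
  fi
}

mapped_to_actual 0       # prints 1000
mapped_to_actual 65536   # prints 165535
```

The last call reproduces the final row of the table (`165535 | 65536`). Changing the UID/GID in the image would change the values passed as the optional arguments.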
105129
106130
To change the UID/GID configuration, you need to modify and build the BuildKit image manually.
107-
```
108-
$ vi Dockerfile
109-
$ make images
110-
$ docker run ... moby/buildkit:local-rootless ...
131+
```bash
132+
vi Dockerfile
133+
make images
134+
docker run ... moby/buildkit:local-rootless ...
111135
```
112136
113137
## Troubleshooting
@@ -120,7 +144,9 @@ $ rootlesskit buildkitd --oci-worker-snapshotter=fuse-overlayfs
120144
```
121145
122146
### Error related to `fuse-overlayfs`
123-
Try running `buildkitd` with `--oci-worker-snapshotter=native`:
147+
Run `docker run` with `--device /dev/fuse`.
148+
149+
Also try running `buildkitd` with `--oci-worker-snapshotter=native`:
124150
125151
```console
126152
$ rootlesskit buildkitd --oci-worker-snapshotter=native
@@ -137,12 +163,19 @@ Run `sysctl -w user.max_user_namespaces=N` (N=positive integer, like 63359) on t
137163
138164
See [`../examples/kubernetes/sysctl-userns.privileged.yaml`](../examples/kubernetes/sysctl-userns.privileged.yaml).
139165
166+
### Error `fork/exec /proc/self/exe: permission denied` with `This error might have happened because /proc/sys/kernel/apparmor_restrict_unprivileged_userns is set to 1`
167+
Add `kernel.apparmor_restrict_unprivileged_userns=0` to `/etc/sysctl.conf` (or `/etc/sysctl.d`) and run `sudo sysctl -p`.
168+
140169
### Error `mount proc:/proc (via /proc/self/fd/6), flags: 0xe: operation not permitted`
141-
This error is known to happen when BuildKit is executed in a container without the `--oci-worker-no-sandbox` flag.
142-
Make sure that `--oci-worker-no-process-sandbox` is specified (See [below](#docker)).
170+
This error is known to happen when BuildKit is executed in a container without the `--security-opt systempaths=unconfined` flag.
171+
Make sure to specify it (See [above](#docker)).
143172
144173
## Distribution-specific hint
145174
Using the Ubuntu kernel is recommended.
175+
176+
### Ubuntu, 24.04 or later
177+
Add `kernel.apparmor_restrict_unprivileged_userns=0` to `/etc/sysctl.conf` (or `/etc/sysctl.d`) and run `sudo sysctl -p`.
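
Before editing the sysctl configuration, it may help to check whether the restriction is active at all. The following is a minimal sketch; the function name and its optional path argument are hypothetical and exist only to make the check easy to try against a test file.

```bash
# Sketch: report whether AppArmor restricts unprivileged user namespaces.
# The optional argument overrides the file to read (assumption, for testing only).
check_userns_restriction() {
  f=${1:-/proc/sys/kernel/apparmor_restrict_unprivileged_userns}
  if [ ! -r "$f" ]; then
    # Kernels without this AppArmor knob do not expose the file.
    echo "not present"
  elif [ "$(cat "$f")" = "1" ]; then
    echo "restricted"
  else
    echo "unrestricted"
  fi
}

check_userns_restriction
```

If the setting is placed in a file under `/etc/sysctl.d` rather than in `/etc/sysctl.conf`, `sudo sysctl --system` is the invocation that reloads that directory; plain `sysctl -p` reads `/etc/sysctl.conf` by default.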
178+
146179
### Container-Optimized OS from Google
147180
Make sure to have an `emptyDir` volume below:
148181
```yaml

examples/kubernetes/README.md

+34-22
@@ -6,16 +6,26 @@ This directory contains Kubernetes manifests for `Pod`, `Deployment` (with `Serv
66
* `StatefulSet`: good for client-side load balancing, without registry-side cache
77
* `Job`: good if you don't want to have daemon pods
88

9-
Using Rootless mode (`*.rootless.yaml`) is recommended because Rootless mode image is executed as non-root user (UID 1000) and doesn't need `securityContext.privileged`.
10-
See [`../../docs/rootless.md`](../../docs/rootless.md).
9+
## Variants
1110

12-
See also ["Building Images Efficiently And Securely On Kubernetes With BuildKit" (KubeCon EU 2019)](https://kccnceu19.sched.com/event/MPX5).
11+
- `*.privileged.yaml`: Launches the Pod as the fully privileged root user.
12+
- `*.rootless.yaml`: Launches the Pod as a non-root user, whose UID is 1000.
13+
- `*.userns.yaml`: Launches the Pod as a non-root user. The UID is determined by kubelet.
14+
Needs kubelet and kube-apiserver to be reconfigured to enable the
15+
[`UserNamespacesSupport`](https://kubernetes.io/docs/tasks/configure-pod-container/user-namespaces/) feature gate.
16+
17+
It is recommended to use `*.rootless.yaml` to minimize the chance of container breakout attacks.
18+
19+
See also:
20+
- [`../../docs/rootless.md`](../../docs/rootless.md).
21+
- ["Building Images Efficiently And Securely On Kubernetes With BuildKit" (KubeCon EU 2019)](https://kccnceu19.sched.com/event/MPX5).
1322

1423
## `Pod`
1524

16-
```console
17-
$ kubectl apply -f pod.rootless.yaml
18-
$ buildctl \
25+
```bash
26+
kubectl apply -f pod.rootless.yaml
27+
28+
buildctl \
1929
--addr kube-pod://buildkitd \
2030
build --frontend dockerfile.v0 --local context=/path/to/dir --local dockerfile=/path/to/dir
2131
```
@@ -29,25 +39,27 @@ If rootless mode doesn't work, try `pod.privileged.yaml`.
2939
Setting up mTLS is highly recommended.
3040

3141
`./create-certs.sh SAN [SAN...]` can be used for creating certificates.
32-
```console
33-
$ ./create-certs.sh 127.0.0.1
42+
```bash
43+
./create-certs.sh 127.0.0.1
3444
```
3545

3646
The daemon certificates are created as a `Secret` manifest named `buildkit-daemon-certs`.
37-
```console
38-
$ kubectl apply -f .certs/buildkit-daemon-certs.yaml
47+
```bash
48+
kubectl apply -f .certs/buildkit-daemon-certs.yaml
3949
```
4050

4151
Apply the `Deployment` and `Service` manifest:
42-
```console
43-
$ kubectl apply -f deployment+service.rootless.yaml
44-
$ kubectl scale --replicas=10 deployment/buildkitd
52+
```bash
53+
kubectl apply -f deployment+service.rootless.yaml
54+
55+
kubectl scale --replicas=10 deployment/buildkitd
4556
```
4657

4758
Run `buildctl` with TLS client certificates:
48-
```console
49-
$ kubectl port-forward service/buildkitd 1234
50-
$ buildctl \
59+
```bash
60+
kubectl port-forward service/buildkitd 1234
61+
62+
buildctl \
5163
--addr tcp://127.0.0.1:1234 \
5264
--tlscacert .certs/client/ca.pem \
5365
--tlscert .certs/client/cert.pem \
@@ -58,10 +70,10 @@ $ buildctl \
5870
## `StatefulSet`
5971
`StatefulSet` is useful for consistent hash mode.
6072

61-
```console
62-
$ kubectl apply -f statefulset.rootless.yaml
63-
$ kubectl scale --replicas=10 statefulset/buildkitd
64-
$ buildctl \
73+
```bash
74+
kubectl apply -f statefulset.rootless.yaml
75+
kubectl scale --replicas=10 statefulset/buildkitd
76+
buildctl \
6577
--addr kube-pod://buildkitd-4 \
6678
build --frontend dockerfile.v0 --local context=/path/to/dir --local dockerfile=/path/to/dir
6779
```
@@ -70,8 +82,8 @@ See [`./consistenthash`](./consistenthash) for how to use consistent hashing.
7082

7183
## `Job`
7284

73-
```console
74-
$ kubectl apply -f job.rootless.yaml
85+
```bash
86+
kubectl apply -f job.rootless.yaml
7587
```
7688

7789
To push the image to the registry, you also need to mount `~/.docker/config.json`

examples/kubernetes/deployment+service.rootless.yaml

+3-2
@@ -13,8 +13,6 @@ spec:
1313
metadata:
1414
labels:
1515
app: buildkitd
16-
annotations:
17-
container.apparmor.security.beta.kubernetes.io/buildkitd: unconfined
1816
# see buildkit/docs/rootless.md for caveats of rootless mode
1917
spec:
2018
containers:
@@ -54,6 +52,9 @@ spec:
5452
# Needs Kubernetes >= 1.19
5553
seccompProfile:
5654
type: Unconfined
55+
# Needs Kubernetes >= 1.30
56+
appArmorProfile:
57+
type: Unconfined
5758
# To change UID/GID, you need to rebuild the image
5859
runAsUser: 1000
5960
runAsGroup: 1000
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,77 @@
1+
# Depends on feature gate UserNamespacesSupport
2+
apiVersion: apps/v1
3+
kind: Deployment
4+
metadata:
5+
labels:
6+
app: buildkitd
7+
name: buildkitd
8+
spec:
9+
replicas: 1
10+
selector:
11+
matchLabels:
12+
app: buildkitd
13+
template:
14+
metadata:
15+
labels:
16+
app: buildkitd
17+
spec:
18+
hostUsers: false
19+
containers:
20+
- name: buildkitd
21+
image: moby/buildkit:master
22+
args:
23+
- --addr
24+
- unix:///run/buildkit/buildkitd.sock
25+
- --addr
26+
- tcp://0.0.0.0:1234
27+
- --tlscacert
28+
- /certs/ca.pem
29+
- --tlscert
30+
- /certs/cert.pem
31+
- --tlskey
32+
- /certs/key.pem
33+
# the probe below will only work after Release v0.6.3
34+
readinessProbe:
35+
exec:
36+
command:
37+
- buildctl
38+
- debug
39+
- workers
40+
initialDelaySeconds: 5
41+
periodSeconds: 30
42+
# the probe below will only work after Release v0.6.3
43+
livenessProbe:
44+
exec:
45+
command:
46+
- buildctl
47+
- debug
48+
- workers
49+
initialDelaySeconds: 5
50+
periodSeconds: 30
51+
securityContext:
52+
# Not really privileged
53+
privileged: true
54+
ports:
55+
- containerPort: 1234
56+
volumeMounts:
57+
- name: certs
58+
readOnly: true
59+
mountPath: /certs
60+
volumes:
61+
# buildkit-daemon-certs must contain ca.pem, cert.pem, and key.pem
62+
- name: certs
63+
secret:
64+
secretName: buildkit-daemon-certs
65+
---
66+
apiVersion: v1
67+
kind: Service
68+
metadata:
69+
labels:
70+
app: buildkitd
71+
name: buildkitd
72+
spec:
73+
ports:
74+
- port: 1234
75+
protocol: TCP
76+
selector:
77+
app: buildkitd
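
The readiness and liveness probes in this manifest run `buildctl debug workers` inside the container. A client script can poll the same command before starting a build. The helper below is a sketch, not part of the commit; its name, the attempt count, and the delay are assumptions.

```bash
# Sketch: retry a readiness command until it succeeds or attempts run out.
# Usage: wait_until_ready ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until_ready() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # The probe command is passed through unchanged; its output is discarded.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example mirroring the probe (needs a running cluster, so it is left commented out):
# wait_until_ready 10 3 kubectl exec deployment/buildkitd -- buildctl debug workers
```

Passing the probe as arguments keeps the helper independent of how the daemon is reached, so the same loop works with `--addr` and TLS flags when connecting over TCP.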

examples/kubernetes/job.privileged.yaml

+2-2
@@ -8,11 +8,11 @@ spec:
88
restartPolicy: Never
99
initContainers:
1010
- name: prepare
11-
image: alpine:3.10
11+
image: busybox
1212
command:
1313
- sh
1414
- -c
15-
- "echo FROM hello-world > /workspace/Dockerfile"
15+
- "echo -e 'FROM alpine\nRUN apk add gcc\n' > /workspace/Dockerfile"
1616
volumeMounts:
1717
- name: workspace
1818
mountPath: /workspace
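
The new `prepare` command relies on `echo -e`, which POSIX does not specify and which some shells (for example `dash`) do not interpret; the `busybox` shell used here does accept it. A `printf`-based variant is sketched below as a portable alternative. The `workspace` variable is a hypothetical stand-in for the `/workspace` mount.

```bash
# Sketch: write the two-line Dockerfile with printf instead of `echo -e`.
# WORKSPACE is an assumed override; a temporary directory is used when it is unset.
workspace=${WORKSPACE:-$(mktemp -d)}
printf 'FROM alpine\nRUN apk add gcc\n' > "$workspace/Dockerfile"
cat "$workspace/Dockerfile"
```

Unlike the `echo -e` form, this writes exactly two lines with no trailing blank line, which makes no difference to the build.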
