Description
TL;DR checkProcMount() won't let me mount /proc/sys/net as read-write.
I'm trying to run a libvirt KVM VM inside of a docker container without using --privileged. I've worked around a lot of other errors by:
- Adding /dev/kvm and /dev/net/tun devices
- Granting CAP_NET_ADMIN (safe: net-namespaced)
- Mounting /sys/fs/cgroup/* read-write (safe?)
- Mounting /sys/devices/virtual/net read-write (safe: net-namespaced)
But there's one error I can't work around:
libvirt.libvirtError: cannot write to /proc/sys/net/ipv6/conf/virbr2/disable_ipv6 to enable/disable IPv6 on bridge virbr2: Read-only file system
What I would like to do is allow /proc/sys/net to be mounted read-write inside of the container. My understanding is that this is safe because everything in that subdirectory is net-namespaced, so a container can't affect the host network namespace. (I would have to audit some kernel code to be sure, but it's certainly better than --privileged.)
The problem is that checkProcMount() won't let me:
$ docker version
Client: Docker Engine - Community
Version: 20.10.3
API version: 1.41
Go version: go1.13.15
Git commit: 48d30b5
Built: Fri Jan 29 14:33:25 2021
OS/Arch: linux/amd64
Context: default
Experimental: true
Server: Docker Engine - Community
Engine:
Version: 20.10.3
API version: 1.41 (minimum version 1.12)
Go version: go1.13.15
Git commit: 46229ca
Built: Fri Jan 29 14:31:38 2021
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.3
GitCommit: 269548fa27e0089a8b8278fc4fc781d7f65a939b
runc:
Version: 1.0.0-rc92
GitCommit: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
docker-init:
Version: 0.19.0
GitCommit: de40ad0
$ docker run --rm -it -v "/proc/sys/net:/proc/sys/net:rw" debian:10
docker: Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: rootfs_linux.go:59: mounting "/proc/sys/net" to rootfs at "/proc/sys/net" caused: "/var/lib/docker/overlay2/9fd477a20091dd5d9babf5ce2bddb8d517349c89ba9a4e6c7f74f275f0c370c9/merged/proc/sys/net" cannot be mounted because it is inside /proc: unknown.
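For context, the rejection above comes from a path check on the mount destination. The following is a rough Go sketch of that kind of check, a simplification under stated assumptions and not runc's actual checkProcMount() code. The function name checkProcMountSketch, the contents of the allowlist, and the example rootfs path are all hypothetical; only the wording of the error message is taken from the output above.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// allowedProcMounts is an illustrative allowlist (assumption, not runc's real list).
var allowedProcMounts = []string{"/proc/cpuinfo", "/proc/meminfo", "/proc/uptime"}

// checkProcMountSketch rejects a mount destination that lies inside
// <rootfs>/proc unless it matches an allowlisted entry.
func checkProcMountSketch(rootfs, dest string, allowed []string) error {
	procDir := filepath.Join(rootfs, "proc")
	cleanDest := filepath.Clean(dest)

	rel, err := filepath.Rel(procDir, cleanDest)
	if err != nil {
		return err
	}
	// A relative path that climbs out of procDir means dest is outside /proc.
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return nil
	}
	// Mounting /proc itself is a separate case; this sketch simply allows it.
	if rel == "." {
		return nil
	}
	// dest is under /proc: only allowlisted entries pass.
	for _, a := range allowed {
		if filepath.Join(rootfs, a) == cleanDest {
			return nil
		}
	}
	return fmt.Errorf("%q cannot be mounted because it is inside /proc", dest)
}

func main() {
	rootfs := "/var/lib/docker/overlay2/example/merged" // hypothetical rootfs path
	for _, target := range []string{"/proc/sys/net", "/sys/devices/virtual/net", "/proc/cpuinfo"} {
		err := checkProcMountSketch(rootfs, filepath.Join(rootfs, target), allowedProcMounts)
		fmt.Printf("%s -> %v\n", target, err)
	}
}
```

Under these assumptions, /proc/sys/net is refused purely because of where it sits in the tree, regardless of whether its contents are net-namespaced. That is why the /sys/devices/virtual/net bind mount works while this one does not.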
My only alternative to --privileged is granting CAP_SYS_ADMIN (for mount(2)) and remounting /proc/sys inside the container. This is a horrible alternative because:
- CAP_SYS_ADMIN is terribly overloaded
- /proc/sys has lots of kernel global options which aren't namespaced