checkProcMount() is too strict #2826

@JonathonReinhart

TL;DR checkProcMount() won't let me mount /proc/sys/net as read-write.

I'm trying to run a libvirt KVM VM inside a Docker container without using --privileged. I've worked around a lot of other errors by:

  • Adding /dev/kvm and /dev/net/tun devices
  • Granting CAP_NET_ADMIN (safe: net-namespaced)
  • Mounting /sys/fs/cgroup/* read-write (safe?)
  • Mounting /sys/devices/virtual/net read-write (safe: net-namespaced)

But there's one error I can't work around:

libvirt.libvirtError: cannot write to /proc/sys/net/ipv6/conf/virbr2/disable_ipv6 to enable/disable IPv6 on bridge virbr2: Read-only file system

What I would like to do is mount /proc/sys/net read-write inside the container. My understanding is that this is safe because everything in that subdirectory is net-namespaced, so a container can't affect the host network namespace. (I would have to audit some kernel code to be sure, but it's certainly better than --privileged.)

The problem is that checkProcMount() won't let me. Here are my versions and the failing command:

$ docker version
Client: Docker Engine - Community
 Version:           20.10.3
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        48d30b5
 Built:             Fri Jan 29 14:33:25 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.3
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       46229ca
  Built:            Fri Jan 29 14:31:38 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.3
  GitCommit:        269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc:
  Version:          1.0.0-rc92
  GitCommit:        ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0


$ docker run --rm -it -v "/proc/sys/net:/proc/sys/net:rw" debian:10
docker: Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: rootfs_linux.go:59: mounting "/proc/sys/net" to rootfs at "/proc/sys/net" caused: "/var/lib/docker/overlay2/9fd477a20091dd5d9babf5ce2bddb8d517349c89ba9a4e6c7f74f275f0c370c9/merged/proc/sys/net" cannot be mounted because it is inside /proc: unknown.

My only alternative to --privileged is granting CAP_SYS_ADMIN (for mount(2)) and remounting /proc/sys read-write inside the container. This is a horrible alternative because:

  • CAP_SYS_ADMIN is terribly overloaded
  • /proc/sys has lots of kernel global options which aren't namespaced
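
To illustrate the kind of relaxation being asked for, here is a minimal Go sketch of a destination check that still rejects mounts inside /proc but lets an explicit allowlist of subtrees through. It is not runc's actual implementation: `checkProcMountRelaxed`, `isWithin`, and `allowedProcSubtrees` are hypothetical names, the `/rootfs` path is a placeholder, and any special handling of a mount at /proc itself is left out. Whether /proc/sys/net is really safe to allow is still the open question above.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// allowedProcSubtrees is a hypothetical allowlist of directories under /proc
// that may be used as mount destinations. /proc/sys/net is the only entry
// because it is the subtree this issue asks about.
var allowedProcSubtrees = []string{"/proc/sys/net"}

// isWithin reports whether target is base itself or a path beneath base.
func isWithin(base, target string) (bool, error) {
	rel, err := filepath.Rel(base, target)
	if err != nil {
		return false, err
	}
	if rel == "." {
		return true, nil
	}
	outside := rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator))
	return !outside, nil
}

// checkProcMountRelaxed returns nil when dest (an absolute path that already
// includes the rootfs prefix) is an acceptable mount destination, and an
// error when dest is inside /proc but not inside an allowlisted subtree.
func checkProcMountRelaxed(rootfs, dest string) error {
	// Allowlisted subtrees are accepted before the general /proc check.
	for _, subtree := range allowedProcSubtrees {
		ok, err := isWithin(filepath.Join(rootfs, subtree), dest)
		if err != nil {
			return err
		}
		if ok {
			return nil
		}
	}
	// Anything outside /proc is not this check's concern.
	inProc, err := isWithin(filepath.Join(rootfs, "/proc"), dest)
	if err != nil {
		return err
	}
	if !inProc {
		return nil
	}
	return fmt.Errorf("%q cannot be mounted because it is inside /proc", dest)
}

func main() {
	rootfs := "/rootfs" // placeholder container root, for illustration only
	dests := []string{
		"/rootfs/proc/sys/net",
		"/rootfs/proc/sys/kernel",
		"/rootfs/etc/hosts",
	}
	for _, dest := range dests {
		fmt.Printf("%s -> %v\n", dest, checkProcMountRelaxed(rootfs, dest))
	}
}
```

The comparison is done on path components via filepath.Rel rather than a plain string prefix, so a sibling such as /proc/sys/network does not slip through by sharing the "/proc/sys/net" prefix.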
