[Bug]: nested docker breaks youki exec #3437

@stepancheg

Bug Description

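When Docker runs inside the container (nested Docker), it creates child cgroups and enables controllers for them, after which the kernel no longer allows processes to be added directly to the container's root cgroup. youki exec tries to do exactly that, so it fails.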

Steps to Reproduce

  • Run youki spec, then edit the generated config.json to make cgroups writable:

{
  "ociVersion": "1.0.2-dev",
  "root": {
    "path": "rootfs",
    "readonly": true
  },
  "mounts": [
    {
      "destination": "/proc",
      "type": "proc",
      "source": "proc"
    },
    {
      "destination": "/dev",
      "type": "tmpfs",
      "source": "tmpfs",
      "options": [
        "nosuid",
        "strictatime",
        "mode=755",
        "size=65536k"
      ]
    },
    {
      "destination": "/dev/pts",
      "type": "devpts",
      "source": "devpts",
      "options": [
        "nosuid",
        "noexec",
        "newinstance",
        "ptmxmode=0666",
        "mode=0620",
        "gid=5"
      ]
    },
    {
      "destination": "/dev/shm",
      "type": "tmpfs",
      "source": "shm",
      "options": [
        "nosuid",
        "noexec",
        "nodev",
        "mode=1777",
        "size=65536k"
      ]
    },
    {
      "destination": "/dev/mqueue",
      "type": "mqueue",
      "source": "mqueue",
      "options": [
        "nosuid",
        "noexec",
        "nodev"
      ]
    },
    {
      "destination": "/sys",
      "type": "sysfs",
      "source": "sysfs",
      "options": [
        "nosuid",
        "noexec",
        "nodev",
        "ro"
      ]
    },
    {
      "destination": "/sys/fs/cgroup",
      "type": "cgroup",
      "source": "cgroup",
      "options": [
        "nosuid",
        "noexec",
        "nodev",
        "relatime"
      ]
    }
  ],
  "process": {
    "terminal": false,
    "user": {
      "uid": 0,
      "gid": 0
    },
    "args": [
      "sh"
    ],
    "env": [
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
      "TERM=xterm"
    ],
    "cwd": "/",
    "capabilities": {
      "bounding": [
        "CAP_AUDIT_WRITE",
        "CAP_KILL",
        "CAP_NET_BIND_SERVICE"
      ],
      "effective": [
        "CAP_AUDIT_WRITE",
        "CAP_KILL",
        "CAP_NET_BIND_SERVICE"
      ],
      "inheritable": [
        "CAP_AUDIT_WRITE",
        "CAP_KILL",
        "CAP_NET_BIND_SERVICE"
      ],
      "permitted": [
        "CAP_AUDIT_WRITE",
        "CAP_KILL",
        "CAP_NET_BIND_SERVICE"
      ],
      "ambient": [
        "CAP_AUDIT_WRITE",
        "CAP_KILL",
        "CAP_NET_BIND_SERVICE"
      ]
    },
    "rlimits": [
      {
        "type": "RLIMIT_NOFILE",
        "hard": 1024,
        "soft": 1024
      }
    ],
    "noNewPrivileges": true
  },
  "hostname": "youki",
  "annotations": {},
  "linux": {
    "resources": {
      "devices": []
    },
    "namespaces": [
      {
        "type": "pid"
      },
      {
        "type": "network"
      },
      {
        "type": "ipc"
      },
      {
        "type": "uts"
      },
      {
        "type": "mount"
      },
      {
        "type": "cgroup"
      }
    ],
    "maskedPaths": [
      "/proc/acpi",
      "/proc/asound",
      "/proc/kcore",
      "/proc/keys",
      "/proc/latency_stats",
      "/proc/timer_list",
      "/proc/timer_stats",
      "/proc/sched_debug",
      "/sys/firmware",
      "/proc/scsi"
    ],
    "readonlyPaths": [
      "/proc/bus",
      "/proc/fs",
      "/proc/irq",
      "/proc/sys",
      "/proc/sysrq-trigger"
    ]
  }
}

  • youki run a2
  • then
youki exec a2 sh -c '
set -e
mkdir /sys/fs/cgroup/x
echo 1 > /sys/fs/cgroup/x/cgroup.procs
echo $$ > /sys/fs/cgroup/x/cgroup.procs
echo +pids > /sys/fs/cgroup/cgroup.subtree_control
'

This works fine. Then:

youki exec a2 date
ERROR libcontainer::process::container_intermediate_process: failed to add task to cgroup pid=Pid(3948391) err=V2(WrappedIo(Write { err: Os { code: 16, kind: ResourceBusy, message: "Resource busy" }, path: "/sys/fs/cgroup/:youki:a2/cgroup.procs", data: "3948391" })) init=false
ERROR libcontainer::process::container_main_process: failed to run intermediate process cgroup error: io error: failed to write 3948391 to /sys/fs/cgroup/:youki:a2/cgroup.procs: Resource busy (os error 16)
ERROR libcontainer::container::builder_impl: failed to run container process intermediate process error cgroup error: io error: failed to write 3948391 to /sys/fs/cgroup/:youki:a2/cgroup.procs: Resource busy (os error 16)
ERROR youki: error in executing command: failed to create container: intermediate process error cgroup error: io error: failed to write 3948391 to /sys/fs/cgroup/:youki:a2/cgroup.procs: Resource busy (os error 16)
exec failed : failed to create container: intermediate process error cgroup error: io error: failed to write 3948391 to /sys/fs/cgroup/:youki:a2/cgroup.procs: Resource busy (os error 16)

Same sequence works with runc.

As far as I understand, youki unconditionally writes the new process's PID to the container's root cgroup. It should instead find the cgroup of the container's init process and write to that one.
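
To make the suggestion concrete, here is a minimal sketch of how the init process's cgroup could be resolved. It is not youki's actual code: the function names are hypothetical, and it assumes a cgroup v2 (unified) hierarchy mounted at /sys/fs/cgroup, where /proc/<pid>/cgroup contains a line of the form "0::<path>".

```rust
use std::path::{Path, PathBuf};

/// Extracts the cgroup v2 (unified hierarchy) path from the contents of
/// /proc/<pid>/cgroup. The unified entry has the form "0::<path>".
/// (Hypothetical helper, not part of youki.)
fn unified_cgroup_path(proc_cgroup: &str) -> Option<&str> {
    proc_cgroup
        .lines()
        .find_map(|line| line.strip_prefix("0::"))
}

/// Builds the path of the cgroup.procs file for the cgroup described by
/// `proc_cgroup`, relative to the cgroup v2 mount point `mount`.
/// (Hypothetical helper; the mount point is an assumption of this sketch.)
fn init_cgroup_procs(mount: &Path, proc_cgroup: &str) -> Option<PathBuf> {
    let rel = unified_cgroup_path(proc_cgroup)?;
    // Path::join replaces the base when given an absolute path,
    // so strip the leading '/' before joining.
    Some(mount.join(rel.trim_start_matches('/')).join("cgroup.procs"))
}
```

In the reproduction above, the init process was moved into the child cgroup x, so the host's view of /proc/<init pid>/cgroup would presumably read "0::/:youki:a2/x", and the sketch would yield /sys/fs/cgroup/:youki:a2/x/cgroup.procs rather than the root cgroup's cgroup.procs that currently fails with EBUSY.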

Expectation

No response

System and Setup Info

youki version: 0.5.7
commit: 0.5.7-bd54457ba9de1629f21fe678687f433aec63e7c7

kernel: 6.14.0-1015-gcp #16-Ubuntu

Additional Context

This is how runc does it: https://github.com/opencontainers/runc/blob/a51b20a37bfb8095c22c1db7beb6bd4ff4b67706/libcontainer/process_linux.go#L302 (the code also explains why it is done this way).
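
One possible shape for a fix is to fall back to the init process's cgroup only when the write to the container's root cgroup fails with EBUSY. The sketch below illustrates that idea under stated assumptions: all names are hypothetical, it is not a port of the runc code linked above, and the writer is passed in as a closure so the logic can be exercised without a real cgroup filesystem.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Linux errno for EBUSY, matching "os error 16" in the log above.
const EBUSY: i32 = 16;

/// Tries to add `pid` via the `primary` cgroup.procs file. If that fails
/// with EBUSY, retries with the `fallback` cgroup.procs file (for example
/// the init process's cgroup). Any other result is returned unchanged.
/// (Hypothetical helper, not part of youki.)
fn join_cgroup_with_fallback<W>(
    primary: &Path,
    fallback: &Path,
    pid: u32,
    mut write_pid: W,
) -> io::Result<()>
where
    W: FnMut(&Path, u32) -> io::Result<()>,
{
    match write_pid(primary, pid) {
        Err(e) if e.raw_os_error() == Some(EBUSY) => write_pid(fallback, pid),
        other => other,
    }
}

/// A writer that could be passed as `write_pid`: writes the PID as
/// decimal text to an existing cgroup.procs file.
fn write_cgroup_procs(procs_file: &Path, pid: u32) -> io::Result<()> {
    let mut file = OpenOptions::new().write(true).open(procs_file)?;
    file.write_all(pid.to_string().as_bytes())
}
```

Whether youki should try the root cgroup first or always join the init process's cgroup is a design decision for the maintainers; the sketch only shows that the fallback can be kept small and isolated.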
