Replace syscall.Unlink with os.Remove so that the directory(e.g. /run… #72

Merged
merged 1 commit into from
Aug 27, 2020

Conversation

keloyang
Contributor

Docker fails to start if its unix socket path exists and is a directory.

We can reproduce it like this:

  1. make sure live-restore is false
  2. start 20 containers with -v /run/docker.sock:/run/docker.sock and --restart always
#!/bin/bash

i=0
while [ ${i} -lt 20 ];
do
        docker run --name test-${i} -tid --restart always -v  /run/docker.sock:/run/docker.sock  busybox sleep 16881688
        ((i++))
done
  3. kill the container processes and restart dockerd
[root@centos1 workspace]# killall -9 sleep;usleep 100;systemctl restart docker
Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.
[root@centos1 workspace]# ls /run/docker
docker/          dockershim.sock  docker.sock/   

We can see that the directory /run/docker.sock/ has been created, and it causes docker to fail to start.

Using os.Remove fixes this issue.

…/docker.sock/) can be deleted

Signed-off-by: Shukui Yang <[email protected]>
@keloyang
Contributor Author

ping @calavera @offby1 @jamtur01 @vdemeester

Member

@thaJeztah thaJeztah left a comment

LGTM, thanks!

@@ -79,7 +79,7 @@ func WithChmod(mask os.FileMode) SockOption {

// NewUnixSocketWithOpts creates a unix socket with the specified options
func NewUnixSocketWithOpts(path string, opts ...SockOption) (net.Listener, error) {
-	if err := syscall.Unlink(path); err != nil && !os.IsNotExist(err) {
+	if err := os.Remove(path); err != nil && !os.IsNotExist(err) {
Member

Thanks. So this patch would not prevent getting into this situation (which is due to a race condition between containers starting and the API being up), but it would help recover from it (by being able to remove the faulty directory instead of producing an error). Obviously, if there are containers still running that mount /var/run/docker.sock, that would likely still be problematic 🤔

Change itself seems fine to me; if path is a directory it would attempt to remove it, which should be fairly safe, as it would fail if the directory is not empty, so there should be no real risk of accidentally removing files that we don't want to.

Contributor Author

@thaJeztah @tiborvass thanks.

As for the creation of the directory /var/run/docker.sock: the root cause is that dockerd can still handle API requests while it is shutting down. At that point dockerd creates a container that shares the host path /var/run/docker.sock, but the path does not exist, so dockerd creates it. I have a PR about this, moby/moby#38241; PTAL again, thanks very much.

@tiborvass tiborvass merged commit fcb221c into docker:master Aug 27, 2020
@cpuguy83
Contributor

Unlink does remove the file if it's not being used
I don't think this is correct and should be reverted.

@thaJeztah
Member

@cpuguy83 you mean that the os.Remove would remove it in the non-directory situation?

@cpuguy83
Contributor

The short term solution is to use --mount instead of "-v", because "--mount" will not attempt to create the directory.

@cpuguy83
Contributor

Yes.

@thaJeztah
Member

True; that would mostly prevent the situation. Once you're in there, things are more tricky (and something like this PR would help);

Should we check if it's a directory and in that case do an os.Remove() or syscall.Rmdir()?

@cpuguy83
Contributor

I think it is difficult to properly remediate it from here.
The Docker API should fail to start in this case.

The correct fix would likely be in Dockerd.
Couple of options I can think of:

  1. Have some restricted paths (like a list of unix sockets) for which the volume manager errors out instead of creating the path as it otherwise would
  2. Block container startup on daemon init until after the API is ready
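
Option 1 could be sketched roughly as follows. Everything here is an assumption made for illustration: the restrictedPaths list, the checkAutoCreate name, and the idea that the check runs just before a missing bind-mount source would be created. The comparison is on cleaned path strings only and does not resolve symlinks.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// restrictedPaths is a hypothetical list of host paths that should never be
// auto-created as a bind-mount source.
var restrictedPaths = []string{"/run/docker.sock", "/var/run/docker.sock"}

// checkAutoCreate returns an error if hostPath is on the restricted list.
// It compares cleaned strings only; symlinks are not resolved.
func checkAutoCreate(hostPath string) error {
	clean := filepath.Clean(hostPath)
	for _, p := range restrictedPaths {
		if clean == p {
			return fmt.Errorf("refusing to create restricted path %q as a bind-mount source", clean)
		}
	}
	return nil
}

func main() {
	for _, p := range []string{"/run/docker.sock", "/srv/data"} {
		fmt.Printf("%s: %v\n", p, checkAutoCreate(p))
	}
}
```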

1 seems fairly problematic
2 may have some other consequences... may be worth adding in there to block API requests until after the daemon is ready (so block daemon until API is at a point where it can serve, but don't allow it to serve until the daemon is ready).

@cpuguy83
Contributor

Or just the tried and true "don't do that" approach.

@thaJeztah
Member

Opened a PR to revert the change #73, and an alternative (if we want to consider that; #74)

@keloyang
Contributor Author

@cpuguy83 Thanks for your review.

Unlink does remove the file if it's not being used

Does this mean that Unlink doesn't remove the file if it's being used?

I used -v to mount /run/docker.sock into a container, then wrote some code that uses Unlink to try to delete /run/docker.sock. In the end, /run/docker.sock was deleted successfully.

I want to know if I misunderstood something, PTAL, thanks very much.

The following is the code:

package main

import "syscall"

func main() {
	syscall.Unlink("/run/docker.sock")
}

And the details

[root@VM_0_16_centos keloyang]# docker version
Client: Docker Engine - Community
 Version:           19.03.5
 API version:       1.40
 Go version:        go1.12.12
 Git commit:        633a0ea
 Built:             Wed Nov 13 07:25:41 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.5
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.12
  Git commit:       633a0ea
  Built:            Wed Nov 13 07:24:18 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.10
  GitCommit:        b34a5c8af56e510852c35414db4c1f4fa6172339
 runc:
  Version:          1.0.0-rc8+dev
  GitCommit:        3e425f80a8c931f88e6d94a8c831b9d5aa481657
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683
[root@VM_0_16_centos keloyang]# ps -eaf|grep dockerd
root     27615     1  0 09:46 ?        00:00:00 /usr/bin/dockerd --containerd=/run/containerd/containerd.sock
root     27853 22441  0 09:46 pts/0    00:00:00 grep --color=auto dockerd
[root@VM_0_16_centos keloyang]# docker run --name sock -tid -v /run/docker.sock:/run/docker.sock busybox ash
2ac196c14d2af47178595c6d0eef44fdddce605bb6bf531a5b6c5eaaa247c7a9
[root@VM_0_16_centos keloyang]# lsof|grep docker.sock
systemd       1                 root   26u     unix 0xffff978dfc49e400       0t0   10958867 /var/run/docker.sock
dockerd   27615                 root    3u     unix 0xffff978dfc49e400       0t0   10958867 /var/run/docker.sock
dockerd   27615                 root    6u     unix 0xffff978dfcc1dc00       0t0   10959050 /var/run/docker.sock
dockerd   27615 27616           root    3u     unix 0xffff978dfc49e400       0t0   10958867 /var/run/docker.sock
dockerd   27615 27616           root    6u     unix 0xffff978dfcc1dc00       0t0   10959050 /var/run/docker.sock
dockerd   27615 27617           root    3u     unix 0xffff978dfc49e400       0t0   10958867 /var/run/docker.sock
dockerd   27615 27617           root    6u     unix 0xffff978dfcc1dc00       0t0   10959050 /var/run/docker.sock
dockerd   27615 27618           root    3u     unix 0xffff978dfc49e400       0t0   10958867 /var/run/docker.sock
dockerd   27615 27618           root    6u     unix 0xffff978dfcc1dc00       0t0   10959050 /var/run/docker.sock
dockerd   27615 27619           root    3u     unix 0xffff978dfc49e400       0t0   10958867 /var/run/docker.sock
dockerd   27615 27619           root    6u     unix 0xffff978dfcc1dc00       0t0   10959050 /var/run/docker.sock
dockerd   27615 27620           root    3u     unix 0xffff978dfc49e400       0t0   10958867 /var/run/docker.sock
dockerd   27615 27620           root    6u     unix 0xffff978dfcc1dc00       0t0   10959050 /var/run/docker.sock
dockerd   27615 27622           root    3u     unix 0xffff978dfc49e400       0t0   10958867 /var/run/docker.sock
dockerd   27615 27622           root    6u     unix 0xffff978dfcc1dc00       0t0   10959050 /var/run/docker.sock
dockerd   27615 27623           root    3u     unix 0xffff978dfc49e400       0t0   10958867 /var/run/docker.sock
dockerd   27615 27623           root    6u     unix 0xffff978dfcc1dc00       0t0   10959050 /var/run/docker.sock
dockerd   27615 27624           root    3u     unix 0xffff978dfc49e400       0t0   10958867 /var/run/docker.sock
dockerd   27615 27624           root    6u     unix 0xffff978dfcc1dc00       0t0   10959050 /var/run/docker.sock
dockerd   27615 27625           root    3u     unix 0xffff978dfc49e400       0t0   10958867 /var/run/docker.sock
dockerd   27615 27625           root    6u     unix 0xffff978dfcc1dc00       0t0   10959050 /var/run/docker.sock
dockerd   27615 27626           root    3u     unix 0xffff978dfc49e400       0t0   10958867 /var/run/docker.sock
dockerd   27615 27626           root    6u     unix 0xffff978dfcc1dc00       0t0   10959050 /var/run/docker.sock
[root@VM_0_16_centos keloyang]# cat test.go 
package main

import "syscall"

func main() {
	syscall.Unlink("/run/docker.sock")
}
[root@VM_0_16_centos keloyang]# go build test.go
[root@VM_0_16_centos keloyang]# ls /run/docker.*
/run/docker.pid  /run/docker.sock
[root@VM_0_16_centos keloyang]# ./test 
[root@VM_0_16_centos keloyang]# ls /run/docker.*
/run/docker.pid
[root@VM_0_16_centos keloyang]# lsof |grep docker.sock
systemd       1                 root   26u     unix 0xffff978dfc49e400       0t0   10958867 /var/run/docker.sock
dockerd   27615                 root    3u     unix 0xffff978dfc49e400       0t0   10958867 /var/run/docker.sock
dockerd   27615                 root    6u     unix 0xffff978dfcc1dc00       0t0   10959050 /var/run/docker.sock
dockerd   27615 27616           root    3u     unix 0xffff978dfc49e400       0t0   10958867 /var/run/docker.sock
dockerd   27615 27616           root    6u     unix 0xffff978dfcc1dc00       0t0   10959050 /var/run/docker.sock
dockerd   27615 27617           root    3u     unix 0xffff978dfc49e400       0t0   10958867 /var/run/docker.sock
dockerd   27615 27617           root    6u     unix 0xffff978dfcc1dc00       0t0   10959050 /var/run/docker.sock
dockerd   27615 27618           root    3u     unix 0xffff978dfc49e400       0t0   10958867 /var/run/docker.sock
dockerd   27615 27618           root    6u     unix 0xffff978dfcc1dc00       0t0   10959050 /var/run/docker.sock
dockerd   27615 27619           root    3u     unix 0xffff978dfc49e400       0t0   10958867 /var/run/docker.sock
dockerd   27615 27619           root    6u     unix 0xffff978dfcc1dc00       0t0   10959050 /var/run/docker.sock
dockerd   27615 27620           root    3u     unix 0xffff978dfc49e400       0t0   10958867 /var/run/docker.sock
dockerd   27615 27620           root    6u     unix 0xffff978dfcc1dc00       0t0   10959050 /var/run/docker.sock
dockerd   27615 27622           root    3u     unix 0xffff978dfc49e400       0t0   10958867 /var/run/docker.sock
dockerd   27615 27622           root    6u     unix 0xffff978dfcc1dc00       0t0   10959050 /var/run/docker.sock
dockerd   27615 27623           root    3u     unix 0xffff978dfc49e400       0t0   10958867 /var/run/docker.sock
dockerd   27615 27623           root    6u     unix 0xffff978dfcc1dc00       0t0   10959050 /var/run/docker.sock
dockerd   27615 27624           root    3u     unix 0xffff978dfc49e400       0t0   10958867 /var/run/docker.sock
dockerd   27615 27624           root    6u     unix 0xffff978dfcc1dc00       0t0   10959050 /var/run/docker.sock
dockerd   27615 27625           root    3u     unix 0xffff978dfc49e400       0t0   10958867 /var/run/docker.sock
dockerd   27615 27625           root    6u     unix 0xffff978dfcc1dc00       0t0   10959050 /var/run/docker.sock
dockerd   27615 27626           root    3u     unix 0xffff978dfc49e400       0t0   10958867 /var/run/docker.sock
dockerd   27615 27626           root    6u     unix 0xffff978dfcc1dc00       0t0   10959050 /var/run/docker.sock
[root@VM_0_16_centos keloyang]# docker info
Client:
 Debug Mode: false

Server:
ERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
errors pretty printing info

4 participants