Skip to content

Add creating status to ensure state.json exists when runc kill. #4645

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions libcontainer/container.go
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,8 @@ const (
Paused
// Stopped is the status that denotes the container does not have a created or running process.
Stopped
// Creating is the status that denotes the container is creating.
Creating
)

func (s Status) String() string {
Expand All @@ -34,6 +36,8 @@ func (s Status) String() string {
return "paused"
case Stopped:
return "stopped"
case Creating:
return "creating"
default:
return "unknown"
}
Expand Down
4 changes: 3 additions & 1 deletion libcontainer/container_linux.go
Original file line number Diff line number Diff line change
Expand Up @@ -628,7 +628,6 @@ func (c *Container) newInitProcess(p *Process, cmd *exec.Cmd, comm *processComm)
},
intelRdtManager: c.intelRdtManager,
}
c.initProcess = init
return init, nil
}

Expand Down Expand Up @@ -883,6 +882,9 @@ func (c *Container) currentStatus() (Status, error) {
// out of process we need to verify the container's status based on runtime
// information and not rely on our in process info.
func (c *Container) refreshState() error {
if c.hasInit() && c.initProcess.pid() == -1 {
return c.state.transition(&creatingState{c: c})
}
paused, err := c.isPaused()
if err != nil {
return err
Expand Down
7 changes: 7 additions & 0 deletions libcontainer/process_linux.go
Original file line number Diff line number Diff line change
Expand Up @@ -554,6 +554,13 @@ func (p *initProcess) start() (retErr error) {
return fmt.Errorf("unable to start init: %w", err)
}

// SIGKILL may happen at any time, so generate state.json here
_, err = p.container.updateState(nil)
if err != nil {
return fmt.Errorf("unable to store init state before creating cgroup: %w", err)
}
p.container.initProcess = p

defer func() {
if retErr != nil {
// Find out if init is killed by the kernel's OOM killer.
Expand Down
17 changes: 17 additions & 0 deletions libcontainer/state_linux.go
Original file line number Diff line number Diff line change
Expand Up @@ -242,3 +242,20 @@ func (n *loadedState) destroy() error {
}
return n.c.state.destroy()
}

type creatingState struct {
Copy link
Member

@rata rata Mar 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This state is awfully similar to the created state. I haven't had a deep look into this, but any reason to not use the created state?

I think if we remove the c.Init assignment as this PR does, and we run the p.container.UpdateState() as this PR also does, maybe it will also fix the problem?

It seems in that case, the refreshstate will return the created state, the only difference is that the destroy function is _ = i.c.initProcess.signal(unix.SIGKILL). If that is a problem, maybe the signalAllProcesses can work for the other things in the created state too?

Am I missing something?

c *Container
}

func (i *creatingState) status() Status {
return Creating
}

func (i *creatingState) transition(s containerState) error {
return newStateTransitionError(i, s)
}

func (i *creatingState) destroy() error {
_ = signalAllProcesses(i.c.cgroupManager, unix.SIGKILL)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks wrong.

  1. There are no processes in the cgroup yet, why are you killing them?
  2. We should only use signalAllProcesses for cases when the container doesn't have its own PID namespace.

Copy link
Author

@jianghao65536 jianghao65536 Feb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{6B191454-9517-436D-843E-F302279AF7BF}

Between the creation of cgroup and waitForChildExit, the pid of runc init[STAGE_PARENT], runc init[STAGE_CHILD], and runc init[STAGE_INIT] will exist in the cgroup.

runc delete does not know which process to send SIGKILL to. Considering the possible existence of processes like runc init PARENT, CHILD, etc., it is necessary to kill the processes in cgroup.procs one by one.

When I was contemplating about writing a function to terminate all processes in cgroup.procs, I discovered that signalAllProcesses already does exactly that, so I directly called it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right.

Maybe add a comment, something like

// Use signalAllProcesses here because various stages of `runc init` might be in the cgroup.

return destroy(i.c)
}