Commit f72a952
fix: Wait directly in fakeroot MonitorContainer
When the user's container exits, fakeroot cleanup runs `rm` through the
fakeroot engine to remove a temporary container directory, extracted
from a SIF image, where necessary.
Under high load / parallelism, on some distros / kernels, execution of
fakeroot tempdir cleanup can become stuck. See #1109 for discussion of
a reproducer.
The fakeroot engine MonitorContainer code sits in a loop waiting for
signals. It requires a SIGCHLD to be received before it will wait4 on
the child process, to confirm it has exited.
Debug logging showed that sometimes a SIGCHLD was not received by
this code. It is not clear why, as `signal.Notify` is set early, and
the signals channel buffer is not being filled. The behavior varies
between distributions / kernels. It is sometimes easy to trigger, and
sometimes impossible.
As a workaround, re-structure the MonitorContainer function so that it
performs a blocking `wait4` in a goroutine, separate from signal
handling. This ensures that receiving a `SIGCHLD` is not necessary to
identify that the child process has exited.
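
The restructuring described above can be sketched roughly as follows. This
is not the code from this commit: `monitorChild`, `runAndMonitor` and
`waitResult` are hypothetical names used only for illustration, and the
signal forwarding shown is an assumption about what the signal loop does.
The point of the sketch is that the blocking `wait4` runs in its own
goroutine, so the loop returns when the child exits whether or not a
`SIGCHLD` was ever delivered to the channel. It assumes a Unix-like system.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

// waitResult carries the outcome of the blocking wait4 call.
type waitResult struct {
	status syscall.WaitStatus
	err    error
}

// monitorChild (hypothetical) waits for pid to exit while handling signals.
// The wait4 call does not depend on a SIGCHLD arriving on the channel.
func monitorChild(pid int, signals <-chan os.Signal) (syscall.WaitStatus, error) {
	done := make(chan waitResult, 1)

	// Blocking wait4 in a separate goroutine, retried if interrupted.
	go func() {
		var status syscall.WaitStatus
		for {
			_, err := syscall.Wait4(pid, &status, 0, nil)
			if err == syscall.EINTR {
				continue
			}
			done <- waitResult{status: status, err: err}
			return
		}
	}()

	for {
		select {
		case res := <-done:
			// The child has been reaped; no SIGCHLD was required.
			return res.status, res.err
		case sig := <-signals:
			if sig == syscall.SIGCHLD {
				// Informational only; the goroutine above does the wait.
				continue
			}
			// Assumption: other signals are forwarded to the child.
			if s, ok := sig.(syscall.Signal); ok {
				_ = syscall.Kill(pid, s)
			}
		}
	}
}

// runAndMonitor (hypothetical) starts a command and returns its exit code.
func runAndMonitor(name string, args ...string) (int, error) {
	signals := make(chan os.Signal, 8)
	// Register for signals before the child is started.
	signal.Notify(signals, syscall.SIGCHLD, syscall.SIGINT, syscall.SIGTERM)
	defer signal.Stop(signals)

	cmd := exec.Command(name, args...)
	if err := cmd.Start(); err != nil {
		return -1, fmt.Errorf("starting %s: %w", name, err)
	}
	status, err := monitorChild(cmd.Process.Pid, signals)
	if err != nil {
		return -1, fmt.Errorf("wait4 on pid %d: %w", cmd.Process.Pid, err)
	}
	return status.ExitStatus(), nil
}
```

For example, calling `runAndMonitor("sh", "-c", "exit 3")` should return the
child's exit status even if the `SIGCHLD` for that child is never seen by
the `select` loop.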
This fix has been verified in the CircleCI SSH environment. A reproducer
script that previously froze within a few iterations does not freeze at
all with this change.
Fixes #1109
1 parent 0cdc9c7, commit f72a952
2 files changed under internal/pkg/runtime/engine/fakeroot (+34, -16 lines)