Commit c2746dc

Olivier Bonnaure and claude committed
fix: SIGKILL orphan group and clean up dead process workers on unexpected exit
Two fixes for surviving worker processes killing replacements:

1. Process exit monitor: when a process dies unexpectedly, kill its entire process group before failover. The main process is dead but workers (--workers N) may survive and interfere with replacements.

2. Orphan cleanup: always SIGKILL the orphan's process group after the port is free, instead of only when the port is still in use. Workers may have released the port but still be running shutdown handlers that kill the new process via shared app directory state.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent a583df7 commit c2746dc

2 files changed: +17 -13 lines changed

src/app/deployment.rs

Lines changed: 11 additions & 13 deletions
@@ -448,20 +448,18 @@ impl DeploymentManager {
                 }
             }
 
-            if self.check_port_in_use(port).await {
-                let _ = tokio::process::Command::new("kill")
-                    .arg("-9")
-                    .arg("--")
-                    .arg(&pgid)
-                    .output()
-                    .await;
-            }
+            // Always SIGKILL the orphan group after port is free —
+            // the main process may have released the port but worker
+            // children can still be alive running shutdown handlers
+            // that interfere with the replacement process.
+            let _ = tokio::process::Command::new("kill")
+                .arg("-9")
+                .arg("--")
+                .arg(&pgid)
+                .output()
+                .await;
 
-            // Wait for the orphan's entire process GROUP to fully
-            // exit, not just the main PID. With workers, the main
-            // process dies quickly but children may still be running
-            // their shutdown handlers (e.g. cleaning up PID/lock
-            // files) which can kill the replacement process.
+            // Wait for the entire process group to be gone
             for _ in 0..20 {
                 sleep(Duration::from_millis(100)).await;
                 let alive = tokio::process::Command::new("kill")
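
The orphan cleanup in this hunk can be sketched outside the codebase as follows. This is a minimal synchronous approximation that uses `std::process::Command` in place of `tokio::process::Command`. The helper names (`group_target`, `signal_group`, `wait_until_gone`, `kill_orphan_group`) are hypothetical. Probing with signal 0 is an assumption, since the hunk ends before the arguments of the liveness check are visible. The sketch also assumes that `pgid` reaches `kill` with a leading `-`, which is what the `--` separator suggests.

```rust
use std::process::{Command, Stdio};
use std::thread::sleep;
use std::time::Duration;

/// Build the `kill` target for a whole group: the PGID with a leading '-'.
/// Assumption: the `pgid` string in the diff already has this form, which
/// would explain why it is passed after `--`.
fn group_target(pgid: u32) -> String {
    format!("-{pgid}")
}

/// Run `kill <signal> -- -<pgid>` and report whether it succeeded.
fn signal_group(signal: &str, pgid: u32) -> bool {
    // Never target group 0 or 1: `kill -9 -- -1` would signal every
    // process the caller is allowed to signal.
    if pgid <= 1 {
        return false;
    }
    Command::new("kill")
        .arg(signal)
        .arg("--")
        .arg(group_target(pgid))
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}

/// Poll `still_alive` up to `attempts` times, sleeping `interval` before
/// each probe. Returns true as soon as the probe reports the group gone.
fn wait_until_gone(
    mut still_alive: impl FnMut() -> bool,
    attempts: u32,
    interval: Duration,
) -> bool {
    for _ in 0..attempts {
        sleep(interval);
        if !still_alive() {
            return true;
        }
    }
    false
}

/// SIGKILL the orphan group unconditionally (no port check first), then
/// wait for the group to empty.
fn kill_orphan_group(pgid: u32) -> bool {
    // The result is ignored: the group may already be empty.
    let _ = signal_group("-9", pgid);
    // Signal 0 delivers nothing and only tests whether a member remains.
    wait_until_gone(|| signal_group("-0", pgid), 20, Duration::from_millis(100))
}
```

With these bounds the wait is capped at roughly two seconds (20 probes, 100 ms apart), matching the loop limits visible in the diff.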

src/app/mod.rs

Lines changed: 6 additions & 0 deletions
@@ -1517,6 +1517,12 @@ impl AppManager {
                 };
 
                 if should_failover {
+                    // Kill the dead process's entire group to clean up
+                    // any surviving worker processes. The main process is
+                    // dead but workers (started via --workers N) may still
+                    // be running and could interfere with the replacement.
+                    kill_process_group(exit.pid).await;
+
                     // Clear the dead PID so health checks and routing
                     // know this slot is gone
                     {
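
`kill_process_group` is called in this hunk, but its body is not part of the diff. The sketch below is a hypothetical synchronous stand-in, not the project's implementation. It assumes the app process is spawned as the leader of its own process group, so that the dead main PID still identifies the group that any surviving `--workers N` children belong to. `spawn_group_leader` is likewise an illustrative name.

```rust
use std::io;
use std::os::unix::process::CommandExt;
use std::process::{Child, Command, Stdio};

/// Spawn `program` as the leader of a new process group, so that its PID
/// is also the PGID inherited by any workers it forks.
fn spawn_group_leader(program: &str, args: &[&str]) -> io::Result<Child> {
    Command::new(program)
        .args(args)
        // 0 means "use the child's own PID as its process group ID".
        .process_group(0)
        .spawn()
}

/// Hypothetical stand-in for the `kill_process_group` called in the diff.
/// It SIGKILLs the group led by `pid`; the group can still be addressed
/// after the leader has exited, as long as other members remain.
fn kill_process_group(pid: u32) -> bool {
    // Never target group 0 or 1: `kill -9 -- -1` would signal every
    // process the caller is allowed to signal.
    if pid <= 1 {
        return false;
    }
    Command::new("kill")
        .arg("-9")
        .arg("--")
        .arg(format!("-{pid}"))
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}
```

If the process were not started as a group leader, its PID would not be a valid group ID and the call would either fail or hit an unrelated group, so the spawn side and the kill side have to agree on this convention.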
