Fix fallback workspace key derivation in forceStopDaemon

699fe46
Merged

feat: Workspace filesystem cleanup #391

GitHub Actions / warden: find-bugs completed May 4, 2026 in 16m 53s

8 issues

find-bugs: Found 8 issues (4 medium, 4 low)

Medium

forceStopDaemon silently leaves stale socket and registry when SIGTERM target hasn't exited within 500ms - `src/cli/daemon-control.ts:53-57`

After sending SIGTERM and waiting 500ms, forceStopDaemon calls cleanupWorkspaceDaemonFiles with pid set but without allowLiveOwner=true. In canRemoveRegistryEntry, when allowLiveOwner is not true the function returns !isPidAlive(entry.pid); if the daemon hasn't yet exited (500ms is short for a graceful SIGTERM shutdown), removeRegistryAtPathIfOwned returns null and neither the registry entry nor the socket is removed. The previous implementation unconditionally called removeStaleSocket(socketPath), so this is a behavioral regression: ensureDaemonRunning will then proceed to startDaemonBackground while the old socket file still exists, which can cause the new daemon's bind() to fail with EADDRINUSE and leave the workspace unable to start a daemon.
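
One way to close the 500ms gap is to poll for the daemon's exit before deciding how to clean up. The following is a minimal sketch, not the PR's code: `waitForExit` and its timings are hypothetical, and the `isPidAlive` shown here is only a stand-in for the repository's own helper.

```typescript
// Stand-in for the repository's isPidAlive; the real helper may differ.
function isPidAlive(pid: number): boolean {
  try {
    process.kill(pid, 0); // signal 0 performs an existence/permission check only
    return true;
  } catch (err) {
    // EPERM means the process exists but is owned by another user.
    return (err as { code?: string }).code === "EPERM";
  }
}

// Hypothetical helper: poll until the pid is gone or the deadline passes.
// Resolves to true if the process exited, false if it is still alive.
async function waitForExit(pid: number, timeoutMs: number, pollMs = 50): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (!isPidAlive(pid)) return true;
    await new Promise<void>((resolve) => setTimeout(resolve, pollMs));
  }
  return !isPidAlive(pid);
}
```

If `waitForExit` resolves to false, the caller could escalate to SIGKILL or opt into `allowLiveOwner` so that the stale socket does not outlive the stop request; which of those is appropriate is a design decision for the PR.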

Also found at:

  • src/daemon/daemon-registry.ts:332-344

Forced-shutdown timer can hang indefinitely waiting on async cleanup - `src/daemon.ts:344-351`

The 5s forced-shutdown timer previously called the synchronous cleanupWorkspaceDaemonFiles and then exited. It now awaits the async cleanupOwnedWorkspaceFilesystemArtifacts via .finally() with no timeout. If that promise hangs (e.g., a stuck filesystem lock acquired in fs-lock-shared / workspace-filesystem-lifecycle), the daemon will never exit, defeating the purpose of the forced-shutdown path and causing process leaks across daemon restarts.
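
A common way to bound an async cleanup step is to race it against a timer. The sketch below is illustrative only: `withTimeout` is a hypothetical wrapper, not something the PR defines, and the timeout value a caller passes is an assumption.

```typescript
// Hypothetical wrapper: resolves with the work's result, or with the
// sentinel "timed-out" if the work has not settled within `ms`.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T | "timed-out"> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<"timed-out">((resolve) => {
    timer = setTimeout(() => resolve("timed-out"), ms);
  });
  // Whichever settles first wins; the timer is cleared either way so it
  // does not keep the event loop alive after the work completes.
  return Promise.race([work, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}
```

In the forced-shutdown path, the exit call could then be chained on `withTimeout(cleanupOwnedWorkspaceFilesystemArtifacts(...), ms).finally(...)`, so that a stuck lock delays exit by at most `ms` rather than indefinitely.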

Also found at:

  • src/utils/workspace-filesystem-lifecycle.ts:403-410

Daemon socket directory moved to shared tmpdir without per-user isolation enables symlink/TOCTOU attacks - `src/daemon/socket-path.ts:21-31`

daemonRunDir() now returns os.tmpdir() (typically /tmp on Linux, a world-writable shared directory), and daemonDirForWorkspaceKey builds a predictable path xcodebuildmcp-<12-hex> under it. Because workspaceKeyForRoot is a deterministic SHA-256 of the workspace root path, any local user can predict the directory name and pre-create it (or create it as a symlink) before the daemon starts. The directory creator in ensureSocketDir uses existsSync+mkdirSync (TOCTOU) and will not enforce ownership/mode if the directory already exists, which can lead to the daemon binding its UNIX socket inside an attacker-controlled directory and removeStaleSocket unlinking attacker-chosen files. Previously the directory lived under ~/.xcodebuildmcp, which was not shared between users.
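
A hardened directory setup might look like the sketch below. It is an assumption-laden illustration rather than the project's code: `ensurePrivateDir` and `perUserRunDir` are hypothetical names, and the per-user directory naming is invented for the example.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical: scope the run directory to the current user (naming is illustrative).
function perUserRunDir(): string {
  const uid = process.getuid?.() ?? "nouid";
  return path.join(os.tmpdir(), `xcodebuildmcp-${uid}`);
}

// Hypothetical replacement for an existsSync + mkdirSync sequence.
function ensurePrivateDir(dir: string): void {
  try {
    // A single mkdir call either creates the directory or fails with EEXIST,
    // so there is no check-then-create window.
    fs.mkdirSync(dir, { mode: 0o700 });
  } catch (err) {
    if ((err as { code?: string }).code !== "EEXIST") throw err;
  }
  // lstat does not follow symlinks, so a pre-planted symlink is rejected here.
  const st = fs.lstatSync(dir);
  if (!st.isDirectory()) throw new Error(`${dir} is not a real directory`);
  const uid = process.getuid?.();
  if (uid !== undefined && st.uid !== uid) {
    throw new Error(`${dir} is owned by another user`);
  }
  if ((st.mode & 0o077) !== 0) {
    throw new Error(`${dir} is accessible to group or other`);
  }
}
```

The key choice is to validate a pre-existing directory instead of trusting it: ownership and mode are checked whether or not this process created it.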

Also found at:

  • src/daemon/socket-path.ts:16-19

PID reuse can prevent recovery of expired locks indefinitely - `src/utils/fs-lock.ts:74-82`

shouldRecoverLockDir refuses to recover an expired lock if isPidAlive(staleOwner.pid) returns true. On Linux/macOS, PIDs are recycled, so an unrelated long-running process that inherits the recorded PID will keep the expired lock unrecoverable forever. Every future acquirer for that workspace will fail to acquire the lock, producing a denial-of-service for cleanup/daemon operations. Consider also comparing process start time or the owner token before honoring isPidAlive.
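
The start-time comparison suggested above could be sketched as follows. Everything here is hypothetical: the `LockOwner` shape, the `ownerStillHoldsLock` name, and the tolerance are assumptions, and the platform-specific lookup of a process's start time is injected as a callback rather than implemented.

```typescript
// Hypothetical lock metadata: the pid plus the owner's start time,
// recorded when the lock was created.
interface LockOwner {
  pid: number;
  startedAtMs: number;
}

// Returns true only if the recorded owner itself appears to be running.
// `isAlive` and `processStartMs` are injected so platform details stay outside.
function ownerStillHoldsLock(
  owner: LockOwner,
  isAlive: (pid: number) => boolean,
  processStartMs: (pid: number) => number | undefined,
  toleranceMs = 2000,
): boolean {
  if (!isAlive(owner.pid)) return false;
  const actualStart = processStartMs(owner.pid);
  // Start time unavailable: stay conservative and treat the owner as live.
  if (actualStart === undefined) return true;
  // A mismatched start time means the pid was recycled by another process.
  return Math.abs(actualStart - owner.startedAtMs) <= toleranceMs;
}
```

With a check like this, an expired lock whose pid now belongs to an unrelated process would be treated as recoverable, while a lock held by the original owner would still be honored.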

Also found at:

  • src/utils/fs-lock.ts:195-220

Low

removeDaemonRegistryEntry and cleanupWorkspaceDaemonFiles silently swallow lock acquisition failures - `src/daemon/daemon-registry.ts:286-289`

Both removeDaemonRegistryEntry (line 286) and cleanupWorkspaceDaemonFiles (line 331) call withDaemonRegistryMutationLock and ignore its null return value. If the lock cannot be acquired within DAEMON_REGISTRY_LOCK_WAIT_MS (1s), cleanup silently no-ops with no error or log, contradicting writeDaemonRegistryEntry which throws in the same situation. This can leave stale daemon registry entries and socket files behind under contention, which is the exact failure mode the multi-process cleanup boundaries described in the PR aim to address.

Also found at:

  • src/daemon.ts:186-197

Successful lock recovery on the last retry attempt is discarded - `src/utils/fs-lock.ts:204-225`

In tryAcquireFsLock, the retry loop runs at most 2 attempts. If the second attempt's createLock fails with EEXIST and tryRecoverExpiredLockDir succeeds, the loop terminates without performing another createLock, so the recovery is wasted and the function returns null. Callers that could have acquired the lock instead see a spurious failure, increasing contention churn.
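
One loop shape that avoids wasting a recovery is to bound the number of recoveries rather than the number of create attempts, so every successful recovery is followed by a create. This is a simplified sketch with hypothetical names and boolean callbacks, not the actual `tryAcquireFsLock` signature.

```typescript
// Hypothetical simplification: createLock returns true on success and false
// when the lock already exists; recoverExpired returns true if it removed
// an expired lock.
function acquireWithRecovery(
  createLock: () => boolean,
  recoverExpired: () => boolean,
  maxRecoveries = 2,
): boolean {
  let recoveries = 0;
  for (;;) {
    if (createLock()) return true;
    // Give up once the recovery budget is spent or the lock is not recoverable.
    if (recoveries >= maxRecoveries || !recoverExpired()) return false;
    recoveries += 1; // loop back so this recovery is followed by a create
  }
}
```

The loop performs at most `maxRecoveries + 1` create attempts, and it never returns failure immediately after a successful recovery.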

Rename failure leaves untracked OSLog file and possibly-live helper child - `src/utils/simulator-steps.ts:363-369`

When renameHelperLogPathOrThrow fails after the detached child has already been spawned, the function sends SIGTERM (best-effort) and throws. The outer catch closes the parent fd but never unlinks the original ownerpid log path, and registerSimulatorLaunchOsLogSession is never called. If the detached child survives the SIGTERM, it continues writing to an orphaned, untracked log file that the workspace cleanup logic cannot reconcile because no session was registered. This produces a resource/file leak that grows on each failure.

cleanupOwnedWorkspaceFilesystemArtifacts skips daemon cleanup when workspace key is unresolved - `src/utils/workspace-filesystem-lifecycle.ts:480-492`

When neither options.workspaceKey nor getRuntimeInstanceIfConfigured()?.workspaceKey is available, the function returns early with an 'unconfigured' result. However, this path also skips stopOwnedSimulatorLaunchOsLogSessions, which is workspace-agnostic and would otherwise stop all owned sessions. As a result, on shutdown/force-stop without a configured runtime, owned simulator OSLog helpers are left running rather than being terminated, contradicting the stated goal of cleaning up owned artifacts.


Duration: 16m 46s · Tokens: 1.4M in / 27.0k out · Cost: $5.35 (+extraction: $0.01, +merge: $0.01)

Annotations

Check warning on line 57 in src/cli/daemon-control.ts

@github-actions github-actions / warden: find-bugs

forceStopDaemon silently leaves stale socket and registry when SIGTERM target hasn't exited within 500ms

After sending SIGTERM and waiting 500ms, forceStopDaemon calls cleanupWorkspaceDaemonFiles with pid set but without allowLiveOwner=true. In canRemoveRegistryEntry, when allowLiveOwner is not true the function returns !isPidAlive(entry.pid); if the daemon hasn't yet exited (500ms is short for a graceful SIGTERM shutdown), removeRegistryAtPathIfOwned returns null and neither the registry entry nor the socket is removed. The previous implementation unconditionally called removeStaleSocket(socketPath), so this is a behavioral regression: ensureDaemonRunning will then proceed to startDaemonBackground while the old socket file still exists, which can cause the new daemon's bind() to fail with EADDRINUSE and leave the workspace unable to start a daemon.

Check warning on line 344 in src/daemon/daemon-registry.ts

@github-actions github-actions / warden: find-bugs

[LX2-SSG] forceStopDaemon silently leaves stale socket and registry when SIGTERM target hasn't exited within 500ms (additional location)

Check warning on line 351 in src/daemon.ts

@github-actions github-actions / warden: find-bugs

Forced-shutdown timer can hang indefinitely waiting on async cleanup

The 5s forced-shutdown timer previously called the synchronous cleanupWorkspaceDaemonFiles and then exited. It now awaits the async cleanupOwnedWorkspaceFilesystemArtifacts via .finally() with no timeout. If that promise hangs (e.g., a stuck filesystem lock acquired in fs-lock-shared / workspace-filesystem-lifecycle), the daemon will never exit, defeating the purpose of the forced-shutdown path and causing process leaks across daemon restarts.

Check warning on line 410 in src/utils/workspace-filesystem-lifecycle.ts

@github-actions github-actions / warden: find-bugs

[FX3-GFT] Forced-shutdown timer can hang indefinitely waiting on async cleanup (additional location)

Check warning on line 31 in src/daemon/socket-path.ts

@github-actions github-actions / warden: find-bugs

Daemon socket directory moved to shared tmpdir without per-user isolation enables symlink/TOCTOU attacks

`daemonRunDir()` now returns `os.tmpdir()` (typically `/tmp` on Linux, a world-writable shared directory), and `daemonDirForWorkspaceKey` builds a predictable path `xcodebuildmcp-<12-hex>` under it. Because `workspaceKeyForRoot` is a deterministic SHA-256 of the workspace root path, any local user can predict the directory name and pre-create it (or create it as a symlink) before the daemon starts. The directory creator in `ensureSocketDir` uses `existsSync`+`mkdirSync` (TOCTOU) and will not enforce ownership/mode if the directory already exists, which can lead to the daemon binding its UNIX socket inside an attacker-controlled directory and `removeStaleSocket` unlinking attacker-chosen files. Previously the directory lived under `~/.xcodebuildmcp`, which was not shared between users.

Check warning on line 19 in src/daemon/socket-path.ts

@github-actions github-actions / warden: find-bugs

[V8F-EGD] Daemon socket directory moved to shared tmpdir without per-user isolation enables symlink/TOCTOU attacks (additional location)

Check warning on line 82 in src/utils/fs-lock.ts

@github-actions github-actions / warden: find-bugs

PID reuse can prevent recovery of expired locks indefinitely

`shouldRecoverLockDir` refuses to recover an expired lock if `isPidAlive(staleOwner.pid)` returns true. On Linux/macOS, PIDs are recycled, so an unrelated long-running process that inherits the recorded PID will keep the expired lock unrecoverable forever. Every future acquirer for that workspace will fail to acquire the lock, producing a denial-of-service for cleanup/daemon operations. Consider also comparing process start time or the owner token before honoring `isPidAlive`.

Check warning on line 220 in src/utils/fs-lock.ts

@github-actions github-actions / warden: find-bugs

[67B-Y65] PID reuse can prevent recovery of expired locks indefinitely (additional location)
