feat: Workspace filesystem cleanup #391
6 issues
find-bugs: Found 6 issues (2 medium, 4 low)
Medium
Forced-shutdown timeout is never cleared, causing concurrent cleanup and wrong exit code - `src/daemon.ts:344-351`
The setTimeout at line 344 is never cancelled when server.close() completes successfully. If the server closes (line 334) but the subsequent cleanupArtifacts() + flushAndCloseSentry(2000) chain takes longer than 5 seconds (the Sentry flush alone allows up to 2s), the timeout fires and invokes cleanupArtifacts() a second time concurrently with the in-flight cleanup, then calls process.exit(1). This can (a) race two cleanup runs against the same workspace-keyed artifacts/locks, and (b) override the intended exitCode with 1, masking clean shutdowns as failures.
Also found at:
src/cli/daemon-control.ts:53-57
Daemon socket moved to shared tmpdir without guaranteed directory permissions - `src/daemon/socket-path.ts:21-36`
daemonRunDir() now returns os.tmpdir(), and daemonDirForWorkspaceKey() places the daemon socket under tmpdir()/xcodebuildmcp-<hash>/d.sock. On Linux this resolves to /tmp, which is world-writable and shared across users. The socket directory is created in ensureSocketDir() with mkdirSync({recursive: true, mode: 0o700}), but recursive mkdir does not change permissions on a pre-existing directory. A local attacker who guesses or enumerates the workspace hash can pre-create /tmp/xcodebuildmcp-<hash> with looser permissions before the daemon starts; the daemon will then bind its socket inside an attacker-controlled directory, enabling socket hijacking, MITM of daemon RPC, or command injection through the daemon protocol. The previous implementation placed daemon artifacts under ~/.xcodebuildmcp (line 5 of the pre-diff), which inherited the home directory's restrictive permissions and was not subject to this race.
Also found at:
src/daemon/socket-path.ts:33-36
Low
isPidAlive treats pid<=0 as live due to process.kill process-group semantics - `src/utils/process-liveness.ts:1-8`
process.kill(0, 0) and process.kill(-N, 0) have special semantics on Unix: pid 0 targets the caller's process group and negative pids target a process group, which will typically succeed and cause isPidAlive to return true for invalid/sentinel pids. Some callers (e.g. src/daemon/daemon-registry.ts:219, src/utils/fs-lock.ts:93, src/utils/log-capture/simulator-launch-oslog-sessions.ts:85) do not pre-validate pid > 0, so a corrupted or zero-valued pid record would be considered live and prevent cleanup of stale artifacts — undermining the workspace cleanup ownership boundaries this PR introduces.
scheduleWorkspaceFilesystemLifecycleSweep records schedule timestamp before checking running set - `src/utils/workspace-filesystem-lifecycle.ts:411-419`
In scheduleWorkspaceFilesystemLifecycleSweep, lastScheduledAtByScope and lastScheduledAtByPreKey are only updated after the runningScheduledSweeps.has(scheduleKey) early return. That part is fine, but if a sweep is already running, subsequent calls will keep hitting the running-set early return without ever updating the cooldown timestamp; once the running sweep finishes, the next call will schedule another sweep immediately because the cooldown was never advanced. This can cause back-to-back sweeps under load instead of the intended cooldown spacing.
cleanupOwnedWorkspaceFilesystemArtifacts never prunes logs and bypasses lock/cooldown - `src/utils/workspace-filesystem-lifecycle.ts:478-504`
cleanupOwnedWorkspaceFilesystemArtifacts returns a result with scanned: 0, deleted: 0 and does not invoke pruneKnownLogDirectory or acquire the workspace fs lock. If the caller relies on this shutdown path to also enforce log retention (as the PR description implies), expired/excess logs will accumulate because shutdown skips the prune step entirely. Additionally, daemon cleanup runs without the shared lock, which could race with another process performing a sweep.
Pre-key cooldown check uses caller-supplied now without resolveOptions normalization - `src/utils/workspace-filesystem-lifecycle.ts:395-405`
buildSchedulePreKey/pre-key cooldown uses options.now ?? Date.now() before resolveOptions runs. If a caller passes a now that resolveOptions would normalize differently (e.g., default), the pre-key cooldown comparison can disagree with the post-resolve scope cooldown comparison, allowing a sweep to be scheduled that would otherwise be cooled down (or vice versa). The two cooldown gates can produce inconsistent decisions for the same call.
Duration: 14m 45s · Tokens: 1.3M in / 23.8k out · Cost: $4.48 (+extraction: $0.00, +merge: $0.00)
Annotations
Check warning on line 351 in src/daemon.ts
sentry-warden / warden: find-bugs
Forced-shutdown timeout is never cleared, causing concurrent cleanup and wrong exit code
The `setTimeout` at line 344 is never cancelled when `server.close()` completes successfully. If the server closes (line 334) but the subsequent `cleanupArtifacts()` + `flushAndCloseSentry(2000)` chain takes longer than 5 seconds (the Sentry flush alone allows up to 2s), the timeout fires and invokes `cleanupArtifacts()` a second time concurrently with the in-flight cleanup, then calls `process.exit(1)`. This can (a) race two cleanup runs against the same workspace-keyed artifacts/locks, and (b) override the intended `exitCode` with `1`, masking clean shutdowns as failures.
Check warning on line 57 in src/cli/daemon-control.ts
sentry-warden / warden: find-bugs
[7EB-RM7] Forced-shutdown timeout is never cleared, causing concurrent cleanup and wrong exit code (additional location)
The `setTimeout` at line 344 is never cancelled when `server.close()` completes successfully. If the server closes (line 334) but the subsequent `cleanupArtifacts()` + `flushAndCloseSentry(2000)` chain takes longer than 5 seconds (the Sentry flush alone allows up to 2s), the timeout fires and invokes `cleanupArtifacts()` a second time concurrently with the in-flight cleanup, then calls `process.exit(1)`. This can (a) race two cleanup runs against the same workspace-keyed artifacts/locks, and (b) override the intended `exitCode` with `1`, masking clean shutdowns as failures.
Check warning on line 36 in src/daemon/socket-path.ts
sentry-warden / warden: find-bugs
Daemon socket moved to shared tmpdir without guaranteed directory permissions
daemonRunDir() now returns os.tmpdir(), and daemonDirForWorkspaceKey() places the daemon socket under tmpdir()/xcodebuildmcp-<hash>/d.sock. On Linux this resolves to /tmp, which is world-writable and shared across users. The socket directory is created in ensureSocketDir() with mkdirSync({recursive: true, mode: 0o700}), but recursive mkdir does not change permissions on a pre-existing directory. A local attacker who guesses or enumerates the workspace hash can pre-create /tmp/xcodebuildmcp-<hash> with looser permissions before the daemon starts; the daemon will then bind its socket inside an attacker-controlled directory, enabling socket hijacking, MITM of daemon RPC, or command injection through the daemon protocol. The previous implementation placed daemon artifacts under ~/.xcodebuildmcp (line 5 of the pre-diff), which inherited the home directory's restrictive permissions and was not subject to this race.
Check warning on line 36 in src/daemon/socket-path.ts
sentry-warden / warden: find-bugs
[DDK-EYR] Daemon socket moved to shared tmpdir without guaranteed directory permissions (additional location)
daemonRunDir() now returns os.tmpdir(), and daemonDirForWorkspaceKey() places the daemon socket under tmpdir()/xcodebuildmcp-<hash>/d.sock. On Linux this resolves to /tmp, which is world-writable and shared across users. The socket directory is created in ensureSocketDir() with mkdirSync({recursive: true, mode: 0o700}), but recursive mkdir does not change permissions on a pre-existing directory. A local attacker who guesses or enumerates the workspace hash can pre-create /tmp/xcodebuildmcp-<hash> with looser permissions before the daemon starts; the daemon will then bind its socket inside an attacker-controlled directory, enabling socket hijacking, MITM of daemon RPC, or command injection through the daemon protocol. The previous implementation placed daemon artifacts under ~/.xcodebuildmcp (line 5 of the pre-diff), which inherited the home directory's restrictive permissions and was not subject to this race.