Fix duplicate stumble count for parse errors

989ab76
Merged

feat(benchmarks): Add Claude UI benchmark harness #427

GitHub Actions / warden: xcodebuildmcp-test-boundary-review completed May 24, 2026 in 5m 21s

3 issues

xcodebuildmcp-test-boundary-review: Found 3 issues (1 high, 2 medium)

High

Unit test spawns real `python3` process without injection - `src/benchmarks/claude-ui/__tests__/claude-ui-benchmark.test.ts:22-39`

The `runParserScript` helper spawns a real `python3` process via `node:child_process` directly, bypassing the safety setup's executor overrides; this test calls an actual external binary in the unit test run.

Medium

Tests use real OS filesystem because log-writer is not injected into `dismissFirstRunPrompts` - `src/benchmarks/claude-ui/__tests__/first-run-preflight.test.ts:24-29`

These tests create real temp directories and read actual files from disk to verify log output; the filesystem dependency should be injected (as `logWriter`) so tests can stay fully in-memory, consistent with the pattern used in `prepareTemporarySimulator`.
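To illustrate the suggestion above, here is a minimal sketch of an injected log writer backed by an in-memory map. The `LogWriter` interface, `createInMemoryLogWriter`, the log file name, and the `dismissFirstRunPromptsSketch` stand-in are all hypothetical and invented for this example; the real `dismissFirstRunPrompts` signature and the `logWriter` shape used by `prepareTemporarySimulator` are not shown in this review and may differ.

```typescript
// Hypothetical interface: the real logWriter shape in the repo may differ.
export interface LogWriter {
  write(fileName: string, contents: string): void;
}

// In-memory implementation for tests: appends to a Map instead of touching disk.
export function createInMemoryLogWriter(): {
  logWriter: LogWriter;
  files: Map<string, string>;
} {
  const files = new Map<string, string>();
  const logWriter: LogWriter = {
    write(fileName, contents) {
      files.set(fileName, (files.get(fileName) ?? '') + contents);
    },
  };
  return { logWriter, files };
}

// Stand-in for the function under test (NOT the real implementation).
// It only shows where the injected dependency enters.
export function dismissFirstRunPromptsSketch(
  prompts: string[],
  deps: { logWriter: LogWriter },
): number {
  for (const prompt of prompts) {
    // The file name here is an assumption made for the example.
    deps.logWriter.write('first-run-preflight.log', `dismissed: ${prompt}\n`);
  }
  return prompts.length;
}
```

With this shape a test can assert on `files.get(...)` directly, so no temp directory has to be created or cleaned up.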

Benchmark test spawns real python3 subprocess, bypassing executor safety overrides

The `runParserScript` helper calls `spawn('python3', args)` directly from `node:child_process`, making `npm test` dependent on Python 3 being installed and bypassing the `vitest-executor-safety.setup.ts` framework-executor overrides; wrap the parser invocation in an injectable function so tests can stub it.
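One possible shape for that injectable function is sketched below. The `ParserResult` and `ParserRunner` types, `spawnPythonParser`, and `createStubRunner` are hypothetical names introduced for illustration. The `runParserScript` signature shown here is also an assumption, since the review does not show the existing helper's parameters.

```typescript
import { spawn } from 'node:child_process';

// Hypothetical result shape; the real helper may return something different.
export interface ParserResult {
  exitCode: number | null;
  stdout: string;
  stderr: string;
}

// The injectable seam: anything that can run the parser with the given args.
export type ParserRunner = (args: string[]) => Promise<ParserResult>;

// Production runner: the only place that touches node:child_process.
export const spawnPythonParser: ParserRunner = (args) =>
  new Promise((resolve, reject) => {
    const child = spawn('python3', args);
    let stdout = '';
    let stderr = '';
    child.stdout.on('data', (chunk) => { stdout += String(chunk); });
    child.stderr.on('data', (chunk) => { stderr += String(chunk); });
    child.on('error', reject); // e.g. python3 is not installed
    child.on('close', (exitCode) => resolve({ exitCode, stdout, stderr }));
  });

// The helper accepts a runner and defaults to the real one (assumed signature).
export async function runParserScript(
  args: string[],
  runner: ParserRunner = spawnPythonParser,
): Promise<ParserResult> {
  return runner(args);
}

// Test-side stub: records calls and returns canned output, no subprocess.
export function createStubRunner(canned: ParserResult): {
  runner: ParserRunner;
  calls: string[][];
} {
  const calls: string[][] = [];
  const runner: ParserRunner = async (args) => {
    calls.push([...args]);
    return canned;
  };
  return { runner, calls };
}
```

A unit test would pass the stub runner and assert on `calls` and the returned result, which keeps `npm test` independent of a local Python 3 install. Coverage of the real parser script could then live in a separate integration run, if the project has one.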


⏱ 4m 17s · 1.1M in / 44.8k out · $2.37

Annotations

Check failure on line 39 in src/benchmarks/claude-ui/__tests__/claude-ui-benchmark.test.ts

@github-actions github-actions / warden: xcodebuildmcp-test-boundary-review

Unit test spawns real `python3` process without injection

The `runParserScript` helper spawns a real `python3` process via `node:child_process` directly, bypassing the safety setup's executor overrides; this test calls an actual external binary in the unit test run.

Check warning on line 29 in src/benchmarks/claude-ui/__tests__/first-run-preflight.test.ts

@github-actions github-actions / warden: xcodebuildmcp-test-boundary-review

Tests use real OS filesystem because log-writer is not injected into `dismissFirstRunPrompts`

These tests create real temp directories and read actual files from disk to verify log output; the filesystem dependency should be injected (as `logWriter`) so tests can stay fully in-memory, consistent with the pattern used in `prepareTemporarySimulator`.