Commit 7f8b5f1

feat: add MCP test runner support for run_tests and get_test_results
Adds MCP tools to control the Phoenix test runner remotely: run test suites by category/spec and poll structured results. Includes WS protocol handlers, test-runner-side MCP script, and updated CLAUDE.md with accurate suite naming guidance.
1 parent 0f1cc1e commit 7f8b5f1

7 files changed: +413 -1 lines changed

CLAUDE.md

Lines changed: 45 additions & 0 deletions
@@ -24,3 +24,48 @@ Use `exec_js` to run JS in the Phoenix browser runtime. jQuery `$()` is global.
 **Click AI chat buttons:** `$('.ai-edit-restore-btn:contains("Undo")').click();`
 
 **Check logs:** `get_browser_console_logs` with `filter` regex (e.g. `"AI UI"`, `"error"`) and `tail` — includes both browser console and Node.js (PhNode) logs. Use `get_terminal_logs` for Electron process output (only available if Phoenix was launched via `start_phoenix`).
+
+## Running Tests via MCP
+
+The test runner must be open as a separate Phoenix instance (it shows up as `phoenix-test-runner-*` in `get_phoenix_status`). Use `run_tests` to trigger test runs and `get_test_results` to poll for results. `take_screenshot` also works on the test runner.
+
+### Test categories
+- **unit** — Fast, no UI. Safe to run all at once (`run_tests category=unit`).
+- **integration** — Spawns a Phoenix iframe inside the test runner. Some specs require window focus and will hang if the test runner window isn't focused.
+- **LegacyInteg** — Like integration but uses the legacy test harness. Also spawns an embedded Phoenix instance.
+- **livepreview**, **mainview** — Specialized integration tests.
+- **Do NOT use:** `all`, `performance`, `extension`, `individualrun` — not actively supported.
+
+### Hierarchy: Category → Suite → Test
+- **Category** — top-level grouping: `unit`, `integration`, `LegacyInteg`, etc. Safe to run an entire category.
+- **Suite** — a group of related tests within a category (e.g. `integration: FileFilters` has ~20 tests). This is the `spec` parameter value.
+- **Test** — a single test within a suite.
+
+### Running all tests in a category
+```
+run_tests(category="unit")
+```
+
+### Running a single suite
+Pass the exact suite name as the `spec` parameter. **Suite names do NOT always have a category prefix.** Many suites are registered with just their plain name (e.g. `"CSS Parsing"`, `"Editor"`, `"JSUtils"`), while others include a prefix (e.g. `"unit:Phoenix Platform Tests"`, `"integration: FileFilters"`, `"LegacyInteg:ExtensionLoader"`). If the suite name is wrong, the test runner will show a blank page with 0 specs and appear stuck.
+
+**To discover the exact suite name**, run this in `exec_js` on the test runner instance:
+```js
+return jasmine.getEnv().topSuite().children.map(s => s.description);
+```
+
+Examples:
+```
+run_tests(category="unit", spec="CSS Parsing")
+run_tests(category="unit", spec="unit:Phoenix Platform Tests")
+run_tests(category="integration", spec="integration: FileFilters")
+run_tests(category="LegacyInteg", spec="LegacyInteg:ExtensionLoader")
+```
+
+### Running individual tests
+You can pass a specific test's full name as `spec` to run just that one test. It is perfectly valid to run a single test. However, if a single test fails, re-run the full suite to confirm — suites sometimes execute tests in order with shared state, so an individual test may fail in isolation but pass within its suite. If the suite passes, the test is valid.
+
+### Gotchas
+- **Instance name changes on reload:** The test runner gets a new random instance name each time the page reloads. Always check `get_phoenix_status` after a `run_tests` call to get the current instance name.
+- **Integration tests may hang:** Specs labeled "needs window focus" will hang indefinitely if the test runner doesn't have OS-level window focus. If `get_test_results` starts timing out, the event loop is likely blocked by a stuck spec — use `force_reload_phoenix` to recover.
+- **LegacyInteg/integration tests spawn an iframe:** These tests open an embedded Phoenix instance inside the test runner, so they are slower and more resource-intensive than unit tests.
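
The run-then-poll workflow described in CLAUDE.md could be sketched roughly as follows. This is only an illustration under assumptions: `callTool` is a hypothetical function supplied by whatever MCP client is in use and is assumed to return parsed JSON, and the `running` field name is a guess based on the "running status" wording in the tool description. The `instance` argument is omitted, which the tool description suggests is acceptable when only one instance is connected.

```javascript
// Sketch only: `callTool(name, args)` is a hypothetical MCP client function
// that resolves with the tool's parsed JSON result.
async function runAndPoll(callTool, args, options = {}) {
    const intervalMs = options.intervalMs ?? 2000;
    const maxAttempts = options.maxAttempts ?? 150;
    // Assumed result shape: `running` is false once the run has finished.
    const isDone = options.isDone ?? ((results) => results.running === false);

    // Trigger the run; the test runner reloads with the requested category/spec.
    await callTool("run_tests", args);

    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        // Wait before each poll so the reloaded runner has time to reconnect.
        await new Promise((resolve) => setTimeout(resolve, intervalMs));
        const results = await callTool("get_test_results", {});
        if (isDone(results)) {
            return { attempts: attempt, results };
        }
    }
    throw new Error("Tests did not finish after " + maxAttempts + " polls");
}
```

A caller might use `runAndPoll(callTool, { category: "unit", spec: "CSS Parsing" })` and pass a custom `isDone` predicate if the real result fields differ from the assumed one.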

phoenix-builder-mcp/mcp-tools.js

Lines changed: 75 additions & 0 deletions
@@ -383,6 +383,81 @@ export function registerTools(server, processManager, wsControlServer, phoenixDe
         }
     );
 
+    server.tool(
+        "run_tests",
+        "Run tests in the Phoenix test runner (SpecRunner.html). Reloads the test runner with the specified " +
+        "category and optional spec filter. The test runner must already be open in a browser with MCP enabled. " +
+        "Supported categories: unit, integration, LegacyInteg, livepreview, mainview. " +
+        "WARNING: Do NOT use 'all', 'performance', 'extension', or 'individualrun' categories — they are " +
+        "not actively supported and the full 'all' suite should never be run. " +
+        "To run all tests in a category, omit the spec parameter. " +
+        "To run a single suite, pass the suite name as spec (e.g. spec='unit: HTML Code Hinting'). " +
+        "Suite names are prefixed with the category and a colon, e.g. 'unit: Editor', 'unit: CSS Parsing'. " +
+        "You can also run individual specs by passing the full spec name, but note that individual specs " +
+        "may fail when run alone because suites often run tests in order with shared state — prefer " +
+        "running the full suite instead of individual specs. " +
+        "After calling run_tests, use get_test_results to poll for results.",
+        {
+            category: z.string().describe("Test category to run: unit, integration, LegacyInteg, livepreview, or mainview."),
+            spec: z.string().optional().describe("Optional suite or spec name to run within the category. " +
+                "Use the full name including category prefix, e.g. 'unit: CSS Parsing' for a suite. " +
+                "Prefer running full suites over individual specs, as specs may depend on suite execution order. " +
+                "Omit to run all tests in the category."),
+            instance: z.string().optional().describe("Target a specific test runner instance by name. Required when multiple instances are connected.")
+        },
+        async ({ category, spec, instance }) => {
+            try {
+                const result = await wsControlServer.requestRunTests(category, spec, instance);
+                return {
+                    content: [{
+                        type: "text",
+                        text: JSON.stringify({
+                            success: true,
+                            message: result.message || "Test runner is reloading with category=" + category
+                        })
+                    }]
+                };
+            } catch (err) {
+                return {
+                    content: [{
+                        type: "text",
+                        text: JSON.stringify({ error: err.message })
+                    }]
+                };
+            }
+        }
+    );
+
+    server.tool(
+        "get_test_results",
+        "Get structured test results from the Phoenix test runner. Returns running status, pass/fail counts, " +
+        "failure details, and the currently executing spec. The test runner must already be open with MCP enabled.",
+        {
+            instance: z.string().optional().describe("Target a specific test runner instance by name. Required when multiple instances are connected.")
+        },
+        async ({ instance }) => {
+            try {
+                const result = await wsControlServer.requestTestResults(instance);
+                // Remove internal WS fields
+                delete result.type;
+                delete result.id;
+                return {
+                    content: [{
+                        type: "text",
+                        text: JSON.stringify(result, null, 2)
+                    }]
+                };
+            } catch (err) {
+                return {
+                    content: [{
+                        type: "text",
+                        text: JSON.stringify({ error: err.message })
+                    }]
+                };
+            }
+        }
+    );
+
     server.tool(
         "get_phoenix_status",
         "Check the status of the Phoenix process and WebSocket connection.",

phoenix-builder-mcp/ws-control-server.js

Lines changed: 98 additions & 0 deletions
@@ -109,6 +109,28 @@ export function createWSControlServer(port) {
                     break;
                 }
 
+                case "run_tests_response": {
+                    const pendingRt = pendingRequests.get(msg.id);
+                    if (pendingRt) {
+                        pendingRequests.delete(msg.id);
+                        if (msg.success) {
+                            pendingRt.resolve({ success: true, message: msg.message });
+                        } else {
+                            pendingRt.reject(new Error(msg.message || "run_tests failed"));
+                        }
+                    }
+                    break;
+                }
+
+                case "get_test_results_response": {
+                    const pendingTr = pendingRequests.get(msg.id);
+                    if (pendingTr) {
+                        pendingRequests.delete(msg.id);
+                        pendingTr.resolve(msg);
+                    }
+                    break;
+                }
+
                 case "reload_response": {
                     const pending3 = pendingRequests.get(msg.id);
                     if (pending3) {
@@ -390,6 +412,80 @@ export function createWSControlServer(port) {
         });
     }
 
+    function requestRunTests(category, spec, instanceName) {
+        return new Promise((resolve, reject) => {
+            const resolved = _resolveClient(instanceName);
+            if (resolved.error) {
+                reject(new Error(resolved.error));
+                return;
+            }
+
+            const { client } = resolved;
+            if (client.ws.readyState !== 1) {
+                reject(new Error("Phoenix client \"" + resolved.name + "\" is not connected"));
+                return;
+            }
+
+            const id = ++requestIdCounter;
+            const timeout = setTimeout(() => {
+                pendingRequests.delete(id);
+                reject(new Error("run_tests request timed out (30s)"));
+            }, 30000);
+
+            pendingRequests.set(id, {
+                resolve: (data) => {
+                    clearTimeout(timeout);
+                    resolve(data);
+                },
+                reject: (err) => {
+                    clearTimeout(timeout);
+                    reject(err);
+                }
+            });
+
+            const msg = { type: "run_tests_request", id, category };
+            if (spec) {
+                msg.spec = spec;
+            }
+            client.ws.send(JSON.stringify(msg));
+        });
+    }
+
+    function requestTestResults(instanceName) {
+        return new Promise((resolve, reject) => {
+            const resolved = _resolveClient(instanceName);
+            if (resolved.error) {
+                reject(new Error(resolved.error));
+                return;
+            }
+
+            const { client } = resolved;
+            if (client.ws.readyState !== 1) {
+                reject(new Error("Phoenix client \"" + resolved.name + "\" is not connected"));
+                return;
+            }
+
+            const id = ++requestIdCounter;
+            const timeout = setTimeout(() => {
+                pendingRequests.delete(id);
+                reject(new Error("get_test_results request timed out (30s)"));
+            }, 30000);
+
+            pendingRequests.set(id, {
+                resolve: (data) => {
+                    clearTimeout(timeout);
+                    resolve(data);
+                },
+                reject: (err) => {
+                    clearTimeout(timeout);
+                    reject(err);
+                }
+            });
+
+            client.ws.send(JSON.stringify({ type: "get_test_results_request", id }));
+        });
+    }
+
     function getBrowserLogs(sinceLast, instanceName) {
         const resolved = _resolveClient(instanceName);
         if (resolved.error) {
@@ -442,6 +538,8 @@ export function createWSControlServer(port) {
         requestLogs,
         requestExecJs,
         requestExecJsLivePreview,
+        requestRunTests,
+        requestTestResults,
         getBrowserLogs,
         clearBrowserLogs,
         isClientConnected,
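
`requestRunTests` and `requestTestResults` both follow the same pattern: allocate an id, register a pending entry with a timeout, send the message, and settle the entry when the matching `*_response` arrives. A minimal, self-contained sketch of that pattern is shown below. The names `createRequestTracker`, `settle`, and `pendingCount` are hypothetical and do not appear in the commit, and the failure rule (reject only when `success === false`) is a simplification of the two response handlers above.

```javascript
// Sketch only: correlates outgoing requests with incoming responses by id.
// `send` is any function that accepts a serialized message string.
function createRequestTracker(send, timeoutMs = 30000) {
    const pending = new Map();
    let counter = 0;

    function request(type, payload = {}) {
        return new Promise((resolve, reject) => {
            const id = ++counter;
            // Drop the entry and fail the request if no response arrives in time.
            const timer = setTimeout(() => {
                pending.delete(id);
                reject(new Error(type + " timed out (" + timeoutMs / 1000 + "s)"));
            }, timeoutMs);
            pending.set(id, {
                resolve: (data) => { clearTimeout(timer); resolve(data); },
                reject: (err) => { clearTimeout(timer); reject(err); }
            });
            send(JSON.stringify({ ...payload, type, id }));
        });
    }

    // Call this for each incoming response; returns true if it matched a request.
    function settle(msg) {
        const entry = pending.get(msg.id);
        if (!entry) {
            return false;
        }
        pending.delete(msg.id);
        if (msg.success === false) {
            entry.reject(new Error(msg.message || "request failed"));
        } else {
            entry.resolve(msg);
        }
        return true;
    }

    return { request, settle, pendingCount: () => pending.size };
}
```

Responses with an unknown id are ignored, which mirrors the `if (pendingRt)` guard in the handlers: a response that arrives after its timeout has already fired finds no pending entry.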

src/phoenix-builder/phoenix-builder-boot.js

Lines changed: 2 additions & 1 deletion
@@ -90,7 +90,8 @@
         let name = sessionStorage.getItem(INSTANCE_NAME_KEY);
         if (!name) {
             const hex = Math.floor(Math.random() * 0x10000).toString(16).padStart(4, "0");
-            name = "phoenix-" + _getPlatformTag() + "-" + hex;
+            const prefix = window._phoenixBuilderNamePrefix || "phoenix";
+            name = prefix + "-" + _getPlatformTag() + "-" + hex;
             sessionStorage.setItem(INSTANCE_NAME_KEY, name);
         }
         return name;
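
To make the naming change concrete, here is a small standalone sketch of the same string construction, with the random source injected so the output is predictable. `buildInstanceName` and `isTestRunnerInstance` are hypothetical helper names, and the platform tags `"mac"` and `"win"` are placeholders, since the values returned by `_getPlatformTag()` are not shown in this diff.

```javascript
// Sketch only: mirrors the prefix + platform tag + 4-digit hex construction.
function buildInstanceName(prefix, platformTag, randomFn = Math.random) {
    const hex = Math.floor(randomFn() * 0x10000).toString(16).padStart(4, "0");
    return (prefix || "phoenix") + "-" + platformTag + "-" + hex;
}

// Hypothetical check a caller could use to pick the test runner out of a status list.
function isTestRunnerInstance(name) {
    return name.startsWith("phoenix-test-runner-");
}

console.log(buildInstanceName("phoenix-test-runner", "mac", () => 0.5)); // phoenix-test-runner-mac-8000
console.log(buildInstanceName(undefined, "win", () => 0));               // phoenix-win-0000
```

Because the prefix falls back to `"phoenix"` when unset, regular Phoenix instances keep their existing names and only pages that set `window._phoenixBuilderNamePrefix` are affected.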

test/SpecRunner.html

Lines changed: 4 additions & 0 deletions
@@ -394,6 +394,10 @@
     }());
     </script>
 
+    <script>window._phoenixBuilderNamePrefix = "phoenix-test-runner";</script>
+    <script src="../src/phoenix-builder/phoenix-builder-boot.js"></script>
+    <script src="phoenix-test-runner-mcp.js"></script>
+
     <script src="../src/phoenix/shell.js" type="module"></script>
     <script src="virtual-server-loader.js" type="module"></script>
     <script src="../src/node-loader.js" defer></script>

test/SpecRunner.js

Lines changed: 1 addition & 0 deletions
@@ -484,6 +484,7 @@ define(function (require, exports, module) {
         // Create the reporter, which is really a model class that just gathers
         // spec and performance data.
         reporter = new UnitTestReporter(jasmineEnv, params.get("spec"), selectedCategories);
+        window._unitTestReporter = reporter;
         SpecRunnerUtils.setUnitTestReporter(reporter);
 
         // Optionally emit JUnit XML file for automated runs
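
Since SpecRunner.js reads the spec filter with `params.get("spec")`, the reload triggered by `run_tests` presumably ends up encoding the request in the test runner's URL. The test-runner-side script is not shown in this view, so the following is only a guess at what that step might look like: `buildTestRunnerUrl` is a hypothetical helper, `category` is an assumed query parameter name, and the category allowlist is an addition (the tool schema in this commit accepts any string).

```javascript
// Sketch only: builds a test runner URL for a category and optional spec.
// "spec" matches params.get("spec") in SpecRunner.js; "category" is assumed.
const SUPPORTED_CATEGORIES = ["unit", "integration", "LegacyInteg", "livepreview", "mainview"];

function buildTestRunnerUrl(baseUrl, category, spec) {
    if (!SUPPORTED_CATEGORIES.includes(category)) {
        throw new Error("Unsupported test category: " + category);
    }
    const url = new URL(baseUrl);
    url.searchParams.set("category", category);
    if (spec) {
        // URLSearchParams handles escaping of spaces and colons in suite names.
        url.searchParams.set("spec", spec);
    } else {
        // No spec means "run the whole category", so clear any stale filter.
        url.searchParams.delete("spec");
    }
    return url.toString();
}
```

Rejecting unsupported categories before reloading would turn a request for `all` or `performance` into an immediate error rather than a run the documentation warns against.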
