@vibegui vibegui commented Jan 6, 2026

Summary by cubic

Adds auto-start for STDIO connections on startup and cleans up orphaned processes. Adds a 5-minute cooldown after spawn failures, improves event bus delivery reliability, and enables gateway subscriptions.

  • New Features

    • Auto-start via AUTO_START_CONNECTIONS (comma-separated titles). Waits for resetStdioConnectionPool, adds a short delay, then spawns using dangerouslyCreateSuperUserMCPProxy.
    • Orphan cleanup on startup: kills mesh-bridge, pilot, and any process on port 9999 from previous runs.
    • Event bus subscriptions via gateway: support subscriberId in EventSubscribe and pass MESH_CONNECTION_ID to STDIO servers.
  • Bug Fixes

    • Suppress “Method not found” (-32601) errors from listPrompts to reduce auto-start log noise.
    • Event bus worker: prevent missed notifications on concurrent processNow calls; add helpful logs.
    • Add 5-minute cooldown after STDIO spawn failures to avoid retry loops; return a clear “Spawn failed” error.
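The spawn-failure cooldown in the last bullet could look roughly like the sketch below. It is not the PR's code: `SpawnCooldown`, `spawnWithCooldown`, and the injectable clock are hypothetical names chosen for illustration. Only the 5-minute window and the "Spawn failed" wording come from the summary.

```typescript
// Illustrative sketch only; class and function names are hypothetical.
const SPAWN_COOLDOWN_MS = 5 * 60 * 1000; // 5 minutes, per the summary

class SpawnCooldown {
  private readonly failedAt = new Map<string, number>();
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(cooldownMs: number = SPAWN_COOLDOWN_MS, now: () => number = Date.now) {
    this.cooldownMs = cooldownMs;
    this.now = now; // injectable clock so the logic can be tested without waiting
  }

  /** Remember when a connection last failed to spawn. */
  recordFailure(connectionId: string): void {
    this.failedAt.set(connectionId, this.now());
  }

  /** Forget a recorded failure, e.g. after a successful spawn. */
  clear(connectionId: string): void {
    this.failedAt.delete(connectionId);
  }

  /** Milliseconds left before another attempt is allowed (0 if none). */
  remainingMs(connectionId: string): number {
    const at = this.failedAt.get(connectionId);
    if (at === undefined) return 0;
    return Math.max(0, at + this.cooldownMs - this.now());
  }
}

async function spawnWithCooldown<T>(
  cooldown: SpawnCooldown,
  connectionId: string,
  spawn: () => Promise<T>,
): Promise<T> {
  const remaining = cooldown.remainingMs(connectionId);
  if (remaining > 0) {
    // Fail fast instead of retrying in a loop while the cooldown is active.
    const seconds = Math.ceil(remaining / 1000);
    throw new Error(`Spawn failed: cooling down, retry in ${seconds}s`);
  }
  try {
    const result = await spawn();
    cooldown.clear(connectionId);
    return result;
  } catch (err) {
    cooldown.recordFailure(connectionId);
    const reason = err instanceof Error ? err.message : String(err);
    throw new Error(`Spawn failed: ${reason}`);
  }
}
```

Keying the cooldown by connection ID means one broken STDIO server does not block the others from starting.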

Written for commit d6da145. Summary will update on new commits.
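
For the event bus fix mentioned in the summary, one common way to avoid losing a notification when `processNow` is called while a run is already in progress is to record the request and loop once more when the current run finishes. The sketch below illustrates that pattern under assumed names (`CoalescingWorker`, `processBatch`); the PR's actual worker may be structured differently.

```typescript
// Illustrative sketch only; not the event bus worker from this PR.
class CoalescingWorker {
  private running = false;
  private rerunRequested = false;
  private readonly processBatch: () => Promise<void>;

  constructor(processBatch: () => Promise<void>) {
    this.processBatch = processBatch;
  }

  /**
   * Process pending events. A call that arrives while a run is in progress
   * is not dropped: it sets a flag so the active run loops one more time.
   */
  async processNow(): Promise<void> {
    if (this.running) {
      this.rerunRequested = true;
      return;
    }
    this.running = true;
    try {
      do {
        this.rerunRequested = false;
        await this.processBatch();
      } while (this.rerunRequested);
    } finally {
      this.running = false;
    }
  }
}
```

Any number of overlapping calls collapse into a single extra pass, which is enough as long as each pass picks up every event that is pending when it starts.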

github-actions bot commented Jan 6, 2026

🧪 Benchmark

Should we run the MCP Gateway benchmark for this PR?

React with 👍 to run the benchmark.

Reaction | Action
👍 | Run quick benchmark (10 & 128 tools)

Benchmark will run on the next push after you react.

github-actions bot commented Jan 6, 2026

Release Options

Should a new version be published when this PR is merged?

React with an emoji to vote on the release type:

Reaction | Type | Next Version
👍 | Prerelease | 1.1.2-alpha.1
🎉 | Patch | 1.1.2
❤️ | Minor | 1.2.0
🚀 | Major | 2.0.0

Current version: 1.1.1

Deployment

  • Deploy to production (triggers ArgoCD sync after Docker image is published)

@cubic-dev-ai cubic-dev-ai bot left a comment

2 issues found across 2 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="apps/mesh/src/api/app.ts">

<violation number="1" location="apps/mesh/src/api/app.ts:191">
P2: The `.catch()` on `resetStdioConnectionPool()` converts rejected promises to resolved ones, so `await poolResetPromise` will succeed even if pool reset failed. This defeats the stated purpose of waiting for completion before auto-start. Consider re-throwing after logging, or handling the error case explicitly in the auto-start IIFE.</violation>
</file>

<file name="apps/mesh/src/stdio/stable-transport.ts">

<violation number="1" location="apps/mesh/src/stdio/stable-transport.ts:374">
P1: Killing any process on port 9999 is dangerous. Unlike the `pkill` patterns above that match specific command lines, this kills ANY process on port 9999 without verification. Consider checking the process command line before killing, or using a more specific approach like `lsof -t -i:9999 | xargs -I{} sh -c &#39;ps -p {} -o args= | grep -q mesh-bridge &amp;&amp; kill -9 {}&#39;`.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.


+  // Also kill anything listening on port 9999 (Bridge WebSocket)
+  try {
+    const { stdout } = await execAsync(`lsof -t -i:9999 2>/dev/null || true`);
@cubic-dev-ai cubic-dev-ai bot Jan 6, 2026

P1: Killing any process on port 9999 is dangerous. Unlike the pkill patterns above that match specific command lines, this kills ANY process on port 9999 without verification. Consider checking the process command line before killing, or using a more specific approach like lsof -t -i:9999 | xargs -I{} sh -c 'ps -p {} -o args= | grep -q mesh-bridge && kill -9 {}'.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/mesh/src/stdio/stable-transport.ts, line 374:

<comment>Killing any process on port 9999 is dangerous. Unlike the `pkill` patterns above that match specific command lines, this kills ANY process on port 9999 without verification. Consider checking the process command line before killing, or using a more specific approach like `lsof -t -i:9999 | xargs -I{} sh -c &#39;ps -p {} -o args= | grep -q mesh-bridge &amp;&amp; kill -9 {}&#39;`.</comment>

<file context>
@@ -343,6 +343,51 @@ async function forceCloseAllStdioConnections(): Promise<void> {
+
+  // Also kill anything listening on port 9999 (Bridge WebSocket)
+  try {
+    const { stdout } = await execAsync(`lsof -t -i:9999 2>/dev/null || true`);
+    const pids = stdout.trim().split("\n").filter(Boolean);
+    for (const pid of pids) {
</file context>

✅ Addressed in c662c2a
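
The reviewer's suggestion above amounts to checking each process's command line before killing it. The thread does not show how c662c2a implemented this, so the following is only a sketch of that idea. `ProcessTools`, `parsePidList`, and `killVerifiedListeners` are hypothetical names, and the system calls are injected so the selection logic can be exercised without touching real processes.

```typescript
// Illustrative sketch only; all names here are hypothetical.
interface ProcessTools {
  /** PIDs bound to the port, e.g. parsed from `lsof -t -i:<port>` output. */
  listPidsOnPort(port: number): Promise<number[]>;
  /** Full command line of a PID, e.g. from `ps -p <pid> -o args=`. */
  commandLineOf(pid: number): Promise<string>;
  /** Terminate the process, e.g. via process.kill(pid, "SIGKILL"). */
  kill(pid: number): void;
}

/** Parse newline-separated PID output, ignoring blanks and non-numeric lines. */
function parsePidList(stdout: string): number[] {
  return stdout
    .split("\n")
    .map((line) => line.trim())
    .filter(Boolean)
    .map(Number)
    .filter((pid) => Number.isInteger(pid) && pid > 0);
}

/**
 * Kill only the listeners whose command line contains one of the markers
 * (for example "mesh-bridge"); leave every other process on the port alone.
 */
async function killVerifiedListeners(
  port: number,
  markers: string[],
  tools: ProcessTools,
): Promise<{ killed: number[]; skipped: number[] }> {
  const killed: number[] = [];
  const skipped: number[] = [];
  for (const pid of await tools.listPidsOnPort(port)) {
    const args = await tools.commandLineOf(pid);
    if (markers.some((marker) => args.includes(marker))) {
      tools.kill(pid);
      killed.push(pid);
    } else {
      skipped.push(pid); // unrelated process that happens to use the port
    }
  }
  return { killed, skipped };
}
```

Returning the skipped PIDs makes it possible to log that port 9999 is held by something unrelated, rather than killing it silently or failing later with an unexplained bind error.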

   // Old processes have stale credentials, need fresh spawn with new tokens
-  resetStdioConnectionPool().catch((err) => {
+  // IMPORTANT: Track this promise so autoStart waits for it to complete
+  const poolResetPromise = resetStdioConnectionPool().catch((err) => {
@cubic-dev-ai cubic-dev-ai bot Jan 6, 2026

P2: The .catch() on resetStdioConnectionPool() converts rejected promises to resolved ones, so await poolResetPromise will succeed even if pool reset failed. This defeats the stated purpose of waiting for completion before auto-start. Consider re-throwing after logging, or handling the error case explicitly in the auto-start IIFE.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/mesh/src/api/app.ts, line 191:

<comment>The `.catch()` on `resetStdioConnectionPool()` converts rejected promises to resolved ones, so `await poolResetPromise` will succeed even if pool reset failed. This defeats the stated purpose of waiting for completion before auto-start. Consider re-throwing after logging, or handling the error case explicitly in the auto-start IIFE.</comment>

<file context>
@@ -129,7 +187,8 @@ export function createApp(options: CreateAppOptions = {}) {
   // Old processes have stale credentials, need fresh spawn with new tokens
-  resetStdioConnectionPool().catch((err) => {
+  // IMPORTANT: Track this promise so autoStart waits for it to complete
+  const poolResetPromise = resetStdioConnectionPool().catch((err) => {
     console.error("[StableStdio] Error resetting pool:", err);
   });
</file context>

✅ Addressed in c662c2a
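
To illustrate the point of this comment: a promise with a logging-only `.catch()` always resolves, so awaiting it cannot tell success from failure. One way to keep the log and still expose the outcome is to resolve to an explicit result, as sketched below. The thread does not show what c662c2a actually did; `trackPoolReset`, `autoStartAfterReset`, and the choice to skip auto-start after a failed reset are assumptions made for illustration.

```typescript
// Illustrative sketch only; function names and the skip policy are assumptions.
type PoolResetOutcome = { ok: true } | { ok: false; error: unknown };

/** Log a failed reset but keep the outcome visible to whoever awaits it. */
function trackPoolReset(
  resetPool: () => Promise<void>,
  log: (message: string, error: unknown) => void = console.error,
): Promise<PoolResetOutcome> {
  return resetPool().then(
    (): PoolResetOutcome => ({ ok: true }),
    (error: unknown): PoolResetOutcome => {
      log("[StableStdio] Error resetting pool:", error);
      return { ok: false, error };
    },
  );
}

/** Start connections only if the pool reset actually succeeded. */
async function autoStartAfterReset(
  poolReset: Promise<PoolResetOutcome>,
  startConnections: () => Promise<void>,
): Promise<"started" | "skipped"> {
  const outcome = await poolReset;
  if (!outcome.ok) return "skipped";
  await startConnections();
  return "started";
}
```

Re-throwing inside the `.catch()` would also work, but then every consumer of the promise has to handle the rejection, including the fire-and-forget path at startup.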

@vibegui vibegui force-pushed the feat/stdio-auto-start-and-cleanup branch from de8fa15 to 51e399a Compare January 6, 2026 19:20
@vibegui vibegui force-pushed the feat/stdio-connection-support branch from 196adcb to ff65603 Compare January 8, 2026 15:21
@vibegui vibegui force-pushed the feat/stdio-auto-start-and-cleanup branch 4 times, most recently from b58f7b7 to ac0720f Compare January 8, 2026 15:35
@vibegui vibegui force-pushed the feat/stdio-connection-support branch 5 times, most recently from 1d3989b to af27cb0 Compare January 10, 2026 09:30
@vibegui vibegui force-pushed the feat/stdio-auto-start-and-cleanup branch from ac0720f to 9829cdf Compare January 10, 2026 09:50
…s logging

- Added functionality to automatically start STDIO connections based on the AUTO_START_CONNECTIONS environment variable.
- Introduced a new helper function, autoStartConnectionsByTitle, to manage the auto-start process.
- Enhanced EventBus with additional logging for event processing and subscription matching.
- Implemented a mechanism to kill orphaned STDIO processes to prevent stale connections.
- Updated error handling in the proxy to avoid logging unnecessary "Method not found" errors.

This commit improves the management of STDIO connections and provides better visibility into event bus operations.
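
As a rough illustration of the title-based auto-start described in this commit message, the sketch below parses a comma-separated value such as the one in AUTO_START_CONNECTIONS and matches it against known connections. It is not the PR's `autoStartConnectionsByTitle`: the `ConnectionSummary` shape, both function names, and the case-sensitive exact matching are assumptions.

```typescript
// Illustrative sketch only; types, names, and matching rules are assumptions.
interface ConnectionSummary {
  id: string;
  title: string;
}

/** Split a comma-separated list of titles, trimming blanks and duplicates. */
function parseAutoStartTitles(raw: string | undefined): string[] {
  if (!raw) return [];
  const titles: string[] = [];
  for (const part of raw.split(",")) {
    const title = part.trim();
    if (title && titles.indexOf(title) === -1) titles.push(title);
  }
  return titles;
}

/** Pick the connections to start and report titles that matched nothing. */
function matchConnectionsByTitle(
  connections: ConnectionSummary[],
  titles: string[],
): { matched: ConnectionSummary[]; missing: string[] } {
  const matched = connections.filter((c) => titles.indexOf(c.title) !== -1);
  const missing = titles.filter((t) => !connections.some((c) => c.title === t));
  return { matched, missing };
}
```

In a Node process the input would typically be `process.env.AUTO_START_CONNECTIONS`. Reporting the `missing` titles helps catch typos in the variable, which would otherwise fail silently.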
@vibegui vibegui force-pushed the feat/stdio-auto-start-and-cleanup branch from 9829cdf to bf98523 Compare January 10, 2026 10:02
…out prompts

Error code -32601 is expected for MCPs that don't implement prompts/list.
Don't log these as errors since they're not actionable.
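
JSON-RPC 2.0 reserves code -32601 for "Method not found", so the check described in this commit message can be a small predicate on the error's `code`. The sketch below shows one way to apply it. `isMethodNotFound` and `listPromptsQuietly` are hypothetical names, and returning an empty list for servers without prompts is an assumption, not something the commit message states.

```typescript
// Illustrative sketch only; function names and the empty-list fallback are assumptions.
const METHOD_NOT_FOUND = -32601; // JSON-RPC 2.0 "Method not found"

/** True if the error carries the JSON-RPC "Method not found" code. */
function isMethodNotFound(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { code?: unknown }).code === METHOD_NOT_FOUND
  );
}

/**
 * Call listPrompts, treating "Method not found" as "this server has no
 * prompts" and logging only the errors that are actually actionable.
 */
async function listPromptsQuietly<T>(
  listPrompts: () => Promise<T[]>,
  logError: (message: string, err: unknown) => void = console.error,
): Promise<T[]> {
  try {
    return await listPrompts();
  } catch (err) {
    if (isMethodNotFound(err)) return [];
    logError("listPrompts failed:", err);
    throw err;
  }
}
```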