
Run Claude and Codex ACP adapters in VM containers #996

Open
shivammittal274 wants to merge 1 commit into dev from claude-vm



@shivammittal274
Contributor

Summary

  • replace Claude and Codex host-process ACP runtimes with VM-backed container runtimes
  • add shared best-effort container runtime startup for Claude, Codex, and Hermes while keeping app startup non-blocking
  • expose managed runtime terminal targets and adapter readiness handling in the agent UI
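
A minimal sketch of the non-blocking, best-effort startup pattern, assuming each runtime exposes async `install()` and `start()` steps (mirroring the install/start actions in the sequence diagram further down). `RuntimeLike` and `startBestEffort` are hypothetical names for illustration, not the PR's `startContainerRuntimeBestEffort` code.

```typescript
// Hypothetical runtime shape; the PR's real runtime classes may differ.
interface RuntimeLike {
  name: string
  install(): Promise<void>
  start(): Promise<void>
}

// Runs install + start and converts any failure into a logged `false`
// instead of a rejection, so callers can fire-and-forget it during boot.
async function startBestEffort(
  runtime: RuntimeLike,
  log: (msg: string) => void = console.warn,
): Promise<boolean> {
  try {
    await runtime.install()
    await runtime.start()
    return true
  } catch (err) {
    log(`[${runtime.name}] best-effort start failed: ${String(err)}`)
    return false
  }
}
```

The caller invokes `startBestEffort(runtime)` without awaiting it, so a slow VM boot or a failed `npm install` produces a log line while the app keeps starting.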

Test Plan

  • bun test apps/agent/entrypoints/app/agents/NewAgentDialog.test.ts apps/server/tests/api/routes/terminal-protocol.test.ts apps/server/tests/api/routes/terminal.test.ts apps/server/tests/lib/agents/acpx-runtime-context.test.ts apps/server/tests/lib/agents/acpx-runtime.test.ts apps/server/tests/lib/agents/runtime/claude-container-runtime.test.ts apps/server/tests/lib/agents/runtime/codex-container-runtime.test.ts apps/server/tests/lib/agents/runtime/container-agent-runtime.test.ts apps/server/tests/lib/agents/runtime/hermes-container-runtime.test.ts apps/server/tests/lib/container/managed/managed-container.test.ts apps/server/tests/lib/vm/vm-runtime.test.ts apps/server/tests/main.test.ts
  • bun run --cwd apps/agent typecheck
  • bunx biome check \
  • git diff --check

Follow-up

  • Existing agents with stale ACP backend session IDs after VM/container reset can still fail to resume; leaving that for the next debug pass as discussed.

@github-actions
Contributor

❌ Tests failed — 4/1265 failed

| Suite | Passed | Failed | Skipped |
| --- | --- | --- | --- |
| agent | 82/82 | 0 | 0 |
| build | 9/9 | 0 | 0 |
| eval | 93/93 | 0 | 0 |
| server-agent | 266/266 | 0 | 0 |
| server-api | 206/209 | 3 | 0 |
| server-browser | 4/4 | 0 | 0 |
| server-integration | 9/10 | 0 | 1 |
| server-lib | 266/266 | 0 | 0 |
| server-root | 61/64 | 0 | 3 |
| server-skills | 31/31 | 0 | 0 |
| server-tools | 230/231 | 1 | 0 |

Failed tests
  • server-api: AgentHarnessService > writes a per-agent Hermes config.yaml + .env when adapter=hermes and provider config complete
  • server-api: AgentHarnessService > writes provider:custom + base_url for openai-compatible providers
  • server-api: AgentHarnessService > falls back to OpenAI default base_url for the openai provider type
  • server-tools: observation tools > get_page_content returns markdown text


@greptile-apps
Contributor

greptile-apps Bot commented May 11, 2026

Greptile Summary

This PR replaces the host-process ACP runtimes for Claude and Codex with VM-backed container runtimes (matching the existing Hermes pattern), adds a shared startContainerRuntimeBestEffort helper, and exposes multi-target terminal support in the agent UI so users can open a shell into any managed container.

  • New container runtimes: ClaudeRuntime and CodexRuntime extend ContainerAgentRuntime, run inside a Lima VM using node:20-bookworm-slim, and wait up to 120 s for npm install to complete via a new waitForReadinessProbe retry loop in ManagedContainer.
  • Terminal targets: the /terminal/targets endpoint and WebSocket now accept a target + agentId query pair, resolving to the correct container and per-agent working directory for openclaw, claude, codex, or hermes.
  • VM deduplication: VmRuntime.ensureReady is now serialised via a module-level promise map so concurrent Claude/Codex/Hermes startup requests share a single VM boot rather than racing.
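
The deduplication idea can be sketched as a module-level `Map` from a VM key to its in-flight boot promise. The key (a VM name), `ensureReadyOnce`, and the injected `boot` callback are assumptions for illustration, not `VmRuntime`'s actual API.

```typescript
// Module-level map of in-flight boots, keyed by VM name (an assumed key).
const inflight = new Map<string, Promise<void>>()

// Concurrent callers for the same VM share one boot promise. The entry is
// removed when the boot settles, so a failed boot can be retried later.
function ensureReadyOnce(vmName: string, boot: () => Promise<void>): Promise<void> {
  const existing = inflight.get(vmName)
  if (existing) return existing
  const pending = boot().finally(() => {
    inflight.delete(vmName)
  })
  inflight.set(vmName, pending)
  return pending
}
```

Clearing the entry in `finally` means a failed boot is not cached, so the next caller retries instead of inheriting a stale rejection.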

Confidence Score: 4/5

Safe to merge with awareness of the reliability concerns around version-pinning and the 5-second polling cadence.

The core runtime migration is well-structured and mirrors the proven Hermes pattern. The VM deduplication fix is correct, the readiness-probe retry loop is necessary given the slow npm-install entrypoint, and error handling in the WebSocket open handler was correctly added. The main concerns are operational: both new container runtimes install the latest packages on every restart, risking version drift and slow cold starts; the /targets HTTP handler propagates subprocess errors as 500s rather than gracefully returning an empty list; the adapter health check now polls every 5 s unconditionally; and TerminalTargetId is duplicated across three files instead of living in the shared package.

Files that need attention: codex-container-runtime.ts and claude-container-runtime.ts (start commands), terminal.ts (listRunningContainers error path), and useAgents.ts (polling interval).

Important Files Changed

| Filename | Overview |
| --- | --- |
| packages/browseros-agent/apps/server/src/lib/agents/runtime/claude-container-runtime.ts | New file: migrates Claude from host-process to VM container runtime; uses npm install -g @latest on every container start which risks version drift between restarts |
| packages/browseros-agent/apps/server/src/lib/agents/runtime/codex-container-runtime.ts | New file: migrates Codex to VM container runtime; runs apt-get update + @latest npm installs on every restart, increasing startup latency and risking version drift |
| packages/browseros-agent/apps/server/src/api/routes/terminal.ts | Adds /targets endpoint and parameterised WebSocket target; WS open handler correctly catches errors; listRunningContainers errors in the HTTP handler are unhandled and produce a 500 |
| packages/browseros-agent/apps/server/src/api/services/terminal/terminal-session.ts | Adds multi-target terminal support (openclaw/claude/codex/hermes) with agentId-scoped working dirs; mkdirSync used synchronously in WS open handler |
| packages/browseros-agent/apps/server/src/lib/vm/vm-runtime.ts | Adds module-level deduplication for concurrent ensureReady calls via a shared promise map; correct cleanup in finally block |
| packages/browseros-agent/apps/server/src/lib/container/managed/managed-container.ts | Replaces single-shot readinessProbe() call with a retry loop (waitForReadinessProbe); correctly handles long npm-install startup windows for Claude/Codex |
| packages/browseros-agent/apps/agent/entrypoints/app/agents/AgentTerminal.tsx | Adds target selector UI and per-agent WS parameters; TerminalTargetId and TerminalTargetOption types are redefined locally rather than shared with the server |
| packages/browseros-agent/apps/agent/entrypoints/app/agents/AgentsPage.tsx | Adds per-agent terminal launch and inferTerminalTarget helper; target inference falls back to fragile display-label string matching when adapter lookup misses |
| packages/browseros-agent/apps/server/src/main.ts | Moves Claude/Codex runtime startup from eager to best-effort (non-blocking); adds stop calls for Claude/Codex on shutdown |
| packages/browseros-agent/apps/agent/entrypoints/app/agents/useAgents.ts | Adds unconditional 5-second polling for adapter health; runs regardless of whether container runtimes are active |
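
A retry loop in the spirit of `waitForReadinessProbe` might look like the following sketch. The `Probe` type, the 1-second default interval, and treating a thrown probe as "not ready yet" are assumptions; only the 120 s budget comes from the summary above.

```typescript
// Hypothetical probe: resolves true once e.g. `command -v claude` succeeds.
type Probe = () => Promise<boolean>

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

// Retry the probe until it passes or the deadline elapses.
// A probe that throws is treated as "not ready yet" rather than fatal.
async function waitForReadiness(
  probe: Probe,
  { timeoutMs = 120_000, intervalMs = 1_000 } = {},
): Promise<void> {
  const deadline = Date.now() + timeoutMs
  for (;;) {
    try {
      if (await probe()) return
    } catch {
      // Container is likely still running npm install; retry below.
    }
    if (Date.now() >= deadline) {
      throw new Error(`readiness probe did not pass within ${timeoutMs} ms`)
    }
    await sleep(intervalMs)
  }
}
```

A single-shot probe would fail on the first attempt while `npm install` is still running inside the container; the loop instead spreads the same check over the whole startup window.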

Sequence Diagram

```mermaid
sequenceDiagram
    participant App as Application (main.ts)
    participant SCBE as startContainerRuntimeBestEffort
    participant CR as ClaudeRuntime / CodexRuntime / HermesRuntime
    participant VM as VmRuntime (ensureReady)
    participant Ctr as Container (nerdctl)
    participant UI as Agent UI

    App->>SCBE: configureClaudeRuntime()
    App->>SCBE: configureCodexRuntime()
    App->>SCBE: configureHermesRuntime()
    SCBE->>CR: executeAction install
    SCBE->>CR: executeAction start
    CR->>VM: ensureReady()
    Note over VM: module-level promise map deduplicates concurrent calls
    VM-->>CR: VM ready
    CR->>Ctr: nerdctl pull image
    CR->>Ctr: nerdctl run npm install then sleep
    CR->>Ctr: readinessProbe loop up to 120 s
    Ctr-->>CR: command -v claude/codex OK
    CR-->>App: runtime ready

    UI->>App: "GET /terminal/targets?agentId=X"
    App->>Ctr: ContainerCli.ps() running containers
    App-->>UI: targets array
    UI->>App: "WS /terminal/ws?target=claude&agentId=X"
    App->>Ctr: nerdctl exec -it -e HOME -w agentHome /bin/sh
    Ctr-->>UI: PTY stream
```
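
The last step of the diagram, opening a shell scoped to one agent, amounts to building a `nerdctl exec` argument list. The container naming scheme (`browseros-<target>`), the `/home/agents/<agentId>` home layout, the agentId whitelist, and `buildExecPlan` itself are hypothetical; the PR's real names are not shown here, and in practice the command runs inside the Lima VM.

```typescript
type TerminalTargetId = 'openclaw' | 'claude' | 'codex' | 'hermes'

interface ExecPlan {
  container: string
  argv: string[]
}

// Hypothetical naming scheme; the real container names and paths may differ.
const CONTAINER_PREFIX = 'browseros'
const AGENT_ROOT = '/home/agents'

// Build the `nerdctl exec` argv for an interactive shell in the target
// container, scoping HOME and the working directory to the agent if given.
function buildExecPlan(target: TerminalTargetId, agentId?: string): ExecPlan {
  const container = `${CONTAINER_PREFIX}-${target}`
  const argv = ['nerdctl', 'exec', '-it']
  if (agentId !== undefined) {
    // Reject ids that could escape the agent root (e.g. "../etc").
    if (!/^[A-Za-z0-9_-]+$/.test(agentId)) {
      throw new Error(`invalid agentId: ${agentId}`)
    }
    const home = `${AGENT_ROOT}/${agentId}`
    argv.push('-e', `HOME=${home}`, '-w', home)
  }
  argv.push(container, '/bin/sh')
  return { container, argv }
}
```

`-w` will typically fail if the directory does not exist, which is why the server has to create the agent's working directory before the exec (the `mkdirSync` call noted for `terminal-session.ts`).
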
Prompt To Fix All With AI
Fix the following 5 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 5
packages/browseros-agent/apps/server/src/api/routes/terminal.ts:115-128
**Unhandled error from `listRunningContainers` causes 500 on `/targets`**

If the VM is unavailable or `limactl` fails, `deps.listRunningContainers()` throws and the entire route handler propagates a 500. The frontend silently ignores non-OK responses, so users see a stale/empty target dropdown during VM startup with no server-side diagnostic logged. A try/catch that falls back to `runningContainers = undefined` (skipping the filter) or returns `{ targets: [] }` would be more resilient.

### Issue 2 of 5
packages/browseros-agent/apps/agent/entrypoints/app/agents/AgentsPage.tsx:463-471
**`inferTerminalTarget` matches on display label strings**

The fallback path for resolving the terminal target compares `runtimeLabel.toLowerCase()` against hard-coded strings like `'claude code'`. If a display name is updated elsewhere (e.g., in `RuntimeDescriptor.displayName`), this silently returns `null` and the "Open terminal" action becomes a no-op without any user feedback. Prefer using the `adapter` field from `harnessAgentLookup` exclusively, or derive a stable adapter ID that doesn't depend on UI labels.

### Issue 3 of 5
packages/browseros-agent/apps/agent/entrypoints/app/agents/AgentTerminal.tsx:155
**`TerminalTargetId` and `TerminalTargetOption` are duplicated client-side**

`TerminalTargetId` is defined independently in `AgentTerminal.tsx`, `AgentsPage.tsx`, and on the server in `terminal-session.ts`. When a new target is added server-side, the two client-side type definitions and the `parseTargetId` guard must also be updated manually. Since the client and server already share the `@browseros/shared` package, consider exporting the type (or a plain constant array of valid IDs) from there to keep a single source of truth.
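
One way to keep a single source of truth is a `const` tuple from which both the union type and the runtime guard are derived. `TERMINAL_TARGET_IDS` and its placement in `@browseros/shared` are assumptions; `parseTargetId` below sketches how the existing guard could be derived, not its current code.

```typescript
// Intended to live in a shared package such as @browseros/shared
// (placement and constant name are assumptions).
const TERMINAL_TARGET_IDS = ['openclaw', 'claude', 'codex', 'hermes'] as const

// The union type is derived from the array, so adding an id updates both.
type TerminalTargetId = (typeof TERMINAL_TARGET_IDS)[number]

// Runtime guard backed by the same array.
function parseTargetId(value: unknown): TerminalTargetId | null {
  if (typeof value !== 'string') return null
  return (TERMINAL_TARGET_IDS as readonly string[]).includes(value)
    ? (value as TerminalTargetId)
    : null
}
```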

### Issue 4 of 5
packages/browseros-agent/apps/agent/entrypoints/app/agents/useAgents.ts:72-73
**Unconditional 5-second adapter health polling**

`refetchInterval: 5_000` runs whenever the component is mounted with `enabled = true`, regardless of whether any container-backed adapter is registered. Each poll triggers a server-side request; at the server this may also invoke `ContainerCli.ps()` (a subprocess). Consider polling only when at least one container runtime is known to be starting, or backing off once all adapters are healthy.

### Issue 5 of 5
packages/browseros-agent/apps/server/src/lib/agents/runtime/codex-container-runtime.ts:50-52
**`apt-get update` and `@latest` npm installs run on every container restart**

`CODEX_START_COMMAND` runs `apt-get update` and installs `@openai/codex@latest` each time the container starts. On a server restart the 120 s readiness timeout re-applies, and the installed package version may differ from the previous run. Similarly, `CLAUDE_CODE_START_COMMAND` installs `@anthropic-ai/claude-code@latest`, making both runtimes susceptible to spontaneous version drift between restarts. Pinning to an explicit version (or using a baked image) would make the behaviour more reproducible.

Reviews (1): Last reviewed commit: "feat(agent): run claude and codex in VM ..."

Comment on lines +115 to +128
```typescript
  .get('/targets', async (c) => {
    let runningContainers: Set<string> | undefined
    if (deps.listRunningContainers) {
      runningContainers = new Set(await deps.listRunningContainers())
    }
    return c.json({
      targets: listTerminalTargets({
        browserosDir: deps.browserosDir,
        agentId: c.req.query('agentId'),
        runningContainers,
        openclawContainerName: deps.containerName,
      }),
    })
  })
```
Contributor


P2 Unhandled error from listRunningContainers causes 500 on /targets

If the VM is unavailable or limactl fails, deps.listRunningContainers() throws and the entire route handler propagates a 500. The frontend silently ignores non-OK responses, so users see a stale/empty target dropdown during VM startup with no server-side diagnostic logged. A try/catch that falls back to runningContainers = undefined (skipping the filter) or returns { targets: [] } would be more resilient.
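
A minimal sketch of the suggested fallback, assuming `listRunningContainers` resolves to an array of container names (the snippet only shows its result being passed to `new Set`). `safeRunningContainers` is a hypothetical helper name.

```typescript
// Wrap the optional lister so a VM/limactl failure degrades to "no filter"
// (undefined) plus a server-side log line, instead of a 500.
async function safeRunningContainers(
  listRunningContainers: (() => Promise<string[]>) | undefined,
  log: (msg: string) => void = console.warn,
): Promise<Set<string> | undefined> {
  if (!listRunningContainers) return undefined
  try {
    return new Set(await listRunningContainers())
  } catch (err) {
    log(`terminal /targets: listRunningContainers failed: ${String(err)}`)
    return undefined
  }
}
```

In the handler, the `if (deps.listRunningContainers)` block would collapse to `const runningContainers = await safeRunningContainers(deps.listRunningContainers)` (wrap it in an arrow function if it depends on `this`), so a VM that is still booting yields an unfiltered target list instead of a 500.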


Comment on lines 463 to +471
```tsx
    </div>
  )
}

function inferTerminalTarget(label: string): TerminalTargetId | null {
  const lower = label.toLowerCase()
  if (lower === 'claude code') return 'claude'
  if (lower === 'codex') return 'codex'
  if (lower === 'hermes') return 'hermes'
```
Contributor


P2 inferTerminalTarget matches on display label strings

The fallback path for resolving the terminal target compares runtimeLabel.toLowerCase() against hard-coded strings like 'claude code'. If a display name is updated elsewhere (e.g., in RuntimeDescriptor.displayName), this silently returns null and the "Open terminal" action becomes a no-op without any user feedback. Prefer using the adapter field from harnessAgentLookup exclusively, or derive a stable adapter ID that doesn't depend on UI labels.
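
Keying on the adapter id rather than the display label could look like this. The adapter id strings (`claude`, `codex`, `hermes`) are assumed to be what `harnessAgentLookup` returns; `TARGET_BY_ADAPTER` and `terminalTargetForAdapter` are hypothetical names.

```typescript
type TerminalTargetId = 'openclaw' | 'claude' | 'codex' | 'hermes'

// Stable adapter ids (assumed values) mapped to terminal targets. Display
// labels are never consulted, so renaming "Claude Code" cannot break this.
const TARGET_BY_ADAPTER: Readonly<Record<string, TerminalTargetId>> = {
  claude: 'claude',
  codex: 'codex',
  hermes: 'hermes',
}

function terminalTargetForAdapter(adapter: string | undefined): TerminalTargetId | null {
  if (adapter === undefined) return null
  // hasOwnProperty avoids matching inherited keys such as "toString".
  return Object.prototype.hasOwnProperty.call(TARGET_BY_ADAPTER, adapter)
    ? TARGET_BY_ADAPTER[adapter]
    : null
}
```

When the function returns `null`, the UI can disable the "Open terminal" action or show a message instead of silently doing nothing.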


Comment on lines 72 to +73
```typescript
    enabled: Boolean(baseUrl) && !urlLoading && enabled,
    refetchInterval: enabled ? 5_000 : false,
```
Contributor


P2 Unconditional 5-second adapter health polling

refetchInterval: 5_000 runs whenever the component is mounted with enabled = true, regardless of whether any container-backed adapter is registered. Each poll triggers a server-side request; at the server this may also invoke ContainerCli.ps() (a subprocess). Consider polling only when at least one container runtime is known to be starting, or backing off once all adapters are healthy.
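
A sketch of conditional polling: poll every 5 s only while some container-backed adapter is still starting, and stop once none are. The `AdapterHealth` shape and `adapterPollInterval` are hypothetical, since the real health response is not shown; this covers the "poll only while starting" suggestion rather than a full back-off.

```typescript
// Hypothetical adapter health shape; the real response type may differ.
interface AdapterHealth {
  id: string
  containerBacked: boolean
  status: 'starting' | 'ready' | 'error'
}

// Returns the next poll delay, or false to stop polling.
function adapterPollInterval(
  adapters: AdapterHealth[] | undefined,
  fastMs = 5_000,
): number | false {
  // No data yet: keep polling until the first response arrives.
  if (!adapters) return fastMs
  const starting = adapters.some((a) => a.containerBacked && a.status === 'starting')
  return starting ? fastMs : false
}
```

Assuming `useAgents.ts` uses TanStack Query v5, where `refetchInterval` may be a function of the query, this could be wired as `refetchInterval: (query) => (enabled ? adapterPollInterval(query.state.data) : false)`.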


Comment on lines +50 to +52
```typescript
export interface CodexRuntimeConfig {
  browserosDir: string
  codexHarnessHostDir: string
```
Contributor


P2 apt-get update and @latest npm installs run on every container restart

CODEX_START_COMMAND runs apt-get update and installs @openai/codex@latest each time the container starts. On a server restart the 120 s readiness timeout re-applies, and the installed package version may differ from the previous run. Similarly, CLAUDE_CODE_START_COMMAND installs @anthropic-ai/claude-code@latest, making both runtimes susceptible to spontaneous version drift between restarts. Pinning to an explicit version (or using a baked image) would make the behaviour more reproducible.
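
Pinning could be enforced where the start command is built. The sketch below accepts only an exact semver string and refuses `latest` or ranges. `pinnedStartCommand` is hypothetical, and the `exec sleep infinity` idle step is an assumption based on the diagram's "npm install then sleep". Baking the CLI into the image, as the comment also suggests, would additionally remove `apt-get update` and the install time from every restart.

```typescript
// Exact semver, optionally with a prerelease tag (e.g. 1.2.3 or 1.2.3-beta.1).
const EXACT_SEMVER = /^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?$/

// Build an install-then-idle start command with an explicit version,
// so every container restart installs the same release.
function pinnedStartCommand(pkg: string, version: string): string {
  if (!EXACT_SEMVER.test(version)) {
    throw new Error(`expected an exact version for ${pkg}, got "${version}"`)
  }
  return `npm install -g ${pkg}@${version} && exec sleep infinity`
}
```

For example, `pinnedStartCommand('@openai/codex', '1.2.3')` (a placeholder version, not a real release) yields `npm install -g @openai/codex@1.2.3 && exec sleep infinity`.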



Labels: None yet

Projects: None yet


1 participant