Run Claude and Codex ACP adapters in VM containers by shivammittal274 · Pull Request #996 · browseros-ai/BrowserOS

shivammittal274 · 2026-05-11T21:14:41Z

Summary

- Replace Claude and Codex host-process ACP runtimes with VM-backed container runtimes.
- Add shared best-effort container runtime startup for Claude, Codex, and Hermes while keeping app startup non-blocking.
- Expose managed runtime terminal targets and adapter readiness handling in the agent UI.

Test Plan

bun test apps/agent/entrypoints/app/agents/NewAgentDialog.test.ts apps/server/tests/api/routes/terminal-protocol.test.ts apps/server/tests/api/routes/terminal.test.ts apps/server/tests/lib/agents/acpx-runtime-context.test.ts apps/server/tests/lib/agents/acpx-runtime.test.ts apps/server/tests/lib/agents/runtime/claude-container-runtime.test.ts apps/server/tests/lib/agents/runtime/codex-container-runtime.test.ts apps/server/tests/lib/agents/runtime/container-agent-runtime.test.ts apps/server/tests/lib/agents/runtime/hermes-container-runtime.test.ts apps/server/tests/lib/container/managed/managed-container.test.ts apps/server/tests/lib/vm/vm-runtime.test.ts apps/server/tests/main.test.ts
bun run --cwd apps/agent typecheck
bunx biome check \
git diff --check

Follow-up

Existing agents with stale ACP backend session IDs after VM/container reset can still fail to resume; leaving that for the next debug pass as discussed.

github-actions · 2026-05-11T21:18:16Z

❌ Tests failed — 4/1265 failed

| Suite | Passed | Failed | Skipped |
| --- | --- | --- | --- |
| ✅ `agent` | 82/82 | 0 | 0 |
| ✅ `build` | 9/9 | 0 | 0 |
| ✅ `eval` | 93/93 | 0 | 0 |
| ✅ `server-agent` | 266/266 | 0 | 0 |
| ❌ `server-api` | 206/209 | 3 | 0 |
| ✅ `server-browser` | 4/4 | 0 | 0 |
| ✅ `server-integration` | 9/10 | 0 | 1 |
| ✅ `server-lib` | 266/266 | 0 | 0 |
| ✅ `server-root` | 61/64 | 0 | 3 |
| ✅ `server-skills` | 31/31 | 0 | 0 |
| ❌ `server-tools` | 230/231 | 1 | 0 |

Failed tests

server-api — AgentHarnessService > writes a per-agent Hermes config.yaml + .env when adapter=hermes and provider config complete
server-api — AgentHarnessService > writes provider:custom + base_url for openai-compatible providers
server-api — AgentHarnessService > falls back to OpenAI default base_url for the openai provider type
server-tools — observation tools > get_page_content returns markdown text


greptile-apps · 2026-05-11T21:24:29Z

Greptile Summary

This PR replaces the host-process ACP runtimes for Claude and Codex with VM-backed container runtimes (matching the existing Hermes pattern), adds a shared startContainerRuntimeBestEffort helper, and exposes multi-target terminal support in the agent UI so users can open a shell into any managed container.

- New container runtimes: `ClaudeRuntime` and `CodexRuntime` extend `ContainerAgentRuntime`, run inside a Lima VM using `node:20-bookworm-slim`, and wait up to 120 s for npm install to complete via a new `waitForReadinessProbe` retry loop in `ManagedContainer`.
- Terminal targets: the `/terminal/targets` endpoint and WebSocket now accept a `target` + `agentId` query pair, resolving to the correct container and per-agent working directory for openclaw, claude, codex, or hermes.
- VM deduplication: `VmRuntime.ensureReady` is now serialised via a module-level promise map so concurrent Claude/Codex/Hermes startup requests share a single VM boot rather than racing.
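The readiness-probe retry loop mentioned in the first item can be pictured roughly as follows. This is a minimal sketch, not the PR's `waitForReadinessProbe`: the function name `waitForReadiness`, the `WaitOptions` shape, and the treatment of a throwing probe as "not ready yet" are all assumptions made for illustration.

```typescript
// Hypothetical sketch: names and option shape are illustrative, not taken from the PR.
type Probe = () => Promise<boolean>

interface WaitOptions {
  timeoutMs: number // overall budget, e.g. 120_000 to cover the npm-install window
  intervalMs: number // delay between probe attempts
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms))

async function waitForReadiness(
  probe: Probe,
  opts: WaitOptions,
): Promise<boolean> {
  const deadline = Date.now() + opts.timeoutMs
  for (;;) {
    try {
      // The probe always runs at least once, even with a tiny timeout.
      if (await probe()) return true
    } catch {
      // Assumption: a throwing probe means "not ready yet", so keep retrying.
    }
    // Give up if another sleep would overshoot the deadline.
    if (Date.now() + opts.intervalMs > deadline) return false
    await sleep(opts.intervalMs)
  }
}
```

A caller would pass a probe that runs something like `command -v claude` inside the container and resolves to `true` once the binary is on the path.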

Confidence Score: 4/5

Safe to merge with awareness of the reliability concerns around version-pinning and the 5-second polling cadence.

The core runtime migration is well-structured and mirrors the proven Hermes pattern. The VM deduplication fix is correct, the readiness-probe retry loop is necessary given the slow npm-install entrypoint, and error handling in the WebSocket open handler was correctly added. The main concerns are operational: both new container runtimes install `@latest` packages on every restart, risking version drift and slow cold starts; the `/targets` HTTP handler propagates subprocess errors as 500s rather than gracefully returning an empty list; the adapter health check now polls every 5 s unconditionally; and `TerminalTargetId` is duplicated across three files instead of living in the shared package.

Files that warrant a closer look: codex-container-runtime.ts and claude-container-runtime.ts (start commands), terminal.ts (listRunningContainers error path), and useAgents.ts (polling interval).

Important Files Changed

| Filename | Overview |
| --- | --- |
| packages/browseros-agent/apps/server/src/lib/agents/runtime/claude-container-runtime.ts | New file: migrates Claude from host-process to VM container runtime; uses `npm install -g @latest` on every container start which risks version drift between restarts |
| packages/browseros-agent/apps/server/src/lib/agents/runtime/codex-container-runtime.ts | New file: migrates Codex to VM container runtime; runs `apt-get update` + `@latest` npm installs on every restart, increasing startup latency and risking version drift |
| packages/browseros-agent/apps/server/src/api/routes/terminal.ts | Adds `/targets` endpoint and parameterised WebSocket target; WS open handler correctly catches errors; `listRunningContainers` errors in the HTTP handler are unhandled and produce a 500 |
| packages/browseros-agent/apps/server/src/api/services/terminal/terminal-session.ts | Adds multi-target terminal support (openclaw/claude/codex/hermes) with agentId-scoped working dirs; `mkdirSync` used synchronously in WS open handler |
| packages/browseros-agent/apps/server/src/lib/vm/vm-runtime.ts | Adds module-level deduplication for concurrent `ensureReady` calls via a shared promise map; correct cleanup in `finally` block |
| packages/browseros-agent/apps/server/src/lib/container/managed/managed-container.ts | Replaces single-shot `readinessProbe()` call with a retry loop (`waitForReadinessProbe`); correctly handles long npm-install startup windows for Claude/Codex |
| packages/browseros-agent/apps/agent/entrypoints/app/agents/AgentTerminal.tsx | Adds target selector UI and per-agent WS parameters; `TerminalTargetId` and `TerminalTargetOption` types are redefined locally rather than shared with the server |
| packages/browseros-agent/apps/agent/entrypoints/app/agents/AgentsPage.tsx | Adds per-agent terminal launch and `inferTerminalTarget` helper; target inference falls back to fragile display-label string matching when adapter lookup misses |
| packages/browseros-agent/apps/server/src/main.ts | Moves Claude/Codex runtime startup from eager to best-effort (non-blocking); adds stop calls for Claude/Codex on shutdown |
| packages/browseros-agent/apps/agent/entrypoints/app/agents/useAgents.ts | Adds unconditional 5-second polling for adapter health; runs regardless of whether container runtimes are active |
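
The `vm-runtime.ts` entry above describes deduplicating concurrent `ensureReady` calls through a module-level promise map with cleanup in a `finally` step. A minimal sketch of that general technique, assuming a hypothetical `bootVm` callback and a VM-name key (neither is shown in the PR excerpt), might look like this:

```typescript
// Hypothetical sketch; `bootVm` and the map key are illustrative stand-ins.
const inFlight = new Map<string, Promise<void>>()

function ensureReadyOnce(
  vmName: string,
  bootVm: (name: string) => Promise<void>,
): Promise<void> {
  // Concurrent callers share the boot that is already in progress.
  const existing = inFlight.get(vmName)
  if (existing) return existing

  const pending = bootVm(vmName).finally(() => {
    // Clear the entry on success or failure so a later call can retry.
    inFlight.delete(vmName)
  })
  inFlight.set(vmName, pending)
  return pending
}
```

Because the entry is removed once the boot settles, a failed boot does not poison later attempts; only calls that overlap in time are coalesced.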

Sequence Diagram

sequenceDiagram
    participant App as Application (main.ts)
    participant SCBE as startContainerRuntimeBestEffort
    participant CR as ClaudeRuntime / CodexRuntime / HermesRuntime
    participant VM as VmRuntime (ensureReady)
    participant Ctr as Container (nerdctl)
    participant UI as Agent UI

    App->>SCBE: configureClaudeRuntime()
    App->>SCBE: configureCodexRuntime()
    App->>SCBE: configureHermesRuntime()
    SCBE->>CR: executeAction install
    SCBE->>CR: executeAction start
    CR->>VM: ensureReady()
    Note over VM: module-level promise map deduplicates concurrent calls
    VM-->>CR: VM ready
    CR->>Ctr: nerdctl pull image
    CR->>Ctr: nerdctl run npm install then sleep
    CR->>Ctr: readinessProbe loop up to 120 s
    Ctr-->>CR: command -v claude/codex OK
    CR-->>App: runtime ready

    UI->>App: "GET /terminal/targets?agentId=X"
    App->>Ctr: ContainerCli.ps() running containers
    App-->>UI: targets array
    UI->>App: "WS /terminal/ws?target=claude&agentId=X"
    App->>Ctr: nerdctl exec -it -e HOME -w agentHome /bin/sh
    Ctr-->>UI: PTY stream

Prompt To Fix All With AI

Fix the following 5 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 5
packages/browseros-agent/apps/server/src/api/routes/terminal.ts:115-128
**Unhandled error from `listRunningContainers` causes 500 on `/targets`**

If the VM is unavailable or `limactl` fails, `deps.listRunningContainers()` throws and the entire route handler propagates a 500. The frontend silently ignores non-OK responses, so users see a stale/empty target dropdown during VM startup with no server-side diagnostic logged. A try/catch that falls back to `runningContainers = undefined` (skipping the filter) or returns `{ targets: [] }` would be more resilient.

### Issue 2 of 5
packages/browseros-agent/apps/agent/entrypoints/app/agents/AgentsPage.tsx:463-471
**`inferTerminalTarget` matches on display label strings**

The fallback path for resolving the terminal target compares `runtimeLabel.toLowerCase()` against hard-coded strings like `'claude code'`. If a display name is updated elsewhere (e.g., in `RuntimeDescriptor.displayName`), this silently returns `null` and the "Open terminal" action becomes a no-op without any user feedback. Prefer using the `adapter` field from `harnessAgentLookup` exclusively, or derive a stable adapter ID that doesn't depend on UI labels.

### Issue 3 of 5
packages/browseros-agent/apps/agent/entrypoints/app/agents/AgentTerminal.tsx:155
**`TerminalTargetId` and `TerminalTargetOption` are duplicated client-side**

`TerminalTargetId` is defined independently in `AgentTerminal.tsx`, `AgentsPage.tsx`, and on the server in `terminal-session.ts`. When a new target is added server-side, the two client-side type definitions and the `parseTargetId` guard must also be updated manually. Since the project already has a shared `@browseros/shared` package, consider exporting the type (or a plain constant array of valid IDs) from there to keep a single source of truth.
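
One way to read the "plain constant array of valid IDs" suggestion is sketched below. The constant name `TERMINAL_TARGET_IDS` and the `parseTargetId` signature are assumptions for illustration; only the four target names come from the review text.

```typescript
// Hypothetical shared module; the constant name and guard signature are assumed.
const TERMINAL_TARGET_IDS = ['openclaw', 'claude', 'codex', 'hermes'] as const

// The union type is derived from the array, so adding an ID updates both at once.
type TerminalTargetId = (typeof TERMINAL_TARGET_IDS)[number]

function parseTargetId(value: unknown): TerminalTargetId | null {
  if (typeof value !== 'string') return null
  const known = TERMINAL_TARGET_IDS as readonly string[]
  return known.indexOf(value) !== -1 ? (value as TerminalTargetId) : null
}
```

With this shape, the client and server would both import the array and the derived type, and the runtime guard cannot drift from the compile-time union.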

### Issue 4 of 5
packages/browseros-agent/apps/agent/entrypoints/app/agents/useAgents.ts:72-73
**Unconditional 5-second adapter health polling**

`refetchInterval: 5_000` runs whenever the component is mounted with `enabled = true`, regardless of whether any container-backed adapter is registered. Each poll triggers a server-side request; at the server this may also invoke `ContainerCli.ps()` (a subprocess). Consider polling only when at least one container runtime is known to be starting, or backing off once all adapters are healthy.

### Issue 5 of 5
packages/browseros-agent/apps/server/src/lib/agents/runtime/codex-container-runtime.ts:50-52
**`apt-get update` and `@latest` npm installs run on every container restart**

`CODEX_START_COMMAND` runs `apt-get update` and installs `@openai/codex@latest` each time the container starts. On a server restart the 120 s readiness timeout re-applies, and the installed package version may differ from the previous run. Similarly `CLAUDE_CODE_START_COMMAND` installs `@anthropic-ai/claude-code@latest`, making both runtimes susceptible to spontaneous version drift between restarts. Pinning to an explicit version (or using a baked image) would make the behaviour more reproducible.

Reviews (1): Last reviewed commit: "feat(agent): run claude and codex in VM ..."

greptile-apps · 2026-05-11T21:24:32Z

+    .get('/targets', async (c) => {
+      let runningContainers: Set<string> | undefined
+      if (deps.listRunningContainers) {
+        runningContainers = new Set(await deps.listRunningContainers())
+      }
+      return c.json({
+        targets: listTerminalTargets({
+          browserosDir: deps.browserosDir,
+          agentId: c.req.query('agentId'),
+          runningContainers,
+          openclawContainerName: deps.containerName,
+        }),
+      })
+    })


Unhandled error from listRunningContainers causes 500 on /targets

If the VM is unavailable or limactl fails, deps.listRunningContainers() throws and the entire route handler propagates a 500. The frontend silently ignores non-OK responses, so users see a stale/empty target dropdown during VM startup with no server-side diagnostic logged. A try/catch that falls back to runningContainers = undefined (skipping the filter) or returns { targets: [] } would be more resilient.
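
The try/catch fallback suggested above could be factored into a small helper along these lines. This is a sketch under assumptions: `safeRunningContainers` and its `onError` parameter are hypothetical names, and only `listRunningContainers` comes from the diff.

```typescript
// Hypothetical helper; only `listRunningContainers` is a name from the PR.
async function safeRunningContainers(
  listRunningContainers: (() => Promise<string[]>) | undefined,
  onError: (err: unknown) => void = (err) =>
    console.warn('listRunningContainers failed', err),
): Promise<Set<string> | undefined> {
  if (!listRunningContainers) return undefined
  try {
    return new Set(await listRunningContainers())
  } catch (err) {
    // Log so the failure is diagnosable server-side.
    onError(err)
    // Returning undefined skips the running-container filter instead of failing the route.
    return undefined
  }
}
```

The route handler would then pass the result straight through as `runningContainers`, so a VM that is still booting yields an unfiltered target list rather than a 500.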

packages/browseros-agent/apps/server/src/api/routes/terminal.ts:115-128

greptile-apps · 2026-05-11T21:24:33Z

    </div>
  )
 }
+
+function inferTerminalTarget(label: string): TerminalTargetId | null {
+  const lower = label.toLowerCase()
+  if (lower === 'claude code') return 'claude'
+  if (lower === 'codex') return 'codex'
+  if (lower === 'hermes') return 'hermes'


inferTerminalTarget matches on display label strings

The fallback path for resolving the terminal target compares runtimeLabel.toLowerCase() against hard-coded strings like 'claude code'. If a display name is updated elsewhere (e.g., in RuntimeDescriptor.displayName), this silently returns null and the "Open terminal" action becomes a no-op without any user feedback. Prefer using the adapter field from harnessAgentLookup exclusively, or derive a stable adapter ID that doesn't depend on UI labels.
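
A label-independent lookup, as suggested above, might be sketched as follows. The adapter ID strings and the `targetForAdapter` name are assumptions; the PR excerpt does not show what values the `adapter` field actually holds.

```typescript
// Hypothetical sketch: adapter IDs and the function name are assumptions.
type Target = 'claude' | 'codex' | 'hermes'

// Keyed by a stable adapter ID rather than a display label.
const ADAPTER_TO_TARGET = new Map<string, Target>([
  ['claude', 'claude'],
  ['codex', 'codex'],
  ['hermes', 'hermes'],
])

function targetForAdapter(adapter: string | undefined): Target | null {
  if (!adapter) return null
  return ADAPTER_TO_TARGET.get(adapter) ?? null
}
```

A `null` result here is a natural place to disable the "Open terminal" action or show a message, instead of leaving it as a silent no-op.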

packages/browseros-agent/apps/agent/entrypoints/app/agents/AgentsPage.tsx:463-471

greptile-apps · 2026-05-11T21:24:34Z

    enabled: Boolean(baseUrl) && !urlLoading && enabled,
+    refetchInterval: enabled ? 5_000 : false,


Unconditional 5-second adapter health polling

refetchInterval: 5_000 runs whenever the component is mounted with enabled = true, regardless of whether any container-backed adapter is registered. Each poll triggers a server-side request; at the server this may also invoke ContainerCli.ps() (a subprocess). Consider polling only when at least one container runtime is known to be starting, or backing off once all adapters are healthy.
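
The conditional polling suggested above could be expressed as a small pure function that returns a value of the same `number | false` shape the diff already passes to `refetchInterval`. The `AdapterStatus` shape and the 60-second idle interval are assumptions for illustration.

```typescript
// Hypothetical status shape; the PR excerpt does not show the adapter health payload.
type AdapterStatus = { id: string; containerBacked: boolean; healthy: boolean }

const FAST_POLL_MS = 5_000 // matches the interval in the diff
const IDLE_POLL_MS = 60_000 // assumed back-off once everything is healthy

function pickRefetchInterval(
  enabled: boolean,
  adapters: AdapterStatus[],
): number | false {
  if (!enabled) return false
  const containerBacked = adapters.filter((a) => a.containerBacked)
  // No container runtimes registered: nothing to wait for, so do not poll.
  if (containerBacked.length === 0) return false
  // Poll quickly only while at least one container runtime is still starting.
  return containerBacked.some((a) => !a.healthy) ? FAST_POLL_MS : IDLE_POLL_MS
}
```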

packages/browseros-agent/apps/agent/entrypoints/app/agents/useAgents.ts:72-73

greptile-apps · 2026-05-11T21:24:35Z

+export interface CodexRuntimeConfig {
+  browserosDir: string
+  codexHarnessHostDir: string


apt-get update and @latest npm installs run on every container restart

CODEX_START_COMMAND runs apt-get update and installs @openai/codex@latest each time the container starts. On a server restart the 120 s readiness timeout re-applies, and the installed package version may differ from the previous run. Similarly CLAUDE_CODE_START_COMMAND installs @anthropic-ai/claude-code@latest, making both runtimes susceptible to spontaneous version drift between restarts. Pinning to an explicit version (or using a baked image) would make the behaviour more reproducible.
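
As a rough illustration of the pinning suggestion, the start command could be built from an explicit version rather than the `latest` tag. The helper name `npmGlobalInstallCommand` and the version check are assumptions, and the version used in the usage note is a placeholder, not one the PR uses.

```typescript
// Hypothetical helper; the name and the exact-version rule are assumptions.
function npmGlobalInstallCommand(pkg: string, version: string): string {
  // Accept only exact versions such as 1.2.3 or 1.2.3-beta.1, never tags like "latest".
  if (!/^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?$/.test(version)) {
    throw new Error(`expected an exact version for ${pkg}, got "${version}"`)
  }
  return `npm install -g ${pkg}@${version}`
}
```

For example, `npmGlobalInstallCommand('@openai/codex', '0.0.0')` yields `npm install -g @openai/codex@0.0.0`, so a restart reinstalls the same release instead of whatever is newest at that moment.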

packages/browseros-agent/apps/server/src/lib/agents/runtime/codex-container-runtime.ts:50-52

feat(agent): run claude and codex in VM containers

985cdd3

shivammittal274 wants to merge 1 commit into dev from claude-vm
