
feat(runtime): /runtimes/* HTTP surface + RuntimeStatusBar/ControlPanel UI#971

Open
Dani Akash (DaniAkash) wants to merge 16 commits into feat/openclaw-runtime from feat/runtime-control-ui

Conversation

@DaniAkash
Contributor

Summary

Stacked on #970 (feat/openclaw-runtime). Lands the user-visible piece of the AgentRuntime architecture: a uniform /runtimes/<adapter>/* HTTP surface backed by runtime.executeAction(...) through AgentRuntimeRegistry, plus capability-gated UI components that consume it.

Server:

  • GET /runtimes — list all registered runtimes with descriptor + status snapshot + capabilities
  • GET /runtimes/:adapter/status — single runtime status
  • GET /runtimes/:adapter/status/stream — SSE: snapshot on connect + every state transition + 15s heartbeat
  • POST /runtimes/:adapter/actions/:action — capability-gated dispatch through executeAction. Body schema picks up agentId for reset-wipe-agent. 405 if action not in capabilities; 400 on unknown action; 500 on action throw.
  • GET /runtimes/:adapter/logs — container-runtime logs (405 for host-process)
  • All routes use zValidator for path/query/body so the typed RPC client (hc<AppType>) picks up the schemas.
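The status-code contract for action dispatch can be sketched as a small pure function. This is a hedged illustration of the 400/405/500 behaviour described above, not the PR's actual handler; `KNOWN_ACTIONS` and `dispatchAction` are assumed names, with the action list inferred from this description:

```typescript
// Illustrative sketch of the dispatch contract; not the repo's code.
const KNOWN_ACTIONS = ["install", "start", "stop", "restart", "reset-wipe-agent"] as const;
type RuntimeAction = (typeof KNOWN_ACTIONS)[number];

async function dispatchAction(
  action: string,
  capabilities: readonly string[],
  execute: (action: RuntimeAction) => Promise<void>,
): Promise<number> {
  // 400 on an action the surface doesn't know at all
  if (!(KNOWN_ACTIONS as readonly string[]).includes(action)) return 400;
  // 405 when the runtime doesn't advertise the capability
  if (!capabilities.includes(action)) return 405;
  try {
    await execute(action as RuntimeAction);
    return 200;
  } catch {
    // 500 when the action itself throws
    return 500;
  }
}
```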

UI:

  • useRuntime(adapter) / useRuntimeAction(adapter) / useRuntimeLogs(adapter) — generic React Query hooks backed by the typed RPC client. 5s default poll; mutations invalidate the status query on success.
  • <RuntimeStatusBar adapter='…'> replaces GatewayStatusBar. Compact one-line bar with state pill + optional Restart. extraPill and extraActions slots let openclaw add its control-plane pill and Open Terminal button without baking gateway specifics into the runtime layer.
  • <RuntimeControlPanel adapter='…'> replaces GatewayStateCards from OpenClawControls. Capability-gated state-appropriate primary CTA: not_installed → Install, stopped → Start, errored → Restart + Reset, installing/starting → spinner, cli_missing/unhealthy → Reinstall CLI, running → optional Stop. extras slot for adapter-specific affordances (e.g. openclaw's provider Setup dialog trigger).
  • AgentsPage rewired to mount the new components. The 'Unavailable' badge in AgentSummaryChips.tsx is deleted (the capabilities-driven UI surfaces the signal more usefully on the new RuntimeControlPanel).
  • GatewayStatusBar.tsx is deleted outright.
  • ControlPlaneAlert / LifecycleAlert / InlineErrorAlert from OpenClawControls remain — they cover gateway-specific concerns the runtime layer doesn't model.

Out of scope (deferred follow-ups):

  • Deleting the legacy /claw/{status,start,stop,restart,logs} lifecycle routes — UI still polls /claw/status for control-plane info that lives outside the runtime registry. Will land once the control-plane surface is moved to the runtime layer (Phase 7+).
  • Slimming useOpenClaw.ts's lifecycle mutations — they're now a fallback, replaced by the new hooks at the call sites that matter.

Test plan

  • bun run typecheck clean across server + UI (pre-existing missing-generated-graphql errors aside)
  • biome check clean on touched files
  • 11 new server-side tests in tests/api/routes/runtimes.test.ts covering list/status/actions (capability gate, unknown action, agentId requirement, throw → 500) / logs (container vs host-process)
  • Full server test sweep — 1042 pass, 0 fail (one pre-existing ContainerCli flake also reproduces on plain origin/dev)
  • End-to-end UI verification by Dani — full openclaw lifecycle via the new RuntimeStatusBar + RuntimeControlPanel before merging this stack

Uniform HTTP surface backed by AgentRuntimeRegistry + runtime.executeAction:
- GET /runtimes — list all registered runtimes (descriptor + status + capabilities)
- GET /runtimes/:adapter/status — single status snapshot
- GET /runtimes/:adapter/status/stream — SSE: snapshot on connect + every state transition
- POST /runtimes/:adapter/actions/:action — capability-gated dispatch through executeAction
- GET /runtimes/:adapter/logs — container-runtime logs (405 for host-process)

Routes use zValidator for path/query/body so the typed RPC client picks
up the schemas; mounted with the same requireTrustedAppOrigin
middleware as /claw/* /terminal /acl-rules /monitoring.
Generic React Query hooks backed by the typed RPC client (hc<AppType>),
keyed by adapter id. useRuntime polls /runtimes/:adapter/status every
5s by default; useRuntimeAction issues a capability-gated POST to
/runtimes/:adapter/actions/:action and invalidates the status query
on success; useRuntimeLogs is opt-in (disabled by default) for
container runtimes.
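Framework details aside, the adapter-keyed query keys and the invalidate-on-success step might look like this (a sketch with assumed names; the real hooks wrap React Query and the typed RPC client):

```typescript
// Hypothetical sketch of the adapter-keyed query keys and the
// post-action invalidation step; the real hooks are React Query wrappers.
const RUNTIME_QUERY_KEYS = {
  status: (adapter: string) => ["runtime-status", adapter] as const,
  logs: (adapter: string) => ["runtime-logs", adapter] as const,
};

// Called from the mutation's onSuccess: refetch the polled status
// immediately instead of waiting for the next 5s tick.
function invalidateStatusAfterAction(
  adapter: string,
  invalidate: (key: readonly unknown[]) => void,
): void {
  invalidate(RUNTIME_QUERY_KEYS.status(adapter));
}
```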
RuntimeStatusBar — compact one-line bar with adapter name + state pill
+ optional Restart action. Reads from useRuntime(adapter); the pill
covers every container and host-process state. extraPill / extraActions
slots let openclaw add its control-plane pill and Open Terminal
button without baking gateway specifics into the runtime layer.

RuntimeControlPanel — capability-gated state-appropriate primary CTA:
not_installed → Install, stopped → Start, errored → Restart + Reset,
installing/starting → spinner, cli_missing/unhealthy → Reinstall CLI,
running → optional Stop. extras slot for adapter-specific affordances
(e.g. openclaw provider Setup dialog trigger).
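The state → primary-CTA mapping above can be sketched as a pure function (labels taken from this description; the function name and the `"Spinner"` placeholder are illustrative):

```typescript
// Minimal sketch of the state -> primary-CTA mapping; not the component's code.
type RuntimeState =
  | "not_installed" | "stopped" | "errored" | "installing"
  | "starting" | "cli_missing" | "unhealthy" | "running";

function primaryCta(state: RuntimeState): string {
  switch (state) {
    case "not_installed": return "Install";
    case "stopped": return "Start";
    case "errored": return "Restart";       // Reset is offered alongside
    case "installing":
    case "starting": return "Spinner";      // in-flight: no actionable CTA
    case "cli_missing":
    case "unhealthy": return "Reinstall CLI";
    case "running": return "Stop";          // optional, capability-gated
  }
}
```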
…ge; drop legacy lifecycle UI

AgentsPage now uses the new runtime-control components for OpenClaw
lifecycle:
- RuntimeControlPanel replaces GatewayStateCards (state-appropriate
  CTAs gated on capabilities). Provider config dialog trigger lives
  in the panel's extras slot.
- RuntimeStatusBar replaces GatewayStatusBar (running pill +
  Restart). Control-plane pill + Open Terminal live in the bar's
  extra slots — gateway specifics stay outside the runtime layer.

GatewayStatusBar.tsx is deleted outright. The 'Unavailable' badge in
AgentSummaryChips.tsx is deleted — the capabilities-driven UI surfaces the
same signal more usefully on the new RuntimeControlPanel; the prop
stays for upstream callers but is now a no-op.

ControlPlaneAlert / LifecycleAlert / InlineErrorAlert from
OpenClawControls remain — they're alerts for control-plane and
mid-flight lifecycle states, distinct from the runtime control
surface. They cover gateway-specific concerns the runtime layer
doesn't model. Cleanup deferred to a follow-up.
@github-actions
Contributor

github-actions Bot commented May 8, 2026

✅ Tests passed — 1212/1216

| Suite | Passed | Failed | Skipped |
| --- | --- | --- | --- |
| agent | 76/76 | 0 | 0 |
| build | 9/9 | 0 | 0 |
| eval | 93/93 | 0 | 0 |
| server-agent | 261/261 | 0 | 0 |
| server-api | 183/183 | 0 | 0 |
| server-browser | 4/4 | 0 | 0 |
| server-integration | 9/10 | 0 | 1 |
| server-lib | 255/255 | 0 | 0 |
| server-root | 60/63 | 0 | 3 |
| server-skills | 31/31 | 0 | 0 |
| server-tools | 231/231 | 0 | 0 |


@greptile-apps
Contributor

greptile-apps Bot commented May 8, 2026

Greptile Summary

This PR lands the user-visible runtime layer: a uniform /runtimes/<adapter>/* HTTP surface backed by AgentRuntimeRegistry, plus generic React Query hooks and two new UI components (RuntimeStatusBar, RuntimeControlPanel) that replace the openclaw-specific GatewayStatusBar and GatewayStateCards. The AgentsPage is rewired to consume the new components, and the per-row Unavailable badge is removed in favour of the capability-driven control panel.

  • Server (runtimes.ts): five new routes (list, status, SSE stream, action dispatch, logs) all validated with zValidator; action dispatch is capability-gated with correct 405/400/500 error handling; 11 new unit tests cover the key branches.
  • Client hooks (useRuntime.ts): typed RPC-backed useRuntime / useRuntimeAction / useRuntimeLogs with 5 s default poll and post-action query invalidation.
  • UI components: RuntimeControlPanel maps runtime state to capability-gated CTAs; RuntimeStatusBar renders a compact pill bar — both accept adapter-specific slots (extras, extraPill, extraActions) so openclaw-specific concerns stay out of the generic layer.

Confidence Score: 4/5

Safe to merge after addressing the minor cleanup items — the core server routes, hooks, and UI components are well-structured and covered by tests.

The new routes are capability-gated, validated, and tested. The UI components cleanly replace their predecessors without introducing regressions on the primary openclaw flow. The findings are quality/cleanup items: an unused query key, a dead prop retained for callers that is never read, a label-fidelity regression in the control-plane pill, and a subtle SSE heartbeat leak on silent TCP drops. None affect correctness of the main flow today.

useRuntime.ts (unused RUNTIME_QUERY_KEYS.list), AgentSummaryChips.tsx (dead adapterHealth prop), AgentsPage.tsx (ControlPlanePill label regression), runtimes.ts (SSE heartbeat cleanup)

Important Files Changed

| Filename | Overview |
| --- | --- |
| packages/browseros-agent/apps/server/src/api/routes/runtimes.ts | New /runtimes/* HTTP surface: list, status, SSE stream, action dispatch, logs. Route logic is well-structured and capability-gated. Minor: SSE heartbeat write errors are silently swallowed and won't trigger early cleanup on a dead connection. |
| packages/browseros-agent/apps/agent/entrypoints/app/agents/useRuntime.ts | New React Query hooks for runtime status, action dispatch, and logs. RUNTIME_QUERY_KEYS.list is exported but never consumed — dead code that should be removed per project rules. |
| packages/browseros-agent/apps/agent/entrypoints/app/agents/runtime-controls/RuntimeControlPanel.tsx | New generic capability-gated control panel. State-to-CTA mapping is clear and exhaustive. extras slot correctly threads adapter-specific affordances without leaking openclaw specifics into the base component. |
| packages/browseros-agent/apps/agent/entrypoints/app/agents/runtime-controls/RuntimeStatusBar.tsx | New compact status bar with extensible pill + action slots. State pill mapping is thorough. Separator rendering logic is correct. |
| packages/browseros-agent/apps/agent/entrypoints/app/agents/AgentsPage.tsx | Rewires AgentsPage to use new runtime components; removes GatewayStatusBar/GatewayStateCards. New ControlPlanePill merges 'reconnecting'/'recovering' into a single "Connecting" label, losing the granularity the old component provided. |
| packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentSummaryChips.tsx | Removes the 'Unavailable' badge from per-row chips. adapterHealth prop is retained as optional but never used inside the component — dead interface surface. |
| packages/browseros-agent/apps/server/src/api/server.ts | Mounts the new /runtimes router behind requireTrustedAppOrigin(). No issues. |
| packages/browseros-agent/apps/server/tests/api/routes/runtimes.test.ts | 11 focused tests covering the new routes: capability gate, unknown action (400), agentId requirement, action throw (500), container vs host-process logs. Coverage is comprehensive for the happy path and key error branches. |
| packages/browseros-agent/apps/agent/entrypoints/app/agents/GatewayStatusBar.tsx | Deleted entirely — replaced by the generic RuntimeStatusBar. Clean removal. |

Sequence Diagram

sequenceDiagram
    participant UI as AgentsPage (React)
    participant Hook as useRuntime / useRuntimeAction
    participant RPC as Hono RPC Client
    participant Server as /runtimes/* routes
    participant Reg as AgentRuntimeRegistry
    participant RT as AgentRuntime (openclaw)

    UI->>Hook: useRuntime("openclaw") [5s poll]
    Hook->>RPC: GET /runtimes/:adapter/status
    RPC->>Server: GET /runtimes/openclaw/status
    Server->>Reg: registry.get("openclaw")
    Reg-->>Server: runtime instance
    Server->>RT: runtime.getStatusSnapshot()
    RT-->>Server: RuntimeStatusSnapshot
    Server-->>RPC: "{ descriptor, status, capabilities }"
    RPC-->>Hook: RuntimeView
    Hook-->>UI: "{ data, isLoading }"

    UI->>Hook: useRuntimeAction("openclaw")
    UI->>Hook: "action.mutate({ action: "restart" })"
    Hook->>RPC: POST /runtimes/openclaw/actions/restart
    RPC->>Server: POST /runtimes/:adapter/actions/:action
    Server->>RT: capabilities.includes("restart")?
    RT-->>Server: true
    Server->>RT: "runtime.executeAction({ type: "restart" })"
    RT-->>Server: void
    Server-->>RPC: "{ status: "ok", state: "starting" }"
    RPC-->>Hook: success
    Hook->>Hook: invalidateQueries(["runtime-status","openclaw"])

    UI->>RPC: GET /runtimes/openclaw/status/stream (SSE)
    RPC->>Server: SSE connect
    Server->>RT: runtime.subscribe(writeSnapshot)
    RT-->>Server: unsubscribe fn
    loop every state change
        RT->>Server: listener(snapshot)
        Server-->>UI: "event: snapshot data: {...}"
    end
    loop every 15s
        Server-->>UI: "event: heartbeat data: {ts:...}"
    end
    UI->>Server: abort
    Server->>RT: unsubscribe()
    Server->>Server: clearInterval(heartbeat)

Comments Outside Diff (2)

  1. packages/browseros-agent/apps/agent/entrypoints/app/agents/AgentsPage.tsx, line 131-138

    P2 pillForControlPlane loses distinct "Reconnecting" / "Recovering" labels

    The old GatewayStatusBar.tsx mapped 'reconnecting' → "Reconnecting" and 'recovering' → "Recovering" as separate cases with separate labels. The new implementation folds both under a single "Connecting" label. A user whose gateway is in a slow recovery loop now sees the same text as a fresh connect attempt, making it harder to tell that the situation is degraded. Consider preserving the individual labels to match the previous UX fidelity.


  2. packages/browseros-agent/apps/server/src/api/routes/runtimes.ts, line 1067-1103

    P2 SSE stream: heartbeat write errors are silently swallowed after abort

    The heartbeat setInterval callback calls s.write(...).catch(() => {}), which suppresses every write error including those that occur while the stream is still considered alive but the underlying connection has silently dropped (e.g., TCP RST before the Hono abort handler fires). In that window, the interval continues firing and accumulating silently-failing writes. The pattern is fine for the snapshot writes (fire-and-forget after subscribe), but the heartbeat would benefit from detecting write failure and resolving the abort promise early to trigger cleanup. Minimal fix: track a closed flag and clearInterval on the first failed heartbeat write.
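The suggested minimal fix could be sketched like this (a hedged illustration: `startHeartbeat`, the `Writer` signature, and the `onDead` callback are assumptions, not the repo's actual stream API):

```typescript
// Sketch of a closed-flag heartbeat: the first failed write stops the
// interval and triggers early cleanup instead of swallowing errors forever.
type Writer = (chunk: string) => Promise<void>;

function startHeartbeat(
  write: Writer,
  onDead: () => void,
  intervalMs = 15_000,
): () => void {
  let closed = false;
  const timer = setInterval(() => {
    if (closed) return;
    write(`event: heartbeat\ndata: ${JSON.stringify({ ts: Date.now() })}\n\n`)
      .catch(() => {
        // Dead connection detected: stop the interval and let the caller
        // resolve its abort promise / run unsubscribe cleanup early.
        closed = true;
        clearInterval(timer);
        onDead();
      });
  }, intervalMs);
  // Normal-path cleanup (e.g. from the abort handler)
  return () => {
    closed = true;
    clearInterval(timer);
  };
}
```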

Additional Findings

### Issue 1
packages/browseros-agent/apps/agent/entrypoints/app/agents/useRuntime.ts:914-918
**Unused `list` key in `RUNTIME_QUERY_KEYS`**

`RUNTIME_QUERY_KEYS.list` is exported but never consumed — no `useRuntimeList` hook exists and the key isn't referenced anywhere in this PR. Leaving it creates false signal that a list-level invalidation pattern is in use. Per the project's cleanup guidelines, dead code should be removed rather than retained for hypothetical future use.

### Issue 2
packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentSummaryChips.tsx:7-15
**`adapterHealth` prop declared but never used**

`adapterHealth` is kept as an optional prop "for upstream callers" but is never destructured, read, or acted on inside the component. Any caller that still passes it has the value silently discarded. The prop should be removed entirely — callers can be updated in the same pass since the change is purely additive (optional → removed). Keeping it as dead interface surface contradicts the project's remove-dead-code rule.


Reviews (1): Last reviewed commit: "refactor(ui): wire RuntimeStatusBar + Ru..."

…nder Start CTA for installed state

Two stuck-state bugs in the new RuntimeControlPanel:

1. The runtime's state machine started fresh at not_installed on every
   server boot. tryAutoStart's short-circuit branches (gateway already
   running, auth pass) never drove the state transitions, so the UI
   saw not_installed for a gateway that was actually running. Add a
   syncState() method on OpenClawContainerRuntime that probes the
   actual container via cli.inspectContainer + /readyz and sets state
   accordingly. Wire it into tryAutoStart as the first step so it
   runs regardless of which branch the rest takes.

2. RuntimeControlPanel had no case for state === 'installed', so after
   a successful Install the panel went blank instead of offering the
   next step. Treat installed the same as stopped — show the Start
   CTA with copy that reflects the difference (image is pulled vs
   container exists but stopped).

Optional-chained the syncState call so existing tests with partial
runtime mocks don't crash on the missing method.
When a previous server boot wrote runtime-state.json after the gateway
container had already been created with a different hostPort (e.g. 18789
held at allocate-time → container started on 18790), the persisted port
disagrees with the live mapping. The runtime then probes the persisted
port forever and the UI sticks at `starting`.

`syncState` now reads `NetworkSettings.Ports` from inspect-container and
adopts the actual host port for the gateway container's published port
when it differs. The service then re-syncs `hostPort`/`httpClient` and
rewrites runtime-state.json so the next boot starts from a clean slate.

- ContainerInfo gains a flat `ports` array (parsed from
  `NetworkSettings.Ports`)
- OpenClawContainerRuntime.syncState: reconcile hostPort from live
  mapping before probing /readyz
- OpenClawService.tryAutoStart: adopt the runtime's reconciled port and
  persist it via writePersistedGatewayPort
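The flat `ports` parse might look roughly like this (field names follow Docker/nerdctl-style `NetworkSettings.Ports` inspect output; the helper name and `PortMapping` type are assumptions):

```typescript
// Sketch: flatten NetworkSettings.Ports from inspect-container output
// into a flat ports array usable for host-port reconciliation.
interface PortMapping {
  containerPort: number;
  protocol: string;
  hostPort: number;
}

function flattenPorts(
  ports: Record<string, Array<{ HostIp: string; HostPort: string }> | null>,
): PortMapping[] {
  const out: PortMapping[] = [];
  for (const [key, bindings] of Object.entries(ports)) {
    const [portStr, protocol] = key.split("/"); // e.g. "18789/tcp"
    for (const b of bindings ?? []) {
      out.push({
        containerPort: Number(portStr),
        protocol,
        hostPort: Number(b.HostPort),
      });
    }
  }
  return out;
}
```

With this shape, `syncState` can compare the persisted port against the live mapping for the gateway's published container port and adopt the actual host port when they differ.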
…ismatch

When a previous boot leaves a gateway running with a stale token, the
realloc-on-auth-mismatch branch was bumping the persisted port without
actually freeing the old container — ManagedContainer.start() no-ops
when state==='running', so the next start cycle never recreated the
container on the new port. The result: persisted/service/runtime drift
back into mismatch, and history requests 500 with "gateway is not ready"
even while the (stale) gateway keeps serving chat from the old port.

Stop the gateway explicitly when we decide to bump off the port, so the
upcoming start cycle goes through the full remove + create + start path
on the freshly-allocated port. The token-mismatch test still passes;
adds a new test pinning the stop-before-realloc behaviour.
…fresh install

Starting the gateway via the new RuntimeControlPanel "Start" CTA goes
through runtime.executeAction({type:'start'}) directly, bypassing
OpenClawService.tryAutoStart and its ensureStateEnvFile() seeding step.
On a freshly-wiped .browseros-dev this left nerdctl create failing with
"failed to open env file .../.openclaw/.env: no such file or directory".

Seed the file (empty, mode 0600) inside buildContainerSpec so the
runtime is self-sufficient. Service callers continue to work — their
ensureStateEnvFile is now an idempotent no-op once the file exists.
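The idempotent seeding step could be sketched as follows (the helper name and path handling are illustrative, not the repo's code):

```typescript
// Sketch: seed an empty env file with mode 0600 if it doesn't exist yet.
// Calling it again is a no-op, so both the runtime (buildContainerSpec)
// and service callers can run it safely.
import { existsSync, writeFileSync } from "node:fs";

function ensureEnvFile(path: string): void {
  if (existsSync(path)) return; // idempotent no-op once the file exists
  writeFileSync(path, "", { mode: 0o600 });
}
```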
OpenClawService.getStatus was carrying its own view of "is the gateway
alive" (running/stopped/uninitialized derived from machineStatus +
isReady probe) while the new AgentRuntime maintains the canonical state
machine. The two could disagree — most visibly after a wipe + partial
restart, where the runtime correctly read not_installed but the service
still reported running/connected from in-memory fields.

Map the legacy status surface from runtime.getStatusSnapshot().state so
both pills can't contradict each other. Clear controlPlaneStatus /
lastGatewayError / lastRecoveryReason whenever the runtime isn't
running — those signals are only meaningful for an alive gateway.

First chunk of the legacy-lifecycle removal. Lifecycle methods on the
service (restart/shutdown/tryAutoStart/etc.) and duplicated hostPort
state still exist and will be removed in follow-up commits.
Removes the start/stop/restart/reconnectControlPlane/shutdown surface on
OpenClawService — these duplicated the new AgentRuntime state machine
and were the root cause of the two views disagreeing. UI flows now go
through runtime.executeAction via the RuntimeControlPanel; server
shutdown via getOpenClawRuntime().executeAction({type:'stop'}).

Server:
- delete service.start/stop/restart/reconnectControlPlane/shutdown +
  stopGatewayLogTail (now unreferenced)
- delete /claw/start /claw/stop /claw/restart /claw/reconnect routes
- replace internal `await this.restart()` (createAgent, updateProviderKeys)
  with `runtime.restartGateway` — provider-config changes only need a
  container restart, not a control-plane re-probe
- main.ts shutdown handler uses getOpenClawRuntime().executeAction directly

UI:
- useOpenClawMutations drops startOpenClaw/stopOpenClaw/restartOpenClaw/
  reconnectOpenClaw and pendingGatewayAction; setup/create/delete remain
- AgentsPage drops the legacy LifecycleAlert + ControlPlaneAlert blocks;
  the RuntimeControlPanel already renders pending state on its own
  action buttons

Tests:
- delete tests for the removed methods
- runtime mocks in restart-side tests now expose restartGateway directly
Port persistence + reconciliation now lives entirely on the runtime
side. Service keeps a lazy httpClient getter that always reads the
current port from runtime.getHostPort(), so a port change (via
syncState drift detection) propagates everywhere automatically.

Server:
- OpenClawContainerRuntime seeds hostPort from runtime-state.json at
  construction (readPersistedGatewayPortSync) and writes back via
  syncState when the live container's mapping drifts
- OpenClawService.hostPort, setPort, adoptRuntimeHostPort,
  ensureGatewayPortAllocated, isCurrentGatewayAvailable,
  isGatewayAvailable, isGatewayAuthenticated, isGatewayPortReady,
  the httpClient field, and the local fetchOk all deleted
- tryAutoStart is now ~10 lines: syncState → executeAction({type:start})
  → control-plane probe; no port juggling, no auth-mismatch realloc
  (that path was driving the broken-state bug from earlier)
- internal `this.hostPort` references now go through runtime.getHostPort()

Tests:
- delete the four obsolete tryAutoStart tests (each asserted internals
  that are gone) plus the unused mockGatewayAuth helpers
- add two slim tryAutoStart tests pinning the new contract
- existing runtime tests still call setHostPort, so the method survives
  as a test-only override
The runtime state machine is now the single source of truth in the UI;
the old OpenClawStatus surface (controlPlaneStatus, lastGatewayError,
lastRecoveryReason, the status enum) and its consumers are all dead
weight after Chunks 1-4. Drop them.

UI:
- OpenClawControls.tsx: delete StatusBadge, ControlPlaneBadge,
  AgentsPageHeader, LifecycleAlert, ControlPlaneAlert, GatewayStateCards.
  Keep ProviderSelector + InlineErrorAlert — still used by the setup
  dialog and AgentsPage's inline error surface.
- agents-page-utils.ts: delete getControlPlaneCopy, getRecoveryDetail,
  getGatewayUiState, getLifecycleBanner, canManageOpenClawAgents,
  shouldShowControlPlaneDegraded, getControlPlaneCopyForStatus.
- agents-page-types.ts: delete GatewayUiState, LIFECYCLE_BANNER_COPY,
  CONTROL_PLANE_COPY, FALLBACK_CONTROL_PLANE_COPY, RECOVERY_REASON_COPY.
- useOpenClaw.ts: delete OpenClawStatus + GatewayLifecycleAction.
The agents page only surfaced OpenClaw's lifecycle controls — Hermes
auto-installed silently at boot with no UI visibility or manual handle.
Adds a generic section that iterates over container-kind runtimes from
/runtimes and renders a control panel + status bar per adapter.

- new useRuntimes() hook hits GET /runtimes
- new RuntimesSection renders one card per container runtime, with an
  adapter-keyed extras registry for adapter-specific affordances
  (panel extras + status-bar pill / actions)
- AgentsPage replaces its hand-rolled openclaw panel + bar with the
  section, plugging Configure-provider + Terminal into the openclaw
  slot via the registry
- the section becomes adapter-agnostic: new container runtimes show up
  on the page automatically (filtered by descriptor.kind === 'container')
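The adapter-agnostic filter described above amounts to something like this (the `RuntimeView` shape is assumed from the GET /runtimes response: descriptor + status + capabilities):

```typescript
// Sketch: keep only container-kind runtimes for the RuntimesSection.
interface RuntimeView {
  descriptor: { adapter: string; kind: "container" | "host-process" };
}

function containerRuntimes<T extends RuntimeView>(runtimes: T[]): T[] {
  return runtimes.filter((r) => r.descriptor.kind === "container");
}
```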
ManagedContainer.start was firing the subclass `readinessProbe()`
exactly once, the moment containerd reported the container as Up.
For OpenClaw this raced the Node.js gateway's HTTP listener bind —
containerd flips status as soon as the entrypoint process spawns, but
the Express server takes a few hundred ms to start serving /readyz.
Single-shot probe → unlucky → state='errored' with
"Readiness probe failed after container reached running state".

Pre-refactor (dev branch) didn't hit this because openclaw used a
two-phase flow: `runtime.startGateway` (no probe) then
`service.waitForReady` (polled /readyz for 30s). When the new
runtime architecture folded openclaw under ManagedContainer, the
polling was lost.

Bring it into the base class: `ManagedContainer.start` now polls
`readinessProbe()` within `descriptor.readinessProbe.timeoutMs` at
`intervalMs` cadence. Deterministic probes (Hermes' `--version` exec)
succeed on the first call and exit immediately — no extra latency.
HTTP probes get the full budget they need.

Also stops misapplying `descriptor.readinessProbe` to the containerd
"Up" wait (which only takes ~50ms anyway — defaults are fine).
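The polled readiness wait described above might be sketched like this (a simplified illustration; `pollReadiness` and its parameters mirror the `descriptor.readinessProbe.timeoutMs` / `intervalMs` names from the commit message but the implementation is assumed):

```typescript
// Sketch: poll a readiness probe within a time budget. Deterministic
// probes that pass on the first call return immediately; flaky HTTP
// probes get the full timeout budget.
async function pollReadiness(
  probe: () => Promise<boolean>,
  timeoutMs: number,
  intervalMs: number,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    // A throwing probe counts as "not ready yet", not a hard failure
    if (await probe().catch(() => false)) return true;
    if (Date.now() >= deadline) return false;
    await new Promise((r) => setTimeout(r, intervalMs));
  }
}
```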