feat(browser): BrowserSession, generic browser tools, and deferred activation (#1186 PR2) #1231

Merged
Astro-Han merged 64 commits into dev from claude/browser-session-tools
Jun 11, 2026

Conversation

@Astro-Han Astro-Han commented Jun 10, 2026

Owner

Summary

Second of three flat PRs for embedded-browser agent control (#1186), built on the sealed CDP bridge from PR #1221. This is the agent-facing layer: per-conversation embedded browser views, a BrowserSession that drives a conversation's WebContentsView over the bridge, seven generic browser_* tools, lazy model-driven activation of a deferred browser tool group, and the session-timeline cards that render each step.

  • Per-conversation views (main/browser/controller.ts + controller-automation.ts, desktop) — one WebContentsView per conversation (root session), lazily created, plus a per-window draft on the new-session screen that is adopted into the session it creates. Windows are pure displays: display(win, rect) reparents a conversation's view into whichever window shows it; a view driven in the background keeps running unattached (default 1280×720 bounds so screenshots never capture 0×0). Page, history, and login state (one shared persistent partition) live and die with the conversation — another conversation's page showing up is unrepresentable.
  • browser/browser-bridge.ts (opencode) — BrowserBridge host seam: the embedded server asks the main process to resolve a session to a CDP endpoint and to release it on teardown. Resolution is the identity mapping (session → its own view); structured BrowserBridgeError codes so the server can reason about failures without sharing the main-process class.
  • browser/session.ts (opencode) — withBrowserPage connects via opencli CDPBridge (stealth injected on connect), caches one connection per conversation with per-root single-flight, reloads once to apply stealth when taking over an already-open http(s) document, enforces a 25s tool-level timeout, invalidates and reconnects on connection loss, and releases on session delete/archive.
  • tool/browser-*.ts: browser_navigate/snapshot/click/type/wait/screenshot/extract, each behind the browser permission key (baseline allow). Shared runBrowserAction helper; screenshots return data:image/png attachments rendered inline on the tool card; extract paginates with a char cursor.
  • tool/tool-info.ts + registry.ts — the seven tools form a deferred browser group, desktop-gated, activated by group or member name, collapsed to one activation card. Activation derives durably from message parts so it survives compaction and restart.
  • ui/.../tools/browser.tsx + i18n + trow grouping — seven cards (navigate gets a clickable link like webfetch; snapshot/extract expand to the text the agent read; screenshot expands to the captured image with click-to-zoom), localized titles (en/zh/zht), and a browser trow activity kind so collapsed turns summarize browser steps as one line.
  • prompt routing (pawwork.txt + shell.txt) — web-task intents route to the browser group from the resident surfaces; webfetch positioned as one-shot read-only; curl/wget browser sessions forbidden; contract test pinning all three surfaces.
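The 25s tool-level timeout described for browser/session.ts, combined with the rule that a timeout severs the connection, could look roughly like the sketch below. Only the 25s figure comes from this PR; `withToolTimeout` and `sever` are hypothetical names, and the real code goes through the opencli bridge rather than a bare callback.

```typescript
// Hypothetical helper: runs one browser action under a deadline and severs the
// connection when the deadline passes, so a timed-out action cannot keep
// driving the page.
const TOOL_TIMEOUT_MS = 25_000 // the 25s tool-level timeout named in the summary

async function withToolTimeout<T>(
  run: () => Promise<T>,
  sever: () => void, // assumed callback that drops the CDP connection
  timeoutMs: number = TOOL_TIMEOUT_MS,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      sever() // drop the connection before reporting the timeout
      reject(new Error(`browser action timed out after ${timeoutMs}ms`))
    }, timeoutMs)
  })
  try {
    // Whichever settles first wins: the action or the deadline.
    return await Promise.race([run(), deadline])
  } finally {
    clearTimeout(timer)
  }
}
```

A caller would wrap each tool body, for example `withToolTimeout(() => page.click(selector), () => connection.close())`, where `page` and `connection` are likewise placeholders.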

Why

PawWork's agent needs to drive the embedded browser with no Chrome, no extension, and no second process. PR #1221 landed the sealed transport; this PR turns it into tools the model can actually call. The tools stay generic (navigate/observe/act/extract) and lazy — they activate only when the model asks for the browser group, so the default tool surface is unchanged for non-browser work. Stealth is treated as a first-class capability (injected on every connect), not an opt-out.

Related Issue

#1186

Post-review redesign: conversations own their browser

Live-desktop testing after round 11 surfaced three symptoms with one root cause: the browser was owned by the window (PR #1221 model) while every consumer — the agent, the permission model, the timeline, the user — thinks in conversations. The panel leaked across sessions during agent drives, a page opened on the new-session screen could not be handed to the session it created, and two conversations in one window shared one page. The window-lease machinery (probe pins a windowID, attach validates it, mismatch self-heals, three-tier window arbitration) existed only to bridge that mismatch.

The redesign (its own design doc, reviewed by a second model before build) re-keys views by conversation and retires the lease wholesale:

  • Registry key: windowID → rootSessionID | draft:<windowID>. Renderer browser:* IPC carries an explicit target, validated in main against the calling window's DesktopContext, so a stale or miswired panel no-ops instead of steering another conversation's view.
  • Route changes hide left-behind views in main (the panel survives session switches without remounting, so it cannot); same-conversation-in-two-windows resolves by display ownership with an explicit reclaim affordance, so resize ticks can never steal the view back.
  • resolveEndpoint loses windowID; probeWindow becomes probeSession({ url }); no-window/window-ambiguous error codes, the lease-mismatch fail-fast, byEndpoint connection sharing, and the resolver arbitration are deleted (~570 lines net). The TOCTOU class the lease guarded — permission judged against one window, action landing in another — is now unrepresentable: a session's action can only ever land in that session's own view.
  • Kept unchanged and still pinned by tests: probe-before-ask with no side effects, origin-scoped "always allow", configured-deny-over-approvals, redirect landing re-judge, abort/timeout severing the connection, root-session mapping for subagents, group activation, prompt routing.
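A minimal sketch of the re-keyed registry and the target check, assuming simplified shapes: `BrowserTarget`, the fields of `DesktopContext`, and `handleNavigate` are illustrative stand-ins, not the types in controller.ts or ipc/browser.ts. It shows the two ideas from the list above: views are keyed by conversation (or `draft:<windowID>`), and a call whose target does not match the calling window's context is dropped.

```typescript
// Hypothetical shapes for illustration only.
type BrowserTarget =
  | { kind: "session"; rootSessionID: string }
  | { kind: "draft"; windowID: number }

// What the main process knows about the calling window (assumed fields).
type DesktopContext = { windowID: number; rootSessionID?: string }

// Registry key: a conversation's root session ID, or draft:<windowID>.
function targetKey(target: BrowserTarget): string {
  return target.kind === "session" ? target.rootSessionID : `draft:${target.windowID}`
}

// A target is valid only if it matches what the calling window is showing.
function isValidTarget(target: BrowserTarget, ctx: DesktopContext): boolean {
  if (target.kind === "draft") return target.windowID === ctx.windowID
  return target.rootSessionID === ctx.rootSessionID
}

// A stale or miswired call no-ops instead of steering another conversation's view.
function handleNavigate(
  registry: Map<string, { url: string }>,
  ctx: DesktopContext,
  target: BrowserTarget,
  url: string,
): boolean {
  if (!isValidTarget(target, ctx)) return false
  registry.set(targetKey(target), { url })
  return true
}
```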

Intentional Deviation From Earlier Review (Codex P1-2)

The plan's first review suggested threading a visible_session_id header through promptAsync to bind a tool call to the window the user sees. The per-conversation ownership model supersedes both that suggestion and the main-process window resolution that replaced it: there is no window to resolve anymore — the session IS the target. The ALS/request-context header route stays dropped.

Review Hardening

Eleven external review rounds tightened the permission model after the initial submission; every disposition is a PR comment. The window-lease invariants rounds 6–9 converged on are retired by the ownership redesign above (the ambiguity they guarded no longer exists). What survives them: every action reads its own conversation's page URL at probe time (side-effect free, before the ask) and the permission is judged against it; "always allow" grants are origin-scoped; a configured deny is never overridden by an approval (shared permission layer, covers every permission); a user stop (abort) or tool timeout severs the CDP connection, so a canceled action can never keep driving the page; and a redirect's real landing is re-judged after load — a deny on the destination fails the action loudly. That last one is a deliberately soft contract (the document has committed by re-judge time; vetoing the load itself needs request-phase CDP interception and is a documented follow-up), pinned as such in the cross-site redirect test. The browser-defaults-to-allow ruling (design doc §9, reconfirmed mid-review) is pinned by pawwork-defaults.test.ts.

Verification

Automated, all green (as of the round-13 hardening push):

  • opencode: full suite 3852 pass, 0 fail; desktop-electron: 523 pass; app: 1838 pass; ui: 728 pass; repo-wide typecheck clean.
  • bun run snap browser-tools — the browser tool cards grid (cards, navigate link, snapshot/extract expansion, running shimmer, collapsed trow) visually verified after the subtitle unification.

Live desktop regression (interactive GUI, scheduled next): six-prompt script — observe → click → type → extract on a real page, stealth takeover of an already-open document, screenshot inline rendering, conversation switching mid-drive, Home-draft adoption into a new session, and mid-action stop.

Review Focus

  • Ownership boundaries in controller.ts/controller-automation.ts: display/hide reparenting, draft adoption atomicity, the route-change sweep in syncWindowDisplay, and target validation in ipc/browser.ts.
  • session.ts connection lifecycle: per-conversation 1:1 mapping, the one-shot takeover reload, abort/timeout severing, connection-loss invalidation.
  • The deferred-group activation closure in tool-info.ts (group/member naming, durable derivation across compaction, repair hint, reminder).

Human Review Status

Pending

Summary by CodeRabbit

  • New Features
    • Desktop browser automation tools: navigate, snapshot, click, type, wait, screenshot, extract.
    • Multi-window browser display management with claim/reclaim and session-targeted tabs.
    • Browser permission controls to gate automation per-origin.
    • UI enhancements: dedicated browser tool cards, inline screenshots, and close/confirmation flows for browser tabs.
  • Documentation
    • Added usage docs and PawWork guidance for browser tasks.

@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Contributor

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 24fb7d8c-f540-4a8c-abe0-8a0f2eedbff3

📥 Commits

Reviewing files that changed from the base of the PR and between 7ccb036 and 9113e7d.

📒 Files selected for processing (6)
  • packages/app/e2e/snap/fixtures/browser-tools-snap-fixture.tsx
  • packages/app/src/pages/session/browser/close-page.test.ts
  • packages/app/src/pages/session/browser/close-page.tsx
  • packages/opencode/src/session/prompt.ts
  • packages/opencode/src/tool/tool-info.ts
  • packages/opencode/test/tool/tool-info.test.ts
📝 Walkthrough

Walkthrough

Adds target-scoped browser control, browser tools, session adoption, UI renderers, permission/activation updates, and broad test/CI coverage.

Changes

Embedded browser tooling and control flow

| Layer | File(s) | Summary |
| --- | --- | --- |
| Bridge contracts and package wiring | packages/opencode/src/browser/browser-bridge.ts, packages/opencode/src/node.ts, packages/opencode/package.json, packages/desktop-electron/src/main/env.d.ts | Introduces BrowserBridge host API/typed errors, re-exports, and dependency updates for ws/opencli. |
| Desktop controller and registry | packages/desktop-electron/src/main/browser/registry.ts, packages/desktop-electron/src/main/browser/controller.ts, packages/desktop-electron/src/main/browser/logic.ts | Adds BrowserControllerRegistry, target-owned BrowserViewController lifecycle, and displayDecision rules for takeover/drop/show. |
| Desktop host and IPC wiring | packages/desktop-electron/src/main/browser/automation-host.ts, packages/desktop-electron/src/main/ipc/browser.ts, packages/desktop-electron/src/preload/index.ts, packages/desktop-electron/src/main/index.ts | Provides createDesktopBrowserBridgeHost, wires provideHost in main, validates target-resolving IPC, and updates preload to target-scoped calls. |
| App panel targeting and adoption | packages/app/src/pages/session/browser/browser-panel.tsx, packages/app/src/components/prompt-input/submit.ts, packages/app/src/pages/session/helpers.ts, packages/app/src/pages/session/session-side-panel.tsx | BrowserPanel becomes target-scoped, displaced overlay/reclaim added, submit flow adopts drafts into sessions, and automation-attached events open driven browser tabs. |
| Browser session runtime and permissions | packages/opencode/src/browser/session.ts, packages/opencode/src/config/permission.ts, packages/opencode/src/permission/index.ts, packages/opencode/src/session/session.ts | Implements session acquisition, single-flight connects, probe/release/dispose, timeouts/abort semantics, and adds browser permission key plus denial semantics. |
| Browser tools and deferred activation | packages/opencode/src/tool/browser-*.ts, packages/opencode/src/tool/browser-shared.ts, packages/opencode/src/tool/registry.ts, packages/opencode/src/tool/tool-info.ts | Adds runBrowserAction, concrete browser tools (navigate, snapshot, click, type, wait, screenshot, extract), registers them for desktop clients, and implements group-aware deferred activation for browser group. |
| UI renderers and i18n | packages/ui/src/components/message-part/tools/browser.tsx, packages/ui/src/components/tool-info.ts, packages/ui/src/i18n/*, packages/ui/src/components/message-part.css, packages/app/e2e/snap/* | Registers browser tool renderers, attachments/subtitle helpers, adds i18n keys, screenshot styling, and e2e snapshot fixture/tests. |
| Tests and CI | packages/opencode/test/*, packages/desktop-electron/src/main/browser/*.test.ts, packages/ui/test/*, .github/workflows/windows-advisory.yml | Extensive tests added for sessions, fake CDP server, tools, controller registry, tool-info group behavior, UI subtitle tests; CI Windows shard now includes test/browser paths. |

Sequence Diagram(s)

sequenceDiagram
  participant DesktopMain as Desktop main process
  participant BrowserBridge as BrowserBridge host
  participant BrowserControllerRegistry as ControllerRegistry
  participant BrowserViewController as ViewController
  participant Preload as Preload API
  participant BrowserPanel as BrowserPanel

  DesktopMain->>BrowserBridge: provideHost(createDesktopBrowserBridgeHost())
  BrowserPanel->>Preload: navigate(target, url)
  Preload->>DesktopMain: IPC browser:navigate(target,url)
  DesktopMain->>BrowserControllerRegistry: resolve/ensure(target)
  BrowserControllerRegistry->>BrowserViewController: display / navigate / hideFor
  BrowserViewController-->>DesktopMain: emit {target, state}

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~90+ minutes

Possibly related PRs

Poem

A rabbit hops through code so bright,
Tabs and tools aligned at night.
Drafts adopt and targets sing,
Tests and bridges, everything.
🐇 Click—navigate—then share a bite.

@Astro-Han Astro-Han added enhancement New feature or request desktop P2 Medium priority labels Jun 10, 2026
@github-actions github-actions Bot added app Application behavior and product flows ui Design system and user interface platform Electron shell, OS integration, packaging, updater, signing, paths, and permissions harness Model harness, prompts, tool descriptions, and session mechanics labels Jun 10, 2026

@github-actions github-actions Bot left a comment

Suggested priority: P2 (includes user-path files (packages/desktop-electron/src/main/browser/automation-host.ts, packages/desktop-electron/src/main/browser/automation-resolver.test.ts, packages/desktop-electron/src/main/browser/automation-resolver.ts, packages/desktop-electron/src/main/env.d.ts, packages/desktop-electron/src/main/index.ts)).

P1/P0 are reserved for maintainer confirmation. Please relabel manually if this is a release blocker, security issue, data-loss risk, or updater/runtime failure.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces embedded-browser automation capabilities to the desktop application, adding seven new browser tools along with their corresponding CDP connection management, window resolution logic, UI components, and tests. Feedback on these changes highlights a potential concurrency race condition in acquire that could cause duplicate connection attempts, a resource leak in session.ts where the main process is not notified of invalidated connections, and several instances of unnecessary optional chaining on props.metadata in the UI components.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread packages/opencode/src/browser/session.ts
Comment thread packages/opencode/src/browser/session.ts
Comment thread packages/ui/src/components/message-part/tools/browser.tsx Outdated
Comment thread packages/ui/src/components/message-part/tools/browser.tsx Outdated
Comment thread packages/ui/src/components/message-part/tools/browser.tsx Outdated
@github-actions github-actions Bot added the ci Continuous integration / GitHub Actions label Jun 10, 2026
@Astro-Han

Owner Author

External review round — disposition

Four findings from an external review of this PR, verified against the code before acting:

P1 — tools connected (and stealth-reloaded an already-open page) before the permission ask. Confirmed; introduced by the URL-scoped permission fix (ea9ed2e), whose URL probe went through the CDP connection. Fixed in 7f6c198: the probe now reads main-process view state via a side-effect-free Host.currentUrl (window pick + webContents.getURL(), never attaches or creates anything). Regression test: with a page open and the permission denied, every browser tool fails with zero CDP commands reaching the bridge.

P1 — click / type(submit) should require confirmation by default. Not adopted — deliberate design ruling, re-confirmed by the maintainer today: every browser action defaults to allow because the embedded browser is local and fully visible (the user watches the agent act and can take over at any moment); permission.browser rules tighten per URL where needed, and the URL gate now genuinely runs before any page contact (see above). The ruling is pinned by a test in test/permission/pawwork-defaults.test.ts and recorded in the design doc.

P2 — concurrent first acquires race onto the single-client bridge. Confirmed; fixed in 8eb4857 (single-flight per root session and per endpoint, regression tests for same-session and cross-session races).

P2 — invalidate() left the main process with a stale attachment. Confirmed; fixed in b738cf7 (best-effort host release at invalidation, exactly-once semantics, regression test).

Full opencode suite (3748 pass) + desktop-electron suite (513 pass) + typecheck green.
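For readers unfamiliar with the single-flight fix, here is a minimal sketch of the per-root-session half. `bySession` and `pendingAcquires` echo names used elsewhere in this thread, but the shapes, `connect`, and `acquire` are assumptions for illustration; the per-endpoint half and error handling are omitted.

```typescript
// Illustrative shapes only; not the code in browser/session.ts.
type Connection = { rootSessionID: string; id: number }

const bySession = new Map<string, Connection>()
const pendingAcquires = new Map<string, Promise<Connection>>()
let connectCount = 0

// Stand-in for the real CDP connect; resolves asynchronously.
async function connect(rootSessionID: string): Promise<Connection> {
  connectCount += 1
  const id = connectCount
  await new Promise((resolve) => setTimeout(resolve, 10))
  return { rootSessionID, id }
}

async function acquire(rootSessionID: string): Promise<Connection> {
  const cached = bySession.get(rootSessionID)
  if (cached) return cached
  // A concurrent first caller joins the in-flight connect instead of racing it.
  const pending = pendingAcquires.get(rootSessionID)
  if (pending) return pending
  const started = connect(rootSessionID)
    .then((conn) => {
      bySession.set(rootSessionID, conn)
      return conn
    })
    .finally(() => {
      pendingAcquires.delete(rootSessionID)
    })
  pendingAcquires.set(rootSessionID, started)
  return started
}
```

Two concurrent `acquire("root-1")` calls resolve to the same connection object and trigger a single `connect`.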

@Astro-Han

Owner Author

External review round 3 — disposition

Both findings verified and fixed:

P2 — host attachment orphaned around a pending acquire. Confirmed, two gaps: a connect failing after resolveEndpoint succeeded left the main-process attachment with no server-side mapping for a later release to find, and a session delete/archive landing while the first acquire was still in flight early-returned and orphaned the connection the acquire then finished building. Fixed in 0e01377: the acquire rolls its attachment back when the connect fails (safe even with other sessions on the window — a failed connect means no live connection to lose), and releaseBrowserSession waits out a pending acquire before cleaning up. Regression tests cover both paths.

P3 — browser_wait accepted blank text/selector. Confirmed: the exactly-one check counts by !== undefined while everything downstream tests truthiness, so text: "" degraded into a bare-timeout wait labeled "30s pause". Fixed in 49b7002: blank/whitespace text/selector fail fast before the permission ask. Regression test asserts no ask and zero CDP commands.

Full opencode suite (3751 pass) + typecheck green.
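The browser_wait fix amounts to rejecting blank strings before the exactly-one count runs. The sketch below assumes the tool takes `text`, `selector`, or a bare pause; the parameter name `seconds` and the `planWait` helper are hypothetical, since the actual schema is not shown in this thread.

```typescript
// Assumed argument shape; only text and selector are named in the disposition.
type WaitArgs = { text?: string; selector?: string; seconds?: number }
type WaitPlan =
  | { kind: "text"; value: string }
  | { kind: "selector"; value: string }
  | { kind: "pause"; seconds: number }

function planWait(args: WaitArgs): WaitPlan {
  // Fail fast on blank/whitespace strings so "" cannot pass the count below
  // and later degrade into a bare pause.
  for (const key of ["text", "selector"] as const) {
    const value = args[key]
    if (value !== undefined && value.trim() === "") {
      throw new Error(`${key} must not be blank`)
    }
  }
  const provided = [args.text, args.selector, args.seconds].filter((v) => v !== undefined)
  if (provided.length !== 1) {
    throw new Error("provide exactly one of text, selector, seconds")
  }
  if (args.text !== undefined) return { kind: "text", value: args.text }
  if (args.selector !== undefined) return { kind: "selector", value: args.selector }
  return { kind: "pause", seconds: args.seconds as number }
}
```

In the real tool this validation runs before the permission ask, so a rejected call produces no ask and no CDP traffic.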

@Astro-Han

Owner Author

External review round 4 — disposition

P1 — permission probed against one window, action possibly run in another. Confirmed: the probe and the attach each ran an independent window pick (showing-session > single window > focused), so with background sessions or multiple windows a focus change in between could grant against window A's URL and act on window B. There was also a subtler pre-existing skew: the connection is cached per session while the probe re-picked every call, so the probe could read a window the cached connection never pointed at. Fixed in 8b609b5 with the suggested lease approach: probeWindow returns {windowID, url} (preferring the window the session is already attached to), and resolveEndpoint attaches exactly that window — a closed window fails as no-window rather than being silently re-picked, since the grant doesn't transfer. Regression tests cover the review's scenario (two windows, focus moves after the probe → action lands in the granted window, the other window sees zero CDP traffic) plus resolver-level lease semantics.

Remaining skew within a single window (the page navigating itself during a long-lived ask dialog) is accepted: the embedded browser is visible next to the dialog, and the default-allow flow has no dialog wait at all.

Pending verification — Windows advisory failure. It failed in the setup-bun install step (infra flake), not in tests; rerun of the failed job is green (unit-windows-opencode-session ✓). The workflow is advisory (non-required) by design.

Full opencode suite (3752 pass) + desktop-electron suite (516 pass) + both typechecks green.

@Astro-Han

Owner Author

External review round 5 — disposition

Both findings confirmed and fixed in 6c3ca9b; they were gaps in the round-4 lease, not new surfaces.

P1 — a failed window pick degraded to an un-leased "*" ask. probeWindow collapsed "window exists but shows no http(s) page" (legitimate * case, lease present) and "no window can serve the session" (no lease possible) into one weak signal. It now throws the same typed error resolveEndpoint would (no-window / window-ambiguous), and the action fails before the ask. Regression test: a failing probe produces no ask and zero CDP traffic.

P2 — navigate ran un-leased. Explicit patterns (the destination) skipped the whole probe, so its attach re-picked at execution time. runBrowserAction now probes for every action: explicit patterns only override what the permission is judged against, never whether the attach is pinned. Regression test: two windows, focus moves during the ask — Page.navigate lands only in the leased window.

The lease invariant is now total and one sentence: probe (pin window, read URL) → ask → attach the leased window — or fail before the ask.

Full opencode suite (3764 pass) + desktop-electron suite (516 pass) + both typechecks green.

@Astro-Han

Owner Author

Review round 6 disposition

P2 — concurrent first actions with different window leases share one pending acquire: fixed in b3f5c9a.

Verified as described: pendingAcquires keys on the root session only, so a second concurrent first action joined the first one's acquire regardless of its own lease — running in a window its permission (judged against its own probe's URL) never applied to. The bySession cache had the same gap: when the leased window closes before the connection loss is noticed, a re-probe leases the surviving window but the cached connection still points at the dead one. Both reuse paths are fixed together.

Fix: a connection now remembers the windowID lease it was acquired under (pendingAcquires entries carry it too), and both reuse paths fail fast on a mismatch with a retryable error ("The browser window for this session changed between the permission check and the connection. Retry the action."). Retrying re-runs probe → ask, and the attached-first probe converges on the surviving window.

On the suggested alternatives: unifying onto the first leased window is unsound — the second action's ask was already judged against its own window's URL, so moving the action transfers a grant that was never given. Serial retry inside the session layer is also out: a retry must re-probe and re-ask, and that sequence belongs to the tool layer (retrying below the ask would bypass it). Fail-fast is the only shape that preserves the lease invariant.

Tests (the requested concurrent shape plus unit pins):

  • browser-tools.test.ts: two snapshot actions race; probes lease window 1 then window 2; the second fails with the retryable error, asks show each action judged against its own window's URL, and window 2's CDP server sees zero traffic.
  • session.test.ts: a different-lease caller arriving during a pending acquire rejects (one CDP connection total); a cached connection bound to another window rejects a mismatched lease but still serves a matching one.

opencode: 3767 pass, typecheck clean.

@Astro-Han

Owner Author

Round 6 follow-up: the mismatch fix itself had a convergence regression — fixed in 9929f80

An internal re-review of b3f5c9a found that its cached-path check rejected the reuse but left the stale connection in bySession. The retry convergence claimed in the round-6 disposition did not hold there: when the leased window closes while the session is idle, the dead socket is never noticed (the CDP client has no close callback; loss only surfaces on the next send), and the mismatch check runs before any command could hit the connection and trigger the connection-loss invalidation. Every retry mismatched the same zombie forever — a scenario that self-healed before b3f5c9a (the action hit the dead connection, got a connection-loss error, and the next call reconnected) became a permanent dead end.

Fix: a mismatching cached connection is invalidated along with the failure. Sound because a live cached connection can only mismatch when its window stopped serving this session (the attached-first probe would otherwise have returned it). The action still fails rather than reconnecting in place — invalidate's host release is fire-and-forget and an immediate re-attach would race it; the retry arrives after a full probe → ask round trip. The pending-path check is unchanged (that in-flight acquire belongs to a healthy concurrent action).

Tests: the session-level mismatch test now pins the full recovery (matching lease reuses; mismatch drops the connection and notifies the host; the retry connects to the surviving window), plus a tool-level end-to-end: action on window 1 → window closes idle → next action fails retryably → retry lands on window 2.

opencode: 3768 pass, typecheck clean.
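As a rough illustration of the cached-path behavior after 9929f80 (before the later ownership redesign retired the lease): a matching lease reuses the connection, while a mismatch both drops the stale entry and fails retryably. `reuseOrFail` and its parameters are hypothetical; the error text is the one quoted in the round-6 disposition.

```typescript
// Hypothetical shapes; the lease machinery described here was later removed.
type CachedConnection = { windowID: number }

function reuseOrFail(
  cache: Map<string, CachedConnection>,
  rootSessionID: string,
  leasedWindowID: number,
  releaseHost: (conn: CachedConnection) => void, // assumed best-effort host release
): CachedConnection | undefined {
  const cached = cache.get(rootSessionID)
  if (!cached) return undefined // nothing cached: the caller connects to the leased window
  if (cached.windowID === leasedWindowID) return cached
  // Mismatch: invalidate the stale connection so a retry can reconnect,
  // but still fail this action rather than reconnecting in place.
  cache.delete(rootSessionID)
  releaseHost(cached)
  throw Object.assign(
    new Error(
      "The browser window for this session changed between the permission check and the connection. Retry the action.",
    ),
    { retryable: true },
  )
}
```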

@Astro-Han

Owner Author

Review round 7 disposition

P2-1 — global "always allow" voids per-URL denies: fixed in 06c35fc.

Verified as described, and worth being precise about what it is NOT: this does not challenge the default-allow ruling (design doc §9), which governs the unconfigured baseline. P2-1 is about a user who HAS configured tightening rules — approvals append after configured rules and evaluation is last-match-wins, so one "always allow" click on a harmless site wrote browser:* allow and silently voided the user's own deny. That breaks the other half of the ruling: rules must reliably tighten per URL. Fix: the always grant is now scoped to the asked URL's origin (https://site/*); a blank/non-web target offers no always at all rather than a global one. The requested verification is pinned in pawwork-defaults.test.ts (origin-scoped approved allow never overrides another site's configured deny, and untouched sites still ask), plus per-scenario always assertions across the tool tests.

P2-2 — extract reads and converts the full page before slicing: fixed in 34a22c2.

The page-side script now caps the outerHTML at 2M chars before it crosses CDP and flags the cut; the tool notes the drop and suggests narrowing with selector. This bounds both the CDP transfer and the synchronous htmlToMarkdown conversion (which no timeout could interrupt — the cap is the protection). Chose the input-cap over page-side chunked extraction: the markdown paging already handles long content, and chunking HTML risks splitting tags mid-element for marginal gain. Contract test covers the truncated flag surfacing; the in-page slice itself can't execute against the fake CDP server, so it falls under the planned real-app verification pass.

P3 — browser_wait drops the takeover-reload note: fixed in cb54d72.

run now passes the takeover info out and the output appends takeoverNote, matching every other tool. Test pins first-wait-shows-note / second-wait-doesn't.

opencode: 3771 pass, typecheck clean.
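The extract bounds can be pictured as two small pure functions: an input cap applied to the HTML before conversion, and a char-cursor slice over the converted markdown. The 2M figure is from P2-2; `capHtml`, `pageAt`, and the page size parameter are hypothetical names for this sketch, and the real cap runs page-side before the HTML crosses CDP.

```typescript
const MAX_HTML_CHARS = 2_000_000 // the 2M-char cap from P2-2

// Cap the raw HTML and flag the cut so the tool can tell the model about it.
function capHtml(html: string, max: number = MAX_HTML_CHARS): { html: string; truncated: boolean } {
  return html.length > max
    ? { html: html.slice(0, max), truncated: true }
    : { html, truncated: false }
}

type ExtractPage = { text: string; nextCursor: number | undefined }

// Return one page of markdown starting at a char cursor; nextCursor is
// undefined once the end of the content has been reached.
function pageAt(markdown: string, cursor: number, pageChars: number): ExtractPage {
  const end = Math.min(cursor + pageChars, markdown.length)
  return {
    text: markdown.slice(cursor, end),
    nextCursor: end < markdown.length ? end : undefined,
  }
}
```

The model pages by passing the previous `nextCursor` back as `cursor` until it comes back undefined.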

@Astro-Han

Owner Author

Review round 8 disposition

P2 — origin-wide always still overrides same-origin path denies: fixed in 6da0164.

Verified as described: the origin scope (round 7) only stops cross-site override; within the origin, an approved https://example.com/* still out-matched a configured https://example.com/admin/* deny (approvals append after configured rules, last match wins).

Fix taken at the layer where it belongs: Permission.ask now evaluates the configured ruleset first and short-circuits on deny — approvals may relax asks, never denies. Narrowing the always grant to the exact URL was considered and rejected: it would gut "always allow" (every path on the same site re-asks) while the only thing that actually needs hard protection is a deny the user wrote down. The deny-precedence rule also closes the same gap for every other permission (a bash deny under a broad approved prefix had it too), and nothing can depend on the old behavior — a deny request never reaches the dialog, so an approval covering it can only ever have been collateral from a click on a different pattern.

One deliberate non-change: a same-origin configured ask (not deny) is still relaxed by the origin grant. That is the point of "always allow" — the dialog shows the origin pattern being granted — and asks are a friction preference, not a boundary.

The requested verification is pinned as a full ask→reply(always)→re-ask flow test in next.test.ts: admin path still denies after the always click on /home, and the rest of the site stops asking.

opencode: 3772 pass, typecheck clean.
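Rounds 7 and 8 together reduce to two rules: the "always" grant is scoped to the asked URL's origin, and configured rules are evaluated first with deny short-circuiting. A minimal sketch follows, under assumptions worth stating: the glob matcher only handles a trailing `*`, unmatched URLs default to "ask" for the sake of the example, and `alwaysPattern`, `evaluate`, and `judge` are illustrative names rather than the Permission API.

```typescript
type Action = "allow" | "ask" | "deny"
type Rule = { pattern: string; action: Action }

// Minimal glob for this sketch: only a trailing "*" is supported.
function matches(pattern: string, url: string): boolean {
  return pattern.endsWith("*") ? url.startsWith(pattern.slice(0, -1)) : pattern === url
}

// Last matching rule wins; "ask" when nothing matches (assumed default here).
function evaluate(rules: Rule[], url: string): Action {
  let result: Action = "ask"
  for (const rule of rules) {
    if (matches(rule.pattern, url)) result = rule.action
  }
  return result
}

// Origin-scoped "always" pattern, e.g. https://example.com/*.
// Blank or non-web targets get no "always" option at all.
function alwaysPattern(url: string): string | undefined {
  try {
    const parsed = new URL(url)
    if (parsed.protocol !== "http:" && parsed.protocol !== "https:") return undefined
    return `${parsed.origin}/*`
  } catch {
    return undefined
  }
}

function judge(configured: Rule[], approved: Rule[], url: string): Action {
  // Configured rules go first: a deny the user wrote down is final.
  if (evaluate(configured, url) === "deny") return "deny"
  // Approvals append after configured rules and may relax asks, never denies.
  return evaluate([...configured, ...approved], url)
}
```

With a configured deny on `https://example.com/admin/*` and an "always" click on `/home`, the admin path still denies while the rest of the origin stops asking.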

@Astro-Han

Owner Author

Review round 9 disposition

P2 — a redirect can land on a site the user denied: fixed in bb12280.

Verified, and the gap was deeper than described: opencli's goto caches the REQUESTED url and getCurrentUrl() echoes that cache, so the reported "landed" URL never differed from the request — the redirect was invisible to the user-facing output as well as to any re-judge. browser_navigate now reads the document's real location.href and, when it differs from the requested URL, asks the browser permission again for the landing (origin-scoped always, redirectedFrom in the metadata). Same-string landings — the overwhelmingly common case — skip the second ask entirely.

On the stricter alternative (intercepting at the request/redirect stage): not taken for now. It needs CDP Fetch interception with the user's URL rules threaded into the browser layer — a different blast radius — while the residual exposure of the post-load check is one page render: the action itself fails loudly, and every subsequent action probes the denied page and is blocked by the per-action ask. If a real case shows page-load alone is the harm, that interception is the documented follow-up.

Tests: the requested ok→blocked redirect-deny shape (navigate fails, both asks logged with their patterns), plus a same-site redirect pinning that the real landing is both reported and re-judged.
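A rough sketch of the post-load re-judge, under stated assumptions: `NavigateDeps`, `ask`, and `load` are hypothetical stand-ins for the permission check and the page load, and only the `redirectedFrom` metadata key comes from the description above.

```typescript
type Verdict = "allow" | "deny";

interface NavigateDeps {
  // Hypothetical permission check; resolves with the user's or ruleset's verdict.
  ask(url: string, meta?: { redirectedFrom: string }): Promise<Verdict>;
  // Loads the URL and resolves with the document's real location.href.
  load(url: string): Promise<string>;
}

async function navigateWithRejudge(deps: NavigateDeps, requested: string): Promise<string> {
  if ((await deps.ask(requested)) === "deny") {
    throw new Error(`navigation to ${requested} denied`);
  }
  const landed = await deps.load(requested);
  // Same-string landings, the common case, skip the second ask.
  if (landed !== requested) {
    const verdict = await deps.ask(landed, { redirectedFrom: requested });
    if (verdict === "deny") {
      // Soft contract: the page has already loaded, but the action fails loudly.
      throw new Error(`redirect landed on denied page ${landed}`);
    }
  }
  return landed;
}
```

The second ask happens after the document has committed, which is why this is a soft boundary rather than request-phase interception.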

P3 — group activation announces members the registry won't expose: fixed in 07a38bd.

Verified: activation passed on "at least one member available" and rendered the full roster; the registry filters per member. Chose per-member filtering over all-or-none — disabling one member (say screenshots) shouldn't vaporize the whole group, and the registry already behaves per-member. The activation now renders only available members and fails when none are; the rendered list is recorded in the part's metadata, and the one-shot activation reminder names that recorded set (consistency is structural: the same deferredAvailable predicate flows to tool_info's execute and the registry's filter). The durable activation set still expands to the full group — availability is dynamic and the registry's per-member filter gates each step's exposure.

Tests: partial availability (screenshot disabled → rendered blocks, callable line, and metadata.members all exclude it; none available → activation fails) plus reminder/derivation pins.
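As an illustration of the per-member choice, here is a small sketch. `activateGroup`, `ActivationResult`, and the shortened roster are assumptions for the example; the real roster and predicate live in the deferred-tool table.

```typescript
interface ActivationResult { members: string[] }

// Shortened, illustrative roster.
const BROWSER_GROUP = [
  "browser_navigate",
  "browser_snapshot",
  "browser_click",
  "browser_screenshot",
];

function activateGroup(
  roster: string[],
  isAvailable: (id: string) => boolean,
): ActivationResult {
  // Render only what the registry would actually expose.
  const members = roster.filter(isAvailable);
  if (members.length === 0) {
    throw new Error("no member of the group is available");
  }
  // The rendered list is what would be recorded in the part's metadata.
  return { members };
}
```

Disabling one member shrinks the rendered list instead of failing the whole activation; only an empty result is an error.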

Pending manual verification — acknowledged; it is tracked in the PR body's Verification section (real-GUI observe → click → type → extract plus the stealth-takeover path) and stays the gate before merge.

opencode: 3832 pass, typecheck clean.

Comment thread packages/opencode/test/tool/browser-tools.test.ts Fixed
@Astro-Han

Owner Author

Review round 10 disposition

P2 (redirect deny is soft — page loads before the re-judge): accepted as the soft-contract option, now pinned. Round 9 already chose the soft contract (request-phase CDP Fetch interception is a different change surface and remains the documented hard-boundary follow-up), but only the code comment said so — the test asserted the rejection and the ask sequence without the other half. The cross-site redirect test now also asserts Page.navigate was sent, with a comment naming the contract explicitly: the deny guarantees the action fails loudly and later actions re-probe the denied page; it does not prevent the document from committing. The PR body's Review Hardening section now states the same. 0bd0f55aeb.

P3 (partial-group residue) — split disposition:

  • Repair hint: real, fixed. buildDeferredHint's group branch only required some member to be available, so a direct call to a disabled member (e.g. browser_screenshot filtered out) produced a hint promising that member would be callable after activation — a tool the registry filters out every step. A member-specific call now hints only when that member itself survives the availability filter (otherwise it falls back to the plain invalid-tool error, same as the standalone branch), and a group-name call picks an available exemplar instead of the roster head. Tests cover both. 64b516003d.
  • Durable activation snapshotting metadata.members: pushing back. The expansion-then-filter split is load-bearing, not residue. Exposure is gated per step at registry.ts (isDeferredAvailable(tool.id) && activatedTools.has(tool.id)) — a disabled member is never registered into the model's tool list regardless of what the durable set contains, so the harm the suggestion guards against can't occur. Snapshotting the activation-time member list would introduce the reverse bug: availability is time-varying (permissions, settings), so a member that happened to be disabled at activation time would stay invisible forever after being re-enabled, with no signal to the model that re-activation is needed. Activation records monotonic intent; availability is a per-step gate. The partial case is now pinned by a new registry test (activated group + browser_screenshot filtered → other members exposed, screenshot never) so the guarantee is a tested contract rather than a claim. f5e194890f.
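The "monotonic intent, per-step gate" split can be sketched like this. `exposedTools` is a hypothetical helper; only the shape of the predicate (`isDeferredAvailable(id) && activatedTools.has(id)`) is taken from the comment above.

```typescript
// The durable set records intent (the full group). Availability is
// re-evaluated on every step, so it is never snapshotted.
function exposedTools(
  all: string[],
  activatedTools: Set<string>,
  isDeferredAvailable: (id: string) => boolean,
): string[] {
  return all.filter((id) => isDeferredAvailable(id) && activatedTools.has(id));
}
```

Because the activated set is not filtered at activation time, a member that was disabled when the group was activated becomes visible again as soon as it is re-enabled, with no re-activation needed.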

Full opencode suite: 3835 run, 0 fail; typecheck clean.

@Astro-Han

Owner Author

Review round 11 disposition

P1 (stop doesn't stop the page): real, fixed. Verified both halves: ctx.abort (an AbortSignal on every tool context) was never read anywhere in the browser execution path, and withBrowserPage's timeout only raced the wait — in both cases the still-running action kept issuing CDP commands after the tool reported failure. Since CDP has no command-level cancel, the fix severs the connection: withBrowserPage now takes the abort signal, and both abort and timeout invalidate() the connection before rejecting (typed BrowserActionCanceledError for abort). That closes the socket, so the orphaned run()'s in-flight and subsequent commands fail locally instead of reaching the page, the host attachment is released exactly-once, and the next action re-probes and reconnects through the normal self-heal path (with the usual one-shot stealth reload on an already-loaded document — the takeover note tells the model). An already-aborted signal fails before connecting at all. runBrowserAction passes ctx.abort through, so all seven tools get this without per-tool wiring.
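A minimal sketch of "sever the connection, then reject", assuming a simplified `Connection` with only `invalidate()`. The helper name `withSevering` is hypothetical and the error class body is assumed; the real `withBrowserPage` also handles acquire, caching, and host release.

```typescript
class BrowserActionCanceledError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "BrowserActionCanceledError";
  }
}

interface Connection { invalidate(): void }

function withSevering<T>(
  conn: Connection,
  run: () => Promise<T>,
  signal: AbortSignal,
  timeoutMs: number,
): Promise<T> {
  // An already-aborted signal fails before any work starts.
  if (signal.aborted) {
    return Promise.reject(new BrowserActionCanceledError("canceled before start"));
  }
  return new Promise<T>((resolve, reject) => {
    let settled = false;
    const finish = (fn: () => void) => {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      signal.removeEventListener("abort", onAbort);
      fn();
    };
    // Abort and timeout both close the connection before rejecting, so the
    // orphaned run() cannot keep issuing commands to the page.
    const sever = (err: Error) => finish(() => { conn.invalidate(); reject(err); });
    const onAbort = () => sever(new BrowserActionCanceledError("canceled"));
    const timer = setTimeout(
      () => sever(new Error(`timed out after ${timeoutMs}ms`)),
      timeoutMs,
    );
    signal.addEventListener("abort", onAbort);
    run().then((v) => finish(() => resolve(v)), (e) => finish(() => reject(e)));
  });
}
```

A successful or ordinarily failing `run()` leaves the connection intact; only abort and timeout invalidate it.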

Tests, per the suggested scenarios: session-level — abort on a hung command fails fast with the typed error, host release fires, and the next action provably opens a fresh connection (second stealth registration); timeout now asserts the same severing; pre-aborted signal never touches the server. Tool-level — a hung browser_click through the real execute path with an AbortController rejects on /canceled/ and releases the host. One tool-level test covers the matrix because wait/click/type(submit) all share the single runBrowserAction → withBrowserPage path the fix lands in. 69712bf2c0.

Pending manual verification: agreed, unchanged. The live dev:desktop observe → click → type → extract round-trip plus the stealth takeover remains the explicit pre-merge gate (listed in the PR body's Verification section); it is on the human reviewer's side and this round's fix is in scope for that pass (a mid-action stop is now worth exercising there too).

Full opencode suite: 3838 run, 0 fail; typecheck clean.

…utomation

Server side gains a BrowserBridge injection port (same-process host handed
in by desktop main right after the embedded server starts, so the CDP
endpoint/secret never crosses renderer IPC). Main side resolves which
window an agent session drives via the renderer-reported per-window
DesktopContext: session match > single window > focused window, ambiguous
multi-window is a typed error. Host errors cross the boundary structurally
via an error code property because main cannot share classes with the
server bundle.
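The precedence this commit describes (session match, then single window, then focused window, else a typed error) could look roughly like the sketch below. `WindowContext`, `resolveWindow`, and the `AMBIGUOUS_WINDOW` code value are assumptions for illustration; only the ordering and the "error code property" idea come from the message.

```typescript
interface WindowContext {
  windowId: number;
  sessionId: string | null; // session the window currently shows, if any
  focused: boolean;
}

function resolveWindow(windows: WindowContext[], sessionId: string): number {
  const match = windows.find((w) => w.sessionId === sessionId);
  if (match) return match.windowId;
  if (windows.length === 1) return windows[0].windowId;
  const focused = windows.find((w) => w.focused);
  if (focused) return focused.windowId;
  // Errors carry a plain code property so they survive a boundary where
  // the two sides cannot share an error class.
  throw Object.assign(new Error("cannot decide which window to drive"), {
    code: "AMBIGUOUS_WINDOW",
  });
}
```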
…eover and teardown

One shared CDP connection per window endpoint, keyed by root session id so
subagent calls land on the conversation the user sees. opencli's connect()
already registers the stealth script for future documents; an already-open
page is reloaded once so it gets the script too (the stealth source is not
a public export, reload is the contract-clean path). Tool calls get a 25s
timeout that beats opencli's internal 30s guard, and connection loss
invalidates the cache so the next call reconnects. Session delete/archive
releases the connection and detaches the main-process bridge via the
clearPendingInteractions hook. Pins @jackwener/opencli 1.8.3; a contract
test locks the IPage surface the tools call so a version bump fails in CI
instead of in a user's session.
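One way to picture "one connection per root session, invalidated on loss" is a small single-flight cache. This is a sketch under assumptions: `ConnectionCache` and its methods are hypothetical names, and the real session code also handles stealth reload, timeouts, and host release.

```typescript
class ConnectionCache<C> {
  private ready = new Map<string, C>();
  private pending = new Map<string, Promise<C>>();

  constructor(private connect: (rootId: string) => Promise<C>) {}

  // Concurrent callers for the same root share one in-flight connect.
  acquire(rootId: string): Promise<C> {
    const cached = this.ready.get(rootId);
    if (cached !== undefined) return Promise.resolve(cached);
    let inflight = this.pending.get(rootId);
    if (!inflight) {
      inflight = this.connect(rootId)
        .then((conn) => {
          this.ready.set(rootId, conn);
          return conn;
        })
        .finally(() => {
          this.pending.delete(rootId);
        });
      this.pending.set(rootId, inflight);
    }
    return inflight;
  }

  // Connection loss or teardown: the next acquire reconnects.
  invalidate(rootId: string): void {
    this.ready.delete(rootId);
  }
}
```

Keying by root session id means a subagent's call resolves to the same entry as the conversation it belongs to.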
navigate/snapshot/click/type/wait/screenshot/extract over the BrowserSession
IPage, following the observe→act contract: snapshot returns [N] refs, click
and type take a ref and report self-verification (matches_n, match_level,
verified/actual). navigate validates http/https BEFORE goto since
CDPPage.goto bypasses the view's will-navigate guard; extract passes the
selector as structured JSON (never string-concatenated into page JS) and
pages long markdown via a next_start_char cursor; screenshot returns a PNG
attachment with annotated-capture fallback. The browser permission key is
declared in the config schema (Rule, URL-scoped patterns for navigate);
the effective default is allow via the agent baseline "*": "allow".
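The `next_start_char` cursor can be pictured with a tiny pager. This is only a sketch: `pageMarkdown`, `ExtractPage`, and the `maxChars` default are assumed for the example and are not the tool's real names or limits.

```typescript
interface ExtractPage {
  text: string;
  next_start_char?: number; // present only when more text remains
}

function pageMarkdown(markdown: string, startChar = 0, maxChars = 4000): ExtractPage {
  const end = Math.min(markdown.length, startChar + maxChars);
  const page: ExtractPage = { text: markdown.slice(startChar, end) };
  if (end < markdown.length) page.next_start_char = end;
  return page;
}
```

A caller keeps requesting pages with the returned cursor until `next_start_char` is absent.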
DEFERRED entries gain group and clients fields; the seven browser_* tools
form the 'browser' group, shown as ONE card in tool_info's listing and
activated as one set: tool_info(name="browser") — or any member name —
returns every member's description and schema and records the group as
activated. Derivation expands the group to member ids from durable history
(same storage-fed path, so it survives compaction and restart); the
one-shot activation reminder lists the members and warns there is no tool
named 'browser'; the repair hint for a direct browser_* call routes to the
group. Builtins register desktop-gated; cards filter by client so non-
desktop hosts never advertise tools they cannot run; Permission.disabled
maps browser_* to the browser key so permission.browser deny hides cards
and hints, not just the eventual ask.

Render the seven browser_* tools in the session timeline: navigate gets a
clickable link row like webfetch; snapshot and extract expand to the text
the agent actually read; click/type/wait/screenshot show their literal
target as subtitle. Tool icon, localized titles (en/zh/zht), and a
"browser" trow activity kind so collapsed turns summarize browser steps
as one line. Covered by the browser-tools snap target rendering the cards
inside a real TrowBlock.

opencli's evaluateWithArgs injects each argument as a top-level const, so the
injected variable is `selector`, not `args.selector` — the extract script
referenced `args.selector` and threw ReferenceError on the real CDP backend,
making page extraction unusable (the fake server returned a canned value, so
tests missed it). Drop the evaluateWithArgs branch entirely: the selector is
already JSON-serialized into the evaluate() script (injection-safe), so the
single path is both correct and simpler. Contract test drops evaluateWithArgs
from the optional-method list (no tool calls it now) and documents the gotcha.
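To illustrate why JSON-serializing the selector into the script is injection-safe, here is a hedged sketch. `buildExtractScript` is a hypothetical helper and the script body is a simplification of whatever the extract tool really evaluates.

```typescript
// Builds a self-contained script string for evaluate(). The selector is
// embedded as a JSON string literal, so quotes or parentheses inside it
// stay data and cannot break out into code.
function buildExtractScript(selector: string | null): string {
  const literal = JSON.stringify(selector);
  return `(() => {
    const selector = ${literal};
    const root = selector ? document.querySelector(selector) : document.body;
    return root ? root.outerHTML : null;
  })()`;
}
```

The script declares its own `selector` constant, so it does not depend on how any argument-injection helper names its variables.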
The trow summary (tool-info.ts) and the expanded cards (browser.tsx)
each carried their own browser subtitle switch and title-key map, and
they had already drifted: only the card filtered URLs to http(s), only
the card showed wait's fixed-pause seconds. Export one
browserToolSubtitle + BROWSER_TOOL_TITLE_KEYS from tool-info.ts with
the union of the better behaviors and render both surfaces from it.
…ted module

The conversation-view registry (key scheme, ensure/adopt/window-display/
window-close decisions) lived inside controller-automation.ts with electron
imports, so none of its routing decisions were unit-testable — the
desktop-electron suite runs with a type-only electron mock. Move the key
helpers and the registry into a pure module that takes the view factory by
injection, leave controller-automation.ts as the five-line singleton, and pin
the registry's decisions (keys, ensure, draft adoption incl. fail-soft,
display sync, window close, dispose) with direct tests.

No behavior change; dispose() is wired up by the follow-up commit.
…eted

Deleting or archiving a session only severed the automation connection
(releaseSession -> detachAutomation); the WebContentsView itself was never
destroyed, so every deleted conversation leaked a live renderer process for
the rest of the app's lifetime — including sessions that only ever browsed
manually and never attached automation.

Split the bridge contract along ownership lines: releaseSession stays "the
CONNECTION goes away, the view lives on" (session idle/agent done), and a new
disposeSession means "the CONVERSATION goes away" — the desktop host destroys
the view and forgets the registry key. releaseBrowserSession now closes any
live connection and then disposes unconditionally, covering manual-browse-only
sessions too.

Tested: registry dispose semantics (incl. post-adoption keys), and the
release-vs-dispose split in the opencode session tests (sever keeps the view,
delete destroys it, manual-only sessions dispose as well).

The "draft" target and the adopt-draft channel skipped the DesktopContext
check entirely, so a stale or miswired panel in a window already showing a
conversation could still drive that window's draft view — or adopt it into an
arbitrary session it names. Drafts only exist while a window is on Home, and
adoption by design runs before the renderer navigates to the new session's
route, so both paths can simply require sessionIDForWindow(win.id) === null:
on Home the draft resolves as before; anywhere else it no-ops, same as every
other mistargeted browser:* call.
…argets

The on(target) reset cleared state and displacement but left `editing` set,
so a URL being typed in one conversation stayed open — with the old text —
after switching to another conversation's browser tab. Reset editing with the
rest; the draft signal itself needs no clearing because it is only read while
editing and beginEdit always reseeds it from the current page URL.

Every visible set-view used to reparent the view into the calling window, so
when window B took a conversation's display from window A, a per-frame
geometry tick from A still in flight would silently steal the view back —
correctness depended on A's renderer processing DISPLAY_TAKEN before its next
RAF tick.

Split intent from geometry: the renderer marks the first visible push after
the panel was hidden, displaced, or swapped targets with `claim: true`, and
main only reparents on a claiming push. Claim-less ticks from a non-host
window are dropped, so the takeover race is gone by construction; explicit
reclaim from the displaced placeholder still works because leaving the
displaced state re-arms the claim.
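The intent-versus-geometry split reduces to a small decision function. The sketch below uses hypothetical names (`decidePush`, `VisiblePush`, `PushDecision`) and covers only the cases the message spells out.

```typescript
type PushDecision = "reparent" | "apply-geometry" | "drop";

interface VisiblePush {
  fromHostWindow: boolean; // the calling window currently displays the view
  claim: boolean;          // first visible push after hidden/displaced/target swap
}

function decidePush(push: VisiblePush): PushDecision {
  // The current host may always update geometry, claim or not.
  if (push.fromHostWindow) return "apply-geometry";
  // A non-host window moves the view only with an explicit claim.
  return push.claim ? "reparent" : "drop";
}
```

A stale per-frame tick from the window that lost the view carries no claim, so it is dropped instead of stealing the view back.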
… spent on another site

A current-page action probes the page URL, asks the browser permission
against it, then runs — but the ask can sit open for minutes while the user
keeps browsing the embedded view, so the approval judged on site A could
execute the action on whatever site the page reached meanwhile.

After the ask passes, re-probe (same side-effect-free main-process source)
and compare origins; on a mismatch fail with a typed BrowserPageChangedError
before any CDP traffic, telling the model to retry — the retry re-asks
against the page as it is now. Same-origin moves keep the approval, and
explicit-pattern actions (navigate) are untouched: their permission names the
destination, which meanwhile-browsing cannot change.
… app always has

The screenshot card now calls useDialog for its click-to-preview, which made
the fixture throw at render — the real app wraps every surface in
DialogProvider (app.tsx), so the fixture was the odd one out.
@Astro-Han

Owner Author

Round 12: fresh-eye multi-agent review + Codex challenge — dispositions

Two independent review passes ran against the rebased branch (multi-agent fresh-eye review with adversarial verification, and a Codex challenge pass). Merged findings, deduplicated, and dispositioned below. All fixes are pushed (6e0429b6..8430e2ec); full suites green after each (opencode 3849, desktop-electron 518, app 1838, ui 721, 0 fail; repo-wide typecheck clean; bun run snap browser-tools visually verified).

Fixed

| Finding | Fix |
| --- | --- |
| View leak on session delete/archive (both reviews, P1). Deleting a conversation only severed the CDP connection; the WebContentsView lived forever — including for sessions that only browsed manually. | 6213164f splits the bridge contract along ownership lines: releaseSession = the connection goes away; new disposeSession = the conversation goes away (view destroyed, registry key forgotten). releaseBrowserSession disposes unconditionally. 1474d050 extracts the registry into a pure DI module so the decision (incl. post-adoption keys) is pinned by direct tests. |
| Connect not covered by abort/timeout (P1). The first action's endpoint resolution + CDP connect ran outside the timeout/stop race. | 6e0429b6 races acquire under the same budget and registers the abort listener before acquire. Two tests pin it. |
| ask→act TOCTOU on current-page actions (P1). The permission dialog can sit open while the user keeps browsing the view; an approval judged on site A could execute on site B. | d6d7dece re-probes after the ask and fails typed (BrowserPageChangedError) on an origin change, before any CDP traffic — the model retries and the permission is judged against the page as it is now. Same-origin moves keep the approval; navigate (explicit destination pattern) is unaffected. |
| Takeover steal-back race. A resize tick in flight when the display changed hands would silently re-steal the view; correctness depended on the loser's renderer processing DISPLAY_TAKEN before its next RAF tick. | 474c871d splits intent from geometry: only a claim: true layout push (first visible push after hidden/displaced/target-swap) may reparent; claim-less ticks from a non-host window are dropped. The race is gone by construction. |
| Draft IPC validated too loosely. The draft target and adopt-draft skipped the DesktopContext check, so a window already showing a conversation could still drive or adopt its draft. | 07e88626 gates both on the window actually being on Home (sessionIDForWindow === null); adoption by design runs before the renderer navigates, so the gate holds exactly when adoption is legitimate. |
| Cross-project layout write. The automation-attached layout write keyed the browser tab by the panel's current directory, not the driven session's. | f114950e resolves the session's own directory and keys the write with it. |
| Stale URL edit crossing conversations. Switching targets kept the address bar in edit mode with the previous conversation's text. | e11816ce resets editing on target swap. |
| Snap fixture missing DialogProvider (found by the snap run itself, not the reviews). | 8430e2ec adds the provider the real app always has. |

Simplified (Occam pass, confirmed dead weight)

  • 8cb0a884 retires the clients field on tool registrations — registration itself is the client gate; the non-desktop semantics stay pinned by a real-registration test.
  • f901b031 drops canonicalisation retries canonicalDeferredId already performs.
  • 086dff81 deletes a dead deferredAvailable closure in llm.ts.
  • 6ae8b280 deletes the unused displayWindowID accessor.
  • d1e8122f unifies browser tool subtitles behind one rule shared by the card and the collapsed-trow summary, so the two surfaces cannot drift.

Pushed back (verified, not action)

  • Redirect post-commit deny is soft — already documented as a soft contract and pinned in the cross-site redirect test; with the TOCTOU fix above, actions on a deny-landed page are all blocked. Request-phase CDP interception stays a follow-up.
  • extract materializes outerHTML in the page renderer — accepted P3 limit; blast radius is the embedded page's own renderer.
  • zht missing browser trow keys — the app loads en/zh only and the merge falls back to en; zht lacks the whole trow.summary section (same as ja), so this is pre-existing locale coverage, not a regression of this PR.
  • State pushes broadcast to all windows — intentional: panels showing the conversation elsewhere must stay current, and a never-displayed driven view has no host window to scope to.

…anularity

Round-13 review (both reviewers, independently): the post-ask recheck
compared origins, but the permission itself is judged against the FULL page
URL — so a same-origin move while the ask sat open (e.g. /safe -> /admin/x)
ran the action on a path that a configured path-scoped deny would have
caught. The origin compare was also a second mechanism: a typed error plus a
model retry just to arrive back at an ask against the new URL.

Replace it with the mechanism navigate already uses for redirect landings:
when the re-probe sees ANY URL change, ask again against the page as it is
now. Configured rules — origin- or path-scoped — get their say on the real
page, a deny fails the action loudly, and an unchanged URL (the common case)
skips the second ask entirely. BrowserPageChangedError and the origin helper
are deleted; the residual recheck->run window stays the documented
soft-contract bound.

Round-13 review (both reviewers): releaseBrowserSession awaited
pendingAcquires before disposing, but an acquire registers there only AFTER
resolving its root id — a delete landing in that window awaited nothing,
disposed the view, and the resuming acquire's resolveEndpoint then
resurrected it via ensure() with a live CDP connection nothing would ever
clean up again (exactly the app-lifetime leak the dispose contract exists to
kill). The unbounded await was also a hang: a stuck endpoint resolution would
block the session's deletion forever.

Invert the responsibility with a release epoch: the delete bumps a counter
and returns at once; an in-flight acquire compares the epoch after connecting
and unwinds itself — closes the socket, disposes the resurrected view, fails
the action. This covers both the pre-registration window and the registered
case, and deletes the await (with its hang) outright.

Also pins two adjacent behaviors the round documented but never tested: a
subagent-child delete no-ops against the root's live connection, and an
abort-abandoned acquire warms the cache for the next action.
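The release-epoch inversion can be sketched as below. `EpochGuard`, `acquireGuarded`, and the `Conn` shape are hypothetical, and the key used for the counter is an assumption; the point is only that the delete never waits and the acquire checks for itself.

```typescript
class EpochGuard {
  private epochs = new Map<string, number>();

  current(key: string): number {
    return this.epochs.get(key) ?? 0;
  }

  // Delete/archive: bump and return at once, without awaiting any acquire.
  release(key: string): void {
    this.epochs.set(key, this.current(key) + 1);
  }
}

interface Conn { close(): void }

async function acquireGuarded(
  guard: EpochGuard,
  key: string,
  connect: () => Promise<Conn>,
  disposeView: () => void,
): Promise<Conn> {
  const epochAtStart = guard.current(key);
  const conn = await connect();
  if (guard.current(key) !== epochAtStart) {
    // A delete landed mid-acquire: unwind instead of keeping a resurrected view.
    conn.close();
    disposeView();
    throw new Error("session was released during connect");
  }
  return conn;
}
```

Since the epoch is captured before any awaiting, a delete that lands at any point during the acquire is detected, and a stuck connect can no longer block deletion.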
…rejects draft-namespace targets

Two round-13 findings on the registry's create paths:

set-view used ensure() on visible pushes, so a stale RAF frame landing in the
gap between a session's delete (which disposed the view) and the renderer
navigating away re-created an empty WebContentsView under the deleted key —
leaked for the app lifetime, since no second delete ever comes. A panel only
shows when page state says there is a page, and only a live controller can
say that, so a visible push with no controller is stale by definition: get(),
never ensure().

adoptDraft accepted any non-empty string as the new session id, so a renderer
naming "draft:2" could re-key its own draft into another window's private
draft namespace, confusing state and display routing. The registry now
refuses targets that parse as draft keys, pinned by a direct test.

Round-13 (Codex): the renderer cleared its claim flag after SENDING the first
visible push, not after it applied. A claim that lands while the window's own
DesktopContext still lags the route swap resolves to nothing in main and is
silently dropped — after which the panel only ever sends geometry ticks,
which are rightly dropped too when they come from a non-host window, so
reclaiming a conversation displayed in another window could wedge until the
user toggled the tab.

browser:set-view now answers whether the visible push actually displayed the
view in the calling window, and the renderer keeps claiming until that
confirmation (guarded against stale acks by target/visibility recheck). The
takeover decision itself moves into a pure displayDecision helper in logic.ts
— the claim/geometry split had zero behavioral coverage because it lived
inside the electron-bound controller; the four-way decision table is now
pinned by direct tests.
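On the renderer side, "keep claiming until confirmed" amounts to clearing the flag on acknowledgement rather than on send. A minimal sketch, with `ClaimTracker` and its method names invented for the example:

```typescript
class ClaimTracker {
  private needsClaim = true;

  // Flag to attach to the next visible layout push.
  nextPush(): { claim: boolean } {
    return { claim: this.needsClaim };
  }

  // Main's answer for a push. `stillCurrent` stands for the target/visibility
  // recheck that guards against stale acks.
  onAck(displayed: boolean, stillCurrent: boolean): void {
    if (displayed && stillCurrent) this.needsClaim = false;
  }

  // Panel hidden, displaced, or target swapped: the next visible push claims again.
  rearm(): void {
    this.needsClaim = true;
  }
}
```

A claim that main drops is simply retried on the next push, so the panel cannot fall into a state where it only ever sends geometry ticks.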
…cheme guard

The subtitle consolidation made one function feed both the trow summary and
every expanded card, with safeHttpUrl as the sole guard between tool metadata
and the navigate card's clickable href — and neither had a direct test. Pin
the scheme filter (javascript:/file:/about:/data: never become a subtitle or
link) and each tool's fallback chain.
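For illustration, a scheme filter of this kind can be written with the standard `URL` parser. This is a hypothetical re-implementation; only the name `safeHttpUrl` and its purpose come from the commit message, and the real signature may differ.

```typescript
function safeHttpUrl(value: unknown): string | undefined {
  if (typeof value !== "string") return undefined;
  try {
    const url = new URL(value);
    // Only http(s) may become a subtitle or a clickable href.
    return url.protocol === "http:" || url.protocol === "https:" ? url.href : undefined;
  } catch {
    return undefined; // not an absolute URL
  }
}
```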
@Astro-Han

Owner Author

Round 13: second fresh-eye pass (multi-agent + Codex) on the round-12 commits — dispositions

A second fresh-eye round reviewed the round-12 hardening commits themselves (multi-agent review, 6 lenses × 3-vote adversarial verification, plus an independent Codex challenge). Five fixes pushed (ec1463ff..6207468b); all green after each: opencode 3852, desktop-electron 523, app 1838, ui 728, 0 fail; repo-wide typecheck clean.

Fixed

| Finding | Fix |
| --- | --- |
| Re-judge compared origins, but the permission is judged against the full URL (both reviewers). A same-origin move while the ask sat open (/safe → /admin/x) ran the action on a path a configured path-scoped deny would have caught. | ec1463ff replaces the origin compare + typed-retry-error mechanism with the one navigate already uses for redirect landings: any URL change after the ask triggers a second ask against the page as it is now. Configured rules get their say at full granularity, a deny fails loudly, an unchanged URL skips it. Net deletion (BrowserPageChangedError and the origin helper are gone). Tests: cross-origin deny, same-origin path deny, benign same-site move. |
| Delete racing a first acquire resurrects the disposed view (both reviewers, P1). An acquire registers in pendingAcquires only after resolving its root id; a delete landing in that window awaited nothing, disposed the view, and the resuming acquire's resolveEndpoint re-created it with a connection nothing would ever clean up. The unbounded await pending could also hang session deletion on a stuck endpoint. | 0a370632 inverts the responsibility: the delete bumps a release epoch and returns at once; an in-flight acquire compares the epoch after connecting and unwinds itself (close socket, dispose the resurrected view, fail the action). The await — and its hang — is deleted. Also pins child-delete no-op and abort-warmed-cache behaviors that were documented but untested. |
| A stale visible set-view after dispose re-created the view (same missing-tombstone class at the IPC layer). RAF frames still arriving between a session's delete and the renderer navigating away hit ensure() and leaked an empty view for the app lifetime. | 4804ee22: set-view uses get(), never ensure() — a panel only shows when page state says there is a page, and only a live controller can say that, so a visible push with no controller is stale by definition. |
| adopt-draft accepted draft:N as a session id (Codex), re-keying a window's draft into another window's private draft namespace and confusing state/display routing. | 4804ee22: the registry refuses targets that parse as draft keys, pinned by a direct test. |
| The claim flag was cleared on send, not on apply (Codex). A claim racing the window's own DesktopContext update was silently dropped in main; the panel then only ever sent geometry ticks and could wedge viewless until the user toggled the tab. | 2b096dec: set-view answers whether the visible push actually displayed the view; the renderer keeps claiming until confirmed (stale acks guarded by target/visibility recheck). The takeover decision moved into a pure displayDecision helper — the claim/geometry split had zero behavioral coverage inside the electron-bound controller; the four-way decision table is now pinned by direct tests. |
| browserToolSubtitle/safeHttpUrl fed two surfaces with no direct test, and safeHttpUrl is the sole guard between tool metadata and the navigate card's clickable href. | 6207468b pins the scheme filter and each tool's fallback chain. |

Doc-accuracy nits fixed in passing: releaseBrowserSession no longer claims internal root-keying (it relies on maps only ever being populated under root ids — now stated as such), and the claim type doc covers the target-swap and reclaim cases.

Pushed back (verified, not action)

  • Residual recheck→run window and the takeover-reload redirect — the re-judge closes the minutes-long ask window; the remaining gap is a handful of async ticks plus the one-time stealth reload, the same class as the documented post-commit redirect soft contract (request-phase CDP interception stays the follow-up). Adversarial panel also voted this pre-accepted.
  • User stop does not stop the background takeover reload of a canceled first acquire — accepted, documented design: the abandoned connect settles in the background and warms the cache (now pinned by a test); the canceled action itself never drives the page.
  • Home gate trusts renderer-reported DesktopContext — by design: target validation guards stale/miswired panels, not a hostile renderer; the endpoint/secret never crosses renderer IPC either way. The reviewers confirmed no cross-window or cross-session escalation exists.
  • registry.test.ts pins hideFor calls on another window's draft that the real controller owner-gates to a no-op — the gating now has direct coverage via displayDecision; the two-line hideFor owner check stays at the controller seam, exercised by the live desktop pass.

…ception

Live testing: a browser_wait whose selector never appeared failed with the
in-page waiter's raw rejection — "Evaluate error: Error: Selector not found:
#b_results at <anonymous>:6:16" — which neither says the wait timed out, how
long it waited, nor what to do next. Map the two condition-timeout shapes
(selector/text not found) to a message that says all three: waited Ns, the
condition never appeared, take a browser_snapshot before retrying. Other
wait failures pass through unchanged.
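A sketch of the message mapping, under assumptions: `describeWaitFailure` is a hypothetical name, the selector shape is taken from the rejection quoted above, and the "Text not found:" shape is assumed by analogy because the thread does not quote it.

```typescript
function describeWaitFailure(raw: string, waitedSeconds: number): string {
  // Matches "Selector not found: X" (quoted in the message) or, by assumption,
  // "Text not found: X", stopping before the " at <anonymous>" stack suffix.
  const m = /(Selector|Text) not found: (.+?)(?: at |$)/.exec(raw);
  if (!m) return raw; // other wait failures pass through unchanged
  return (
    `Waited ${waitedSeconds}s but the ${m[1].toLowerCase()} ${m[2]} never appeared. ` +
    `Take a browser_snapshot before retrying.`
  );
}
```

The rewritten message states that the wait timed out, how long it waited, and what to do next.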
@Astro-Han

Owner Author

External review round (post round-13): 1 pushed back, 2 fixed

P1 — pending acquire survives abort/timeout and still warms the cache (incl. takeover reload): pushed back.
This is the documented trade-off in withBrowserPage (session.ts) and was already recorded as a round-13 disposition. Re-examined against the suggested fix ("once canceled, close + release the acquire's connection when it resolves; never write bySession"):

  1. It cannot meet its own acceptance criterion ("no Page.reload"): the takeover reload happens inside connect(), before the acquire resolves — cleanup after resolution cannot un-fire it.
  2. Discarding the resolved connection makes the next action reconnect and reload again, turning one user-visible reload into two. The cached connection itself is passive: run() never started (the race was lost at the acquire stage), so nothing drives the page after Stop.
  3. Truly suppressing the reload requires threading cancellation through the single-flight acquire (shared by concurrent callers, so per-call signals don't compose — it needs all-waiters-abandoned refcounting) to guard a sub-second window between resolveEndpoint and the reload send.

The root-cause path (request-phase CDP interception) remains the recorded follow-up.

P3 — contract test leaks its WebSocketServer when bridge.connect throws: fixed in f7d85f8.
P3 — registry test deletes a pre-existing OPENCODE_CLIENT instead of restoring it: fixed in 5f6f9f1.

@coderabbitai coderabbitai Bot left a comment

Contributor

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/opencode/src/session/prompt.ts (1)

2234-2246: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Recompute reminder members from the current availability set.

Line 2237 reuses the members list persisted by the earlier tool_info call. That list is availability-filtered at activation time, not at reminder time. If a run stops right after tool_info and later resumes under a different client or permission context, this branch can still tell the model that browser members are “now in your tool list” even though registry.tools(...) will filter them out for the resumed step. Please intersect members with the current deferred availability here, or recompute the visible member list from the current registry state before calling buildActivationReminder.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/opencode/src/session/prompt.ts` around lines 2234 - 2246, The code
uses the persisted `members` list from `deriveNewlyActivated(...)` when
constructing activation reminders, which may be stale; instead recompute the
visible members from the current registry/availability before calling
`buildActivationReminder`. In the loop over `for (const [name, members] of
newlyActivated)` call the registry tool visibility API (or intersect `members`
with the current deferred availability set from the registry/tools lookup) to
produce a fresh `visibleMembers` array, and pass that `visibleMembers` to
`buildActivationReminder` when creating the synthetic part on `userMessage`;
keep all other logic (message part creation, ids, synthetic flag) the same.
Ensure you reference `deriveNewlyActivated`, `userMessage`, and
`buildActivationReminder` when locating the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: bddde36a-1301-49fc-9ef6-8677a8b5056f

📥 Commits

Reviewing files that changed from the base of the PR and between 9272fba and 5f6f9f1.

📒 Files selected for processing (30)
  • packages/app/e2e/snap/fixtures/browser-tools-snap-fixture.tsx
  • packages/app/src/context/platform.tsx
  • packages/app/src/pages/session/browser/browser-panel.tsx
  • packages/app/src/pages/session/helpers.test.ts
  • packages/app/src/pages/session/helpers.ts
  • packages/app/src/pages/session/session-side-panel.tsx
  • packages/desktop-electron/src/main/browser/automation-host.ts
  • packages/desktop-electron/src/main/browser/controller-automation.ts
  • packages/desktop-electron/src/main/browser/controller.ts
  • packages/desktop-electron/src/main/browser/logic.test.ts
  • packages/desktop-electron/src/main/browser/logic.ts
  • packages/desktop-electron/src/main/browser/registry.test.ts
  • packages/desktop-electron/src/main/browser/registry.ts
  • packages/desktop-electron/src/main/env.d.ts
  • packages/desktop-electron/src/main/ipc/browser.ts
  • packages/opencode/src/browser/browser-bridge.ts
  • packages/opencode/src/browser/opencli-contract.test.ts
  • packages/opencode/src/browser/session.ts
  • packages/opencode/src/session/prompt.ts
  • packages/opencode/src/tool/browser-shared.ts
  • packages/opencode/src/tool/browser-wait.ts
  • packages/opencode/src/tool/tool-info.ts
  • packages/opencode/test/browser/session.test.ts
  • packages/opencode/test/fake/cdp-server.ts
  • packages/opencode/test/tool/browser-tools.test.ts
  • packages/opencode/test/tool/registry.test.ts
  • packages/opencode/test/tool/tool-info.test.ts
  • packages/ui/src/components/message-part/tools/browser.tsx
  • packages/ui/src/components/tool-info.ts
  • packages/ui/test/browser-tool-subtitle.test.ts
💤 Files with no reviewable changes (1)
  • packages/opencode/test/tool/tool-info.test.ts
✅ Files skipped from review due to trivial changes (2)
  • packages/desktop-electron/src/main/browser/registry.test.ts
  • packages/desktop-electron/src/main/env.d.ts
🚧 Files skipped from review as they are similar to previous changes (13)
  • packages/desktop-electron/src/main/browser/automation-host.ts
  • packages/app/src/pages/session/helpers.ts
  • packages/opencode/src/browser/opencli-contract.test.ts
  • packages/app/src/pages/session/session-side-panel.tsx
  • packages/app/src/pages/session/helpers.test.ts
  • packages/opencode/src/tool/browser-shared.ts
  • packages/opencode/test/tool/registry.test.ts
  • packages/opencode/src/tool/browser-wait.ts
  • packages/app/e2e/snap/fixtures/browser-tools-snap-fixture.tsx
  • packages/desktop-electron/src/main/ipc/browser.ts
  • packages/opencode/src/browser/session.ts
  • packages/opencode/test/browser/session.test.ts
  • packages/app/src/pages/session/browser/browser-panel.tsx

The tab chip's x already promises "Close tab" — make it true. Closing the
browser tab destroys the conversation's page (same dispose chain as session
delete/archive) instead of merely hiding it; hiding stays on tab switches
and panel collapse, which already cover "stop watching". When an agent task
is running against a live page, confirm first. Both close paths (chip x and
mod+w) route through one flow via the shared close router.

The controller broadcasts a final empty state on destroy so panels that
outlive the view (they survive tab close and route changes) drop stale
hasPage/url instead of showing a page that no longer exists. The sidecar
rewrites CDP connection-loss into "the page was closed; the next browser
action starts over from a fresh blank page" — honest failure, no automatic
retry that would override the user's close.
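
As a rough sketch of the "final empty state on destroy" idea, assuming a simple subscribe/emit hub (the `ViewStateHub` class and its methods are hypothetical; only the `hasPage` and `url` field names come from the description above):

```typescript
// Hypothetical sketch; ViewStateHub is not the controller from this PR.
interface BrowserState {
  hasPage: boolean;
  url: string;
}

type StateListener = (conversationId: string, state: BrowserState) => void;

const EMPTY_STATE: BrowserState = { hasPage: false, url: "" };

class ViewStateHub {
  private readonly states = new Map<string, BrowserState>();
  private readonly listeners = new Set<StateListener>();

  // Returns an unsubscribe function.
  subscribe(listener: StateListener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  update(conversationId: string, state: BrowserState): void {
    this.states.set(conversationId, state);
    this.emit(conversationId, state);
  }

  // Destroying a view emits one last empty state so panels that outlive
  // the view stop showing the old hasPage/url.
  destroy(conversationId: string): void {
    if (!this.states.delete(conversationId)) return;
    this.emit(conversationId, { ...EMPTY_STATE });
  }

  private emit(conversationId: string, state: BrowserState): void {
    for (const listener of this.listeners) listener(conversationId, state);
  }
}
```

A panel that subscribed before the close receives the empty state as its last event for that conversation, so it needs no separate "view destroyed" signal.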
@Astro-Han

Owner Author

Design round: browser tab close is now WYSIWYG (7ccb036)

Manual testing surfaced a naming/behavior mismatch: the browser tab chip's × is labeled "Close tab ⌘W" but only hid the view — the page and its renderer process (~100–200MB each) lived on until session delete/archive. Lifecycle options were evaluated against Chrome Memory Saver, VS Code webviews (destroy-on-hide + state restore), and Codex Desktop (decompiled: park layer + snapshot placeholders + disposeAfterSessionActivity), then settled via two independent design consults.

Shipped semantics — two intents, two gestures, no automatic reclamation:

  • Switching tabs / collapsing the panel = "stop watching": hide only, background agent tasks unaffected (unchanged).
  • × / ⌘W on the browser tab = "shut it down": destroys the page via the existing delete/archive dispose chain. If an agent task is running against a live page, a confirm dialog intercepts. Any window may close a conversation's page (conversation-owned, no last-displayer logic).
  • Reopening the tab = blank page, no URL restore. Cookies survive in the shared partition.

Supporting fixes:

  • Controller broadcasts a final empty state on destroy, so panels that outlive the view (they survive tab close and route changes) drop stale hasPage/url — this also covers cross-window session deletes.
  • Sidecar rewrites CDP connection-loss into "the browser page was closed; the next browser action starts over from a fresh blank page". Honest failure, no auto-retry that would override the user's close.
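
The connection-loss rewrite can be pictured with a minimal sketch like the one below. How the sidecar actually detects connection loss is not shown in this thread, so the `code === "CONNECTION_LOST"` check, `isConnectionLoss`, `rewriteConnectionLoss`, and `runBrowserAction` are all assumptions made for illustration; only the message text is taken from the bullet above.

```typescript
// Hypothetical sketch; the detection rule and helper names are assumptions.
const PAGE_CLOSED_MESSAGE =
  "the browser page was closed; the next browser action starts over from a fresh blank page";

// Assumed convention: connection-loss errors carry code "CONNECTION_LOST".
function isConnectionLoss(error: unknown): boolean {
  return (
    typeof error === "object" &&
    error !== null &&
    (error as { code?: unknown }).code === "CONNECTION_LOST"
  );
}

// Map connection loss to the honest "page was closed" failure;
// pass every other error through unchanged.
function rewriteConnectionLoss(error: unknown): Error {
  if (isConnectionLoss(error)) return new Error(PAGE_CLOSED_MESSAGE);
  return error instanceof Error ? error : new Error(String(error));
}

// Run one browser action. Deliberately no retry: retrying would reopen
// a page the user just closed.
async function runBrowserAction<T>(action: () => Promise<T>): Promise<T> {
  try {
    return await action();
  } catch (error) {
    throw rewriteConnectionLoss(error);
  }
}
```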

Rejected (Occam): idle-timer reclamation, park/snapshot layers, URL restore, an agent-facing close tool, a separate "Close page" menu item, last-displayer reference counting.

Verification: opencode 3853 / desktop-electron 523 / app 1845 (7 new close-flow tests) all pass; repo-wide typecheck clean.

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/app/src/pages/session/browser/close-page.tsx`:
- Around line 29-33: The close() function re-reads deps.target() at call time
which can change if the route updates; snapshot the target when the user gesture
starts and use that stored value when closing. Modify the flow so the initial
gesture captures const snappedTarget = deps.target() (or pass target into
close), then in close() use that snappedTarget when calling
bridge.closePage(snappedTarget) instead of calling deps.target(); keep using
deps.bridge() and deps.closeTab() as-is and ensure the stored target is set in
the scope that survives until confirm.
- Around line 40-46: The async probe using bridge.getState(deps.target()) lacks
error handling and can produce unhandled rejections; wrap the call in a safe
catch so that on any error or bridge loss you fall back to executing the close
gesture (and optionally log the error) instead of no-op. Specifically, update
the bridge.getState(...).then(...) chain (or convert to async/await) so that any
thrown error triggers a .catch(err => { /* optional log */; close(); }) and
ensure the existing logic that calls browserTabCloseAction, deps.confirm(close)
and close() remains in the success path.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: b9dd0115-f0ce-4938-8fef-795590443e3c

📥 Commits

Reviewing files that changed from the base of the PR and between 5f6f9f1 and 7ccb036.

📒 Files selected for processing (13)
  • packages/app/src/context/platform.tsx
  • packages/app/src/i18n/en.ts
  • packages/app/src/i18n/zh.ts
  • packages/app/src/pages/session/browser/close-page.test.ts
  • packages/app/src/pages/session/browser/close-page.tsx
  • packages/app/src/pages/session/session-side-panel.tsx
  • packages/app/src/pages/session/terminal-shell-tab.ts
  • packages/app/src/pages/session/use-session-commands.tsx
  • packages/desktop-electron/src/main/browser/controller.ts
  • packages/desktop-electron/src/main/ipc/browser.ts
  • packages/desktop-electron/src/preload/index.ts
  • packages/opencode/src/browser/session.ts
  • packages/opencode/test/browser/session.test.ts
🚧 Files skipped from review as they are similar to previous changes (6)
  • packages/desktop-electron/src/preload/index.ts
  • packages/app/src/context/platform.tsx
  • packages/desktop-electron/src/main/ipc/browser.ts
  • packages/opencode/test/browser/session.test.ts
  • packages/desktop-electron/src/main/browser/controller.ts
  • packages/opencode/src/browser/session.ts

Comment thread packages/app/src/pages/session/browser/close-page.tsx Outdated
Comment thread packages/app/src/pages/session/browser/close-page.tsx Outdated
… the probe

The confirm dialog can outlive a route change, so re-reading the target when
it resolves would destroy the page of whatever conversation the user switched
to. And the pre-close state probe only decides whether to confirm — its
failure must not veto the close (or leak an unhandled rejection).
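
A minimal sketch of those two rules, assuming a dependency shape similar to the one the review comments describe (`deps.target()`, `deps.bridge()`, `deps.confirm(close)`, `deps.closeTab()`). The `PageState` fields, the `CloseBridge` interface, and `startCloseGesture` are hypothetical and not the actual close-page.tsx code:

```typescript
// Hypothetical sketch; interfaces and names are assumptions.
interface PageState {
  hasPage: boolean;
  agentRunning: boolean; // assumed field: an agent task is driving the page
}

interface CloseBridge {
  getState(target: string): Promise<PageState>;
  closePage(target: string): void;
}

interface CloseDeps {
  target(): string;
  bridge(): CloseBridge | undefined;
  confirm(onConfirm: () => void): void;
  closeTab(): void;
}

function startCloseGesture(deps: CloseDeps): Promise<void> {
  // Snapshot the target when the gesture starts: the confirm dialog can
  // outlive a route change, and deps.target() may differ by then.
  const target = deps.target();
  const bridge = deps.bridge();
  const close = () => {
    bridge?.closePage(target);
    deps.closeTab();
  };
  if (!bridge) {
    close();
    return Promise.resolve();
  }
  return bridge.getState(target).then(
    (state) => {
      if (state.hasPage && state.agentRunning) deps.confirm(close);
      else close();
    },
    // The probe only decides whether to confirm; its failure must not
    // veto the close or surface as an unhandled rejection.
    () => close(),
  );
}
```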
…availability

The activation reminder reused the member list recorded at activation
time. That list is a snapshot of the ACTIVATING step's availability: a
session resumed under different permissions or a different client could
be promised a tool the registry no longer exposes. Re-filter through the
current step's availability (same formula resolveTools uses, intersected
with the registered set) and skip the reminder entirely when nothing it
would announce is exposable.
… grid

The browser_screenshot fixture entry carried no attachments, so the card
rendered collapsed (hideDetails when no image) and the screenshot
expand/preview path had no visual regression coverage. Attach a
deterministic SVG data: image (the card accepts any data:image/ URL),
pass completed-state attachments through Dynamic the way the real
renderer does, and open the card by default so the grid shows the image
area.
@Astro-Han

Owner Author

External review round: P2 + P3 both verified and fixed

P2 — activation reminder could announce stale members (9bcaeef): fixed.
Verified: the activating step records an availability-filtered member list in tool_info metadata, but the reminder injection reused that snapshot verbatim. A session resumed under different permissions or a different client could be promised a tool the registry no longer exposes. Fix: buildActivationReminder now takes the current step's availability (same isAvailable pattern buildDeferredHint already uses), the injection site re-filters through registry.availableDeferred (registered set ∩ current permission/tools config), and the reminder is skipped entirely when nothing it would announce is exposable. Tests cover the review's acceptance case: a recorded browser_screenshot member that the current step disables no longer appears in the reminder, and a fully-hidden activation produces no reminder at all.
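
The re-filtering step can be sketched as follows. This is an illustration under assumptions, not the real `buildActivationReminder` or `registry.availableDeferred` signatures: `visibleMembers`, `sketchActivationReminder`, and the reminder wording are hypothetical.

```typescript
// Hypothetical sketch; not the signatures used in prompt.ts or tool-info.ts.
type IsAvailable = (tool: string) => boolean;

// Recorded members ∩ registered set ∩ current step's availability.
function visibleMembers(
  recorded: readonly string[],
  registered: ReadonlySet<string>,
  isAvailable: IsAvailable,
): string[] {
  return recorded.filter((tool) => registered.has(tool) && isAvailable(tool));
}

// Returns undefined when nothing the reminder would announce is exposable,
// so the caller can skip injecting the reminder entirely.
function sketchActivationReminder(
  group: string,
  recorded: readonly string[],
  registered: ReadonlySet<string>,
  isAvailable: IsAvailable,
): string | undefined {
  const members = visibleMembers(recorded, registered, isAvailable);
  if (members.length === 0) return undefined;
  return `Tool group "${group}" is active. Available now: ${members.join(", ")}.`;
}
```

Because the filter runs at reminder time, a member recorded at activation but disabled for the current step simply drops out of the announced list.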

P3 — screenshot card image area had no snap coverage (9113e7d): fixed.
Verified: the card sets hideDetails when no image attachment is present, so the fixture (output/metadata only) never exercised the expand/preview path. Fix: the fixture's browser_screenshot entry now carries a deterministic SVG data:image/ attachment (stable rendering, recognisable as a page placeholder — a 1px PNG would stretch into a blur under object-fit: contain), completed-state attachments flow through Dynamic the same way parts/tool.tsx passes them in the real renderer, and the card opens by default. bun run snap browser-tools confirms the grid now renders the screenshot image area.
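
For reference, a deterministic SVG `data:image/` placeholder of the kind described here could be built as in the sketch below. The `placeholderSvgDataUrl` helper, its colors, and its layout are hypothetical and not the fixture's actual markup.

```typescript
// Hypothetical sketch; not the fixture's actual SVG.
function escapeXml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Builds a data: URL from fixed inputs only (no timestamps or randomness),
// so repeated snapshot runs render the same image.
function placeholderSvgDataUrl(width: number, height: number, label: string): string {
  const svg =
    `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}" ` +
    `viewBox="0 0 ${width} ${height}">` +
    `<rect width="100%" height="100%" fill="#e5e7eb"/>` +
    `<text x="50%" y="50%" text-anchor="middle" dominant-baseline="middle" ` +
    `font-family="sans-serif" font-size="24">${escapeXml(label)}</text>` +
    `</svg>`;
  return `data:image/svg+xml;charset=utf-8,${encodeURIComponent(svg)}`;
}
```

A vector placeholder scales cleanly to whatever size the card gives it, which is the reason stated above for preferring it over a 1px PNG.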

Validation: tool-info 31 tests, opencode test/session/ 836 tests, app full suite 1847 tests all green; tsgo --noEmit clean on both packages.

@Astro-Han Astro-Han merged commit e3595b7 into dev Jun 11, 2026
39 of 40 checks passed
@Astro-Han Astro-Han deleted the claude/browser-session-tools branch June 11, 2026 13:25

Labels

  • app: Application behavior and product flows
  • ci: Continuous integration / GitHub Actions
  • desktop
  • enhancement: New feature or request
  • harness: Model harness, prompts, tool descriptions, and session mechanics
  • P2: Medium priority
  • platform: Electron shell, OS integration, packaging, updater, signing, paths, and permissions
  • ui: Design system and user interface

2 participants