feat(browser): BrowserSession, generic browser tools, and deferred activation (#1186 PR2)#1231
Conversation
|
Warning Review limit reached
More reviews will be available in 10 minutes and 26 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more credits in the billing tab to continue. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (6)
📝 WalkthroughWalkthroughAdds target-scoped browser control, browser tools, session adoption, UI renderers, permission/activation updates, and broad test/CI coverage. ChangesEmbedded browser tooling and control flow
Sequence Diagram(s)sequenceDiagram
participant DesktopMain as Desktop main process
participant BrowserBridge as BrowserBridge host
participant BrowserControllerRegistry as ControllerRegistry
participant BrowserViewController as ViewController
participant Preload as Preload API
participant BrowserPanel as BrowserPanel
DesktopMain->>BrowserBridge: provideHost(createDesktopBrowserBridgeHost())
BrowserPanel->>Preload: navigate(target, url)
Preload->>DesktopMain: IPC browser:navigate(target,url)
DesktopMain->>BrowserControllerRegistry: resolve/ensure(target)
BrowserControllerRegistry->>BrowserViewController: display / navigate / hideFor
BrowserViewController-->>DesktopMain: emit {target, state}
Estimated code review effort🎯 5 (Critical) | ⏱️ ~90+ minutes Possibly related PRs
Poem
✨ Finishing Touches🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
Suggested priority: P2 (includes user-path files (packages/desktop-electron/src/main/browser/automation-host.ts, packages/desktop-electron/src/main/browser/automation-resolver.test.ts, packages/desktop-electron/src/main/browser/automation-resolver.ts, packages/desktop-electron/src/main/env.d.ts, packages/desktop-electron/src/main/index.ts)).
P1/P0 are reserved for maintainer confirmation. Please relabel manually if this is a release blocker, security issue, data-loss risk, or updater/runtime failure.
There was a problem hiding this comment.
Code Review
This pull request introduces embedded-browser automation capabilities to the desktop application, adding seven new browser tools along with their corresponding CDP connection management, window resolution logic, UI components, and tests. Feedback on these changes highlights a potential concurrency race condition in acquire that could cause duplicate connection attempts, a resource leak in session.ts where the main process is not notified of invalidated connections, and several instances of unnecessary optional chaining on props.metadata in the UI components.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
External review round — dispositionFour findings from an external review of this PR, verified against the code before acting: P1 — tools connected (and stealth-reloaded an already-open page) before the permission ask. Confirmed; introduced by the URL-scoped permission fix (ea9ed2e), whose URL probe went through the CDP connection. Fixed in 7f6c198: the probe now reads main-process view state via a side-effect-free P1 — click / type(submit) should require confirmation by default. Not adopted — deliberate design ruling, re-confirmed by the maintainer today: every browser action defaults to allow because the embedded browser is local and fully visible (the user watches the agent act and can take over at any moment); P2 — concurrent first acquires race onto the single-client bridge. Confirmed; fixed in 8eb4857 (single-flight per root session and per endpoint, regression tests for same-session and cross-session races). P2 — invalidate() left the main process with a stale attachment. Confirmed; fixed in b738cf7 (best-effort host release at invalidation, exactly-once semantics, regression test). Full opencode suite (3748 pass) + desktop-electron suite (513 pass) + typecheck green. |
External review round 3 — dispositionBoth findings verified and fixed: P2 — host attachment orphaned around a pending acquire. Confirmed, two gaps: a connect failing after P3 — Full opencode suite (3751 pass) + typecheck green. |
External review round 4 — dispositionP1 — permission probed against one window, action possibly run in another. Confirmed: the probe and the attach each ran an independent window pick (showing-session > single window > focused), so with background sessions or multiple windows a focus change in between could grant against window A's URL and act on window B. There was also a subtler pre-existing skew: the connection is cached per session while the probe re-picked every call, so the probe could read a window the cached connection never pointed at. Fixed in 8b609b5 with the suggested lease approach: Remaining skew within a single window (the page navigating itself during a long-lived ask dialog) is accepted: the embedded browser is visible next to the dialog, and the default-allow flow has no dialog wait at all. Pending verification — Windows advisory failure. It failed in the Full opencode suite (3752 pass) + desktop-electron suite (516 pass) + both typechecks green. |
External review round 5 — dispositionBoth findings confirmed and fixed in 6c3ca9b; they were gaps in the round-4 lease, not new surfaces. P1 — a failed window pick degraded to an un-leased P2 — navigate ran un-leased. Explicit patterns (the destination) skipped the whole probe, so its attach re-picked at execution time. The lease invariant is now total and one sentence: probe (pin window, read URL) → ask → attach the leased window — or fail before the ask. Full opencode suite (3764 pass) + desktop-electron suite (516 pass) + both typechecks green. |
Review round 6 dispositionP2 — concurrent first actions with different window leases share one pending acquire: fixed in b3f5c9a. Verified as described: Fix: a connection now remembers the windowID lease it was acquired under ( On the suggested alternatives: unifying onto the first leased window is unsound — the second action's ask was already judged against its own window's URL, so moving the action transfers a grant that was never given. Serial retry inside the session layer is also out: a retry must re-probe and re-ask, and that sequence belongs to the tool layer (retrying below the ask would bypass it). Fail-fast is the only shape that preserves the lease invariant. Tests (the requested concurrent shape plus unit pins):
opencode: 3767 pass, typecheck clean. |
Round 6 follow-up: the mismatch fix itself had a convergence regression — fixed in 9929f80An internal re-review of b3f5c9a found that its cached-path check rejected the reuse but left the stale connection in Fix: a mismatching cached connection is invalidated along with the failure. Sound because a live cached connection can only mismatch when its window stopped serving this session (the attached-first probe would otherwise have returned it). The action still fails rather than reconnecting in place — invalidate's host release is fire-and-forget and an immediate re-attach would race it; the retry arrives after a full probe → ask round trip. The pending-path check is unchanged (that in-flight acquire belongs to a healthy concurrent action). Tests: the session-level mismatch test now pins the full recovery (matching lease reuses; mismatch drops the connection and notifies the host; the retry connects to the surviving window), plus a tool-level end-to-end: action on window 1 → window closes idle → next action fails retryably → retry lands on window 2. opencode: 3768 pass, typecheck clean. |
Review round 7 dispositionP2-1 — global "always allow" voids per-URL denies: fixed in 06c35fc. Verified as described, and worth being precise about what it is NOT: this does not challenge the default-allow ruling (design doc §9), which governs the unconfigured baseline. P2-1 is about a user who HAS configured tightening rules — approvals append after configured rules and evaluation is last-match-wins, so one "always allow" click on a harmless site wrote P2-2 — extract reads and converts the full page before slicing: fixed in 34a22c2. The page-side script now caps the outerHTML at 2M chars before it crosses CDP and flags the cut; the tool notes the drop and suggests narrowing with P3 — browser_wait drops the takeover-reload note: fixed in cb54d72.
opencode: 3771 pass, typecheck clean. |
Review round 8 dispositionP2 — origin-wide always still overrides same-origin path denies: fixed in 6da0164. Verified as described: the origin scope (round 7) only stops cross-site override; within the origin, an approved Fix taken at the layer where it belongs: One deliberate non-change: a same-origin configured ask (not deny) is still relaxed by the origin grant. That is the point of "always allow" — the dialog shows the origin pattern being granted — and asks are a friction preference, not a boundary. The requested verification is pinned as a full ask→reply(always)→re-ask flow test in opencode: 3772 pass, typecheck clean. |
Review round 9 dispositionP2 — a redirect can land on a site the user denied: fixed in bb12280. Verified, and the gap was deeper than described: opencli's On the stricter alternative (intercepting at the request/redirect stage): not taken for now. It needs CDP Tests: the requested ok→blocked redirect-deny shape (navigate fails, both asks logged with their patterns), plus a same-site redirect pinning that the real landing is both reported and re-judged. P3 — group activation announces members the registry won't expose: fixed in 07a38bd. Verified: activation passed on "at least one member available" and rendered the full roster; the registry filters per member. Chose per-member filtering over all-or-none — disabling one member (say screenshots) shouldn't vaporize the whole group, and the registry already behaves per-member. The activation now renders only available members and fails when none are; the rendered list is recorded in the part's metadata, and the one-shot activation reminder names that recorded set (consistency is structural: the same Tests: partial availability (screenshot disabled → rendered blocks, callable line, and metadata.members all exclude it; none available → activation fails) plus reminder/derivation pins. Pending manual verification — acknowledged; it is tracked in the PR body's Verification section (real-GUI observe → click → type → extract plus the stealth-takeover path) and stays the gate before merge. opencode: 3832 pass, typecheck clean. |
Review round 10 dispositionP2 (redirect deny is soft — page loads before the re-judge): accepted as the soft-contract option, now pinned. Round 9 already chose the soft contract (request-phase CDP P3 (partial-group residue) — split disposition:
Full opencode suite: 3835 run, 0 fail; typecheck clean. |
Review round 11 dispositionP1 (stop doesn't stop the page): real, fixed. Verified both halves: Tests, per the suggested scenarios: session-level — abort on a hung command fails fast with the typed error, host release fires, and the next action provably opens a fresh connection (second stealth registration); timeout now asserts the same severing; pre-aborted signal never touches the server. Tool-level — a hung Pending manual verification: agreed, unchanged. The live Full opencode suite: 3838 run, 0 fail; typecheck clean. |
…utomation Server side gains a BrowserBridge injection port (same-process host handed in by desktop main right after the embedded server starts, so the CDP endpoint/secret never crosses renderer IPC). Main side resolves which window an agent session drives via the renderer-reported per-window DesktopContext: session match > single window > focused window, ambiguous multi-window is a typed error. Host errors cross the boundary structurally via an error code property because main cannot share classes with the server bundle.
…eover and teardown One shared CDP connection per window endpoint, keyed by root session id so subagent calls land on the conversation the user sees. opencli's connect() already registers the stealth script for future documents; an already-open page is reloaded once so it gets the script too (the stealth source is not a public export, reload is the contract-clean path). Tool calls get a 25s timeout that beats opencli's internal 30s guard, and connection loss invalidates the cache so the next call reconnects. Session delete/archive releases the connection and detaches the main-process bridge via the clearPendingInteractions hook. Pins @jackwener/opencli 1.8.3; a contract test locks the IPage surface the tools call so a version bump fails in CI instead of in a user's session.
navigate/snapshot/click/type/wait/screenshot/extract over the BrowserSession IPage, following the observe→act contract: snapshot returns [N] refs, click and type take a ref and report self-verification (matches_n, match_level, verified/actual). navigate validates http/https BEFORE goto since CDPPage.goto bypasses the view's will-navigate guard; extract passes the selector as structured JSON (never string-concatenated into page JS) and pages long markdown via a next_start_char cursor; screenshot returns a PNG attachment with annotated-capture fallback. The browser permission key is declared in the config schema (Rule, URL-scoped patterns for navigate); the effective default is allow via the agent baseline "*": "allow".
DEFERRED entries gain group and clients fields; the seven browser_* tools form the 'browser' group, shown as ONE card in tool_info's listing and activated as one set: tool_info(name="browser") — or any member name — returns every member's description and schema and records the group as activated. Derivation expands the group to member ids from durable history (same storage-fed path, so it survives compaction and restart); the one-shot activation reminder lists the members and warns there is no tool named 'browser'; the repair hint for a direct browser_* call routes to the group. Builtins register desktop-gated; cards filter by client so non- desktop hosts never advertise tools they cannot run; Permission.disabled maps browser_* to the browser key so permission.browser deny hides cards and hints, not just the eventual ask.
Render the seven browser_* tools in the session timeline: navigate gets a clickable link row like webfetch; snapshot and extract expand to the text the agent actually read; click/type/wait/screenshot show their literal target as subtitle. Tool icon, localized titles (en/zh/zht), and a "browser" trow activity kind so collapsed turns summarize browser steps as one line. Covered by the browser-tools snap target rendering the cards inside a real TrowBlock.
opencli's evaluateWithArgs injects each argument as a top-level const, so the injected variable is `selector`, not `args.selector` — the extract script referenced `args.selector` and threw ReferenceError on the real CDP backend, making page extraction unusable (the fake server returned a canned value, so tests missed it). Drop the evaluateWithArgs branch entirely: the selector is already JSON-serialized into the evaluate() script (injection-safe), so the single path is both correct and simpler. Contract test drops evaluateWithArgs from the optional-method list (no tool calls it now) and documents the gotcha.
The trow summary (tool-info.ts) and the expanded cards (browser.tsx) each carried their own browser subtitle switch and title-key map, and they had already drifted: only the card filtered URLs to http(s), only the card showed wait's fixed-pause seconds. Export one browserToolSubtitle + BROWSER_TOOL_TITLE_KEYS from tool-info.ts with the union of the better behaviors and render both surfaces from it.
…ted module The conversation-view registry (key scheme, ensure/adopt/window-display/ window-close decisions) lived inside controller-automation.ts with electron imports, so none of its routing decisions were unit-testable — the desktop-electron suite runs with a type-only electron mock. Move the key helpers and the registry into a pure module that takes the view factory by injection, leave controller-automation.ts as the five-line singleton, and pin the registry's decisions (keys, ensure, draft adoption incl. fail-soft, display sync, window close, dispose) with direct tests. No behavior change; dispose() is wired up by the follow-up commit.
…eted Deleting or archiving a session only severed the automation connection (releaseSession -> detachAutomation); the WebContentsView itself was never destroyed, so every deleted conversation leaked a live renderer process for the rest of the app's lifetime — including sessions that only ever browsed manually and never attached automation. Split the bridge contract along ownership lines: releaseSession stays "the CONNECTION goes away, the view lives on" (session idle/agent done), and a new disposeSession means "the CONVERSATION goes away" — the desktop host destroys the view and forgets the registry key. releaseBrowserSession now closes any live connection and then disposes unconditionally, covering manual-browse-only sessions too. Tested: registry dispose semantics (incl. post-adoption keys), and the release-vs-dispose split in the opencode session tests (sever keeps the view, delete destroys it, manual-only sessions dispose as well).
The "draft" target and the adopt-draft channel skipped the DesktopContext check entirely, so a stale or miswired panel in a window already showing a conversation could still drive that window's draft view — or adopt it into an arbitrary session it names. Drafts only exist while a window is on Home, and adoption by design runs before the renderer navigates to the new session's route, so both paths can simply require sessionIDForWindow(win.id) === null: on Home the draft resolves as before; anywhere else it no-ops, same as every other mistargeted browser:* call.
…argets The on(target) reset cleared state and displacement but left `editing` set, so a URL being typed in one conversation stayed open — with the old text — after switching to another conversation's browser tab. Reset editing with the rest; the draft signal itself needs no clearing because it is only read while editing and beginEdit always reseeds it from the current page URL.
Every visible set-view used to reparent the view into the calling window, so when window B took a conversation's display from window A, a per-frame geometry tick from A still in flight would silently steal the view back — correctness depended on A's renderer processing DISPLAY_TAKEN before its next RAF tick. Split intent from geometry: the renderer marks the first visible push after the panel was hidden, displaced, or swapped targets with `claim: true`, and main only reparents on a claiming push. Claim-less ticks from a non-host window are dropped, so the takeover race is gone by construction; explicit reclaim from the displaced placeholder still works because leaving the displaced state re-arms the claim.
… spent on another site A current-page action probes the page URL, asks the browser permission against it, then runs — but the ask can sit open for minutes while the user keeps browsing the embedded view, so the approval judged on site A could execute the action on whatever site the page reached meanwhile. After the ask passes, re-probe (same side-effect-free main-process source) and compare origins; on a mismatch fail with a typed BrowserPageChangedError before any CDP traffic, telling the model to retry — the retry re-asks against the page as it is now. Same-origin moves keep the approval, and explicit-pattern actions (navigate) are untouched: their permission names the destination, which meanwhile-browsing cannot change.
… app always has The screenshot card now calls useDialog for its click-to-preview, which made the fixture throw at render — the real app wraps every surface in DialogProvider (app.tsx), so the fixture was the odd one out.
Round 12: fresh-eye multi-agent review + Codex challenge — dispositionsTwo independent review passes ran against the rebased branch (multi-agent fresh-eye review with adversarial verification, and a Codex challenge pass). Merged findings, deduplicated, and dispositioned below. All fixes are pushed ( Fixed
Simplified (Occam pass, confirmed dead weight)
Pushed back (verified, not action)
|
…anularity Round-13 review (both reviewers, independently): the post-ask recheck compared origins, but the permission itself is judged against the FULL page URL — so a same-origin move while the ask sat open (e.g. /safe -> /admin/x) ran the action on a path that a configured path-scoped deny would have caught. The origin compare was also a second mechanism: a typed error plus a model retry just to arrive back at an ask against the new URL. Replace it with the mechanism navigate already uses for redirect landings: when the re-probe sees ANY URL change, ask again against the page as it is now. Configured rules — origin- or path-scoped — get their say on the real page, a deny fails the action loudly, and an unchanged URL (the common case) skips the second ask entirely. BrowserPageChangedError and the origin helper are deleted; the residual recheck->run window stays the documented soft-contract bound.
Round-13 review (both reviewers): releaseBrowserSession awaited pendingAcquires before disposing, but an acquire registers there only AFTER resolving its root id — a delete landing in that window awaited nothing, disposed the view, and the resuming acquire's resolveEndpoint then resurrected it via ensure() with a live CDP connection nothing would ever clean up again (exactly the app-lifetime leak the dispose contract exists to kill). The unbounded await was also a hang: a stuck endpoint resolution would block the session's deletion forever. Invert the responsibility with a release epoch: the delete bumps a counter and returns at once; an in-flight acquire compares the epoch after connecting and unwinds itself — closes the socket, disposes the resurrected view, fails the action. This covers both the pre-registration window and the registered case, and deletes the await (with its hang) outright. Also pins two adjacent behaviors the round documented but never tested: a subagent-child delete no-ops against the root's live connection, and an abort-abandoned acquire warms the cache for the next action.
…rejects draft-namespace targets Two round-13 findings on the registry's create paths: set-view used ensure() on visible pushes, so a stale RAF frame landing in the gap between a session's delete (which disposed the view) and the renderer navigating away re-created an empty WebContentsView under the deleted key — leaked for the app lifetime, since no second delete ever comes. A panel only shows when page state says there is a page, and only a live controller can say that, so a visible push with no controller is stale by definition: get(), never ensure(). adoptDraft accepted any non-empty string as the new session id, so a renderer naming "draft:2" could re-key its own draft into another window's private draft namespace, confusing state and display routing. The registry now refuses targets that parse as draft keys, pinned by a direct test.
Round-13 (Codex): the renderer cleared its claim flag after SENDING the first visible push, not after it applied. A claim that lands while the window's own DesktopContext still lags the route swap resolves to nothing in main and is silently dropped — after which the panel only ever sends geometry ticks, which a non-host window's pushes rightly are dropped too, so reclaiming a conversation displayed in another window could wedge until the user toggled the tab. browser:set-view now answers whether the visible push actually displayed the view in the calling window, and the renderer keeps claiming until that confirmation (guarded against stale acks by target/visibility recheck). The takeover decision itself moves into a pure displayDecision helper in logic.ts — the claim/geometry split had zero behavioral coverage because it lived inside the electron-bound controller; the four-way decision table is now pinned by direct tests.
…cheme guard The subtitle consolidation made one function feed both the trow summary and every expanded card, with safeHttpUrl as the sole guard between tool metadata and the navigate card's clickable href — and neither had a direct test. Pin the scheme filter (javascript:/file:/about:/data: never become a subtitle or link) and each tool's fallback chain.
Round 13: second fresh-eye pass (multi-agent + Codex) on the round-12 commits — dispositionsA second fresh-eye round reviewed the round-12 hardening commits themselves (multi-agent review, 6 lenses × 3-vote adversarial verification, plus an independent Codex challenge). Five fixes pushed ( Fixed
Doc-accuracy nits fixed in passing: Pushed back (verified, not action)
|
…ception Live testing: a browser_wait whose selector never appeared failed with the in-page waiter's raw rejection — "Evaluate error: Error: Selector not found: #b_results at <anonymous>:6:16" — which neither says the wait timed out, how long it waited, nor what to do next. Map the two condition-timeout shapes (selector/text not found) to a message that says all three: waited Ns, the condition never appeared, take a browser_snapshot before retrying. Other wait failures pass through unchanged.
External review round (post round-13): 1 pushed back, 2 fixedP1 — pending acquire survives abort/timeout and still warms the cache (incl. takeover reload): pushed back.
The root-cause path (request-phase CDP interception) remains the recorded follow-up. P3 — contract test leaks its WebSocketServer when |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/opencode/src/session/prompt.ts (1)
2234-2246:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRecompute reminder members from the current availability set.
Line 2237 reuses the
memberslist persisted by the earliertool_infocall. That list is availability-filtered at activation time, not at reminder time. If a run stops right aftertool_infoand later resumes under a different client or permission context, this branch can still tell the model that browser members are “now in your tool list” even thoughregistry.tools(...)will filter them out for the resumed step. Please intersectmemberswith the current deferred availability here, or recompute the visible member list from the current registry state before callingbuildActivationReminder.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/opencode/src/session/prompt.ts` around lines 2234 - 2246, The code uses the persisted `members` list from `deriveNewlyActivated(...)` when constructing activation reminders, which may be stale; instead recompute the visible members from the current registry/availability before calling `buildActivationReminder`. In the loop over `for (const [name, members] of newlyActivated)` call the registry tool visibility API (or intersect `members` with the current deferred availability set from the registry/tools lookup) to produce a fresh `visibleMembers` array, and pass that `visibleMembers` to `buildActivationReminder` when creating the synthetic part on `userMessage`; keep all other logic (message part creation, ids, synthetic flag) the same. Ensure you reference `deriveNewlyActivated`, `userMessage`, and `buildActivationReminder` when locating the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@packages/opencode/src/session/prompt.ts`:
- Around line 2234-2246: The code uses the persisted `members` list from
`deriveNewlyActivated(...)` when constructing activation reminders, which may be
stale; instead recompute the visible members from the current
registry/availability before calling `buildActivationReminder`. In the loop over
`for (const [name, members] of newlyActivated)` call the registry tool
visibility API (or intersect `members` with the current deferred availability
set from the registry/tools lookup) to produce a fresh `visibleMembers` array,
and pass that `visibleMembers` to `buildActivationReminder` when creating the
synthetic part on `userMessage`; keep all other logic (message part creation,
ids, synthetic flag) the same. Ensure you reference `deriveNewlyActivated`,
`userMessage`, and `buildActivationReminder` when locating the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: bddde36a-1301-49fc-9ef6-8677a8b5056f
📒 Files selected for processing (30)
packages/app/e2e/snap/fixtures/browser-tools-snap-fixture.tsxpackages/app/src/context/platform.tsxpackages/app/src/pages/session/browser/browser-panel.tsxpackages/app/src/pages/session/helpers.test.tspackages/app/src/pages/session/helpers.tspackages/app/src/pages/session/session-side-panel.tsxpackages/desktop-electron/src/main/browser/automation-host.tspackages/desktop-electron/src/main/browser/controller-automation.tspackages/desktop-electron/src/main/browser/controller.tspackages/desktop-electron/src/main/browser/logic.test.tspackages/desktop-electron/src/main/browser/logic.tspackages/desktop-electron/src/main/browser/registry.test.tspackages/desktop-electron/src/main/browser/registry.tspackages/desktop-electron/src/main/env.d.tspackages/desktop-electron/src/main/ipc/browser.tspackages/opencode/src/browser/browser-bridge.tspackages/opencode/src/browser/opencli-contract.test.tspackages/opencode/src/browser/session.tspackages/opencode/src/session/prompt.tspackages/opencode/src/tool/browser-shared.tspackages/opencode/src/tool/browser-wait.tspackages/opencode/src/tool/tool-info.tspackages/opencode/test/browser/session.test.tspackages/opencode/test/fake/cdp-server.tspackages/opencode/test/tool/browser-tools.test.tspackages/opencode/test/tool/registry.test.tspackages/opencode/test/tool/tool-info.test.tspackages/ui/src/components/message-part/tools/browser.tsxpackages/ui/src/components/tool-info.tspackages/ui/test/browser-tool-subtitle.test.ts
💤 Files with no reviewable changes (1)
- packages/opencode/test/tool/tool-info.test.ts
✅ Files skipped from review due to trivial changes (2)
- packages/desktop-electron/src/main/browser/registry.test.ts
- packages/desktop-electron/src/main/env.d.ts
🚧 Files skipped from review as they are similar to previous changes (13)
- packages/desktop-electron/src/main/browser/automation-host.ts
- packages/app/src/pages/session/helpers.ts
- packages/opencode/src/browser/opencli-contract.test.ts
- packages/app/src/pages/session/session-side-panel.tsx
- packages/app/src/pages/session/helpers.test.ts
- packages/opencode/src/tool/browser-shared.ts
- packages/opencode/test/tool/registry.test.ts
- packages/opencode/src/tool/browser-wait.ts
- packages/app/e2e/snap/fixtures/browser-tools-snap-fixture.tsx
- packages/desktop-electron/src/main/ipc/browser.ts
- packages/opencode/src/browser/session.ts
- packages/opencode/test/browser/session.test.ts
- packages/app/src/pages/session/browser/browser-panel.tsx
The tab chip's x already promises "Close tab" — make it true. Closing the browser tab destroys the conversation's page (same dispose chain as session delete/archive) instead of merely hiding it; hiding stays on tab switches and panel collapse, which already cover "stop watching". When an agent task is running against a live page, confirm first. Both close paths (chip x and mod+w) route through one flow via the shared close router. The controller broadcasts a final empty state on destroy so panels that outlive the view (they survive tab close and route changes) drop stale hasPage/url instead of showing a page that no longer exists. The sidecar rewrites CDP connection-loss into "the page was closed; the next browser action starts over from a fresh blank page" — honest failure, no automatic retry that would override the user's close.
Design round: browser tab close is now WYSIWYG (7ccb036)Manual testing surfaced a naming/behavior mismatch: the browser tab chip's × is labeled "Close tab ⌘W" but only hid the view — the page and its renderer process (~100–200MB each) lived on until session delete/archive. Lifecycle options were evaluated against Chrome Memory Saver, VS Code webviews (destroy-on-hide + state restore), and Codex Desktop (decompiled: park layer + snapshot placeholders + disposeAfterSessionActivity), then settled via two independent design consults. Shipped semantics — two intents, two gestures, no automatic reclamation:
Supporting fixes:
Rejected (Occam): idle-timer reclamation, park/snapshot layers, URL restore, an agent-facing close tool, a separate "Close page" menu item, last-displayer reference counting. Verification: opencode 3853 / desktop-electron 523 / app 1845 (7 new close-flow tests) all pass; repo-wide typecheck clean. |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/app/src/pages/session/browser/close-page.tsx`:
- Around line 29-33: The close() function re-reads deps.target() at call time
which can change if the route updates; snapshot the target when the user gesture
starts and use that stored value when closing. Modify the flow so the initial
gesture captures const snappedTarget = deps.target() (or pass target into
close), then in close() use that snappedTarget when calling
bridge.closePage(snappedTarget) instead of calling deps.target(); keep using
deps.bridge() and deps.closeTab() as-is and ensure the stored target is set in
the scope that survives until confirm.
- Around line 40-46: The async probe using bridge.getState(deps.target()) lacks
error handling and can produce unhandled rejections; wrap the call in a safe
catch so that on any error or bridge loss you fall back to executing the close
gesture (and optionally log the error) instead of no-op. Specifically, update
the bridge.getState(...).then(...) chain (or convert to async/await) so that any
thrown error triggers a .catch(err => { /* optional log */; close(); }) and
ensure the existing logic that calls browserTabCloseAction, deps.confirm(close)
and close() remains in the success path.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: b9dd0115-f0ce-4938-8fef-795590443e3c
📒 Files selected for processing (13)
packages/app/src/context/platform.tsxpackages/app/src/i18n/en.tspackages/app/src/i18n/zh.tspackages/app/src/pages/session/browser/close-page.test.tspackages/app/src/pages/session/browser/close-page.tsxpackages/app/src/pages/session/session-side-panel.tsxpackages/app/src/pages/session/terminal-shell-tab.tspackages/app/src/pages/session/use-session-commands.tsxpackages/desktop-electron/src/main/browser/controller.tspackages/desktop-electron/src/main/ipc/browser.tspackages/desktop-electron/src/preload/index.tspackages/opencode/src/browser/session.tspackages/opencode/test/browser/session.test.ts
🚧 Files skipped from review as they are similar to previous changes (6)
- packages/desktop-electron/src/preload/index.ts
- packages/app/src/context/platform.tsx
- packages/desktop-electron/src/main/ipc/browser.ts
- packages/opencode/test/browser/session.test.ts
- packages/desktop-electron/src/main/browser/controller.ts
- packages/opencode/src/browser/session.ts
… the probe The confirm dialog can outlive a route change, so re-reading the target when it resolves would destroy the page of whatever conversation the user switched to. And the pre-close state probe only decides whether to confirm — its failure must not veto the close (or leak an unhandled rejection).
…availability The activation reminder reused the member list recorded at activation time. That list is a snapshot of the ACTIVATING step's availability: a session resumed under different permissions or a different client could be promised a tool the registry no longer exposes. Re-filter through the current step's availability (same formula resolveTools uses, intersected with the registered set) and skip the reminder entirely when nothing it would announce is exposable.
… grid The browser_screenshot fixture entry carried no attachments, so the card rendered collapsed (hideDetails when no image) and the screenshot expand/preview path had no visual regression coverage. Attach a deterministic SVG data: image (the card accepts any data:image/ URL), pass completed-state attachments through Dynamic the way the real renderer does, and open the card by default so the grid shows the image area.
External review round: P2 + P3 both verified and fixedP2 — activation reminder could announce stale members (9bcaeef): fixed. P3 — screenshot card image area had no snap coverage (9113e7d): fixed. Validation: tool-info 31 tests, opencode |
Summary
Second of three flat PRs for embedded-browser agent control (#1186), built on the sealed CDP bridge from PR #1221. This is the agent-facing layer: per-conversation embedded browser views, a
BrowserSessionthat drives a conversation'sWebContentsViewover the bridge, seven genericbrowser_*tools, lazy model-driven activation of a deferredbrowsertool group, and the session-timeline cards that render each step.main/browser/controller.ts+controller-automation.ts, desktop) — oneWebContentsViewper conversation (root session), lazily created, plus a per-window draft on the new-session screen that is adopted into the session it creates. Windows are pure displays:display(win, rect)reparents a conversation's view into whichever window shows it; a view driven in the background keeps running unattached (default 1280×720 bounds so screenshots never capture 0×0). Page, history, and login state (one shared persistent partition) live and die with the conversation — another conversation's page showing up is unrepresentable.browser/browser-bridge.ts(opencode) —BrowserBridgehost seam: the embedded server asks the main process to resolve a session to a CDP endpoint and to release it on teardown. Resolution is the identity mapping (session → its own view); structuredBrowserBridgeErrorcodes so the server can reason about failures without sharing the main-process class.browser/session.ts(opencode) —withBrowserPageconnects via opencliCDPBridge(stealth injected on connect), caches one connection per conversation with per-root single-flight, reloads once to apply stealth when taking over an already-open http(s) document, enforces a 25s tool-level timeout, invalidates and reconnects on connection loss, and releases on session delete/archive.tool/browser-*.ts—browser_navigate/snapshot/click/type/wait/screenshot/extract, each behind thebrowserpermission key (baselineallow). SharedrunBrowserActionhelper; screenshots returndata:image/pngattachments rendered inline on the tool card; extract paginates with a char cursor.tool/tool-info.ts+registry.ts— the seven tools form a deferredbrowsergroup,desktop-gated, activated by group or member name, collapsed to one activation card. Activation derives durably from message parts so it survives compaction and restart.ui/.../tools/browser.tsx+ i18n + trow grouping — seven cards (navigate gets a clickable link like webfetch; snapshot/extract expand to the text the agent read; screenshot expands to the captured image with click-to-zoom), localized titles (en/zh/zht), and abrowsertrow activity kind so collapsed turns summarize browser steps as one line.pawwork.txt+shell.txt) — web-task intents route to the browser group from the resident surfaces;webfetchpositioned as one-shot read-only;curl/wgetbrowser sessions forbidden; contract test pinning all three surfaces.Why
PawWork's agent needs to drive the embedded browser with no Chrome, no extension, and no second process. PR #1221 landed the sealed transport; this PR turns it into tools the model can actually call. The tools stay generic (navigate/observe/act/extract) and lazy — they activate only when the model asks for the
browsergroup, so the default tool surface is unchanged for non-browser work. Stealth is treated as a first-class capability (injected on every connect), not an opt-out.Related Issue
#1186
Post-review redesign: conversations own their browser
Live-desktop testing after round 11 surfaced three symptoms with one root cause: the browser was owned by the window (PR #1221 model) while every consumer — the agent, the permission model, the timeline, the user — thinks in conversations. The panel leaked across sessions during agent drives, a page opened on the new-session screen could not be handed to the session it created, and two conversations in one window shared one page. The window-lease machinery (probe pins a windowID, attach validates it, mismatch self-heals, three-tier window arbitration) existed only to bridge that mismatch.
The redesign (its own design doc, reviewed by a second model before build) re-keys views by conversation and retires the lease wholesale:
windowID→rootSessionID | draft:<windowID>. Rendererbrowser:*IPC carries an explicit target, validated in main against the calling window'sDesktopContext— a stale or miswired panel no-ops instead of steering another conversation's view.resolveEndpointloseswindowID;probeWindowbecomesprobeSession({ url });no-window/window-ambiguouserror codes, the lease-mismatch fail-fast,byEndpointconnection sharing, and the resolver arbitration are deleted (~570 lines net). The TOCTOU class the lease guarded — permission judged against one window, action landing in another — is now unrepresentable: a session's action can only ever land in that session's own view.Intentional Deviation From Earlier Review (Codex P1-2)
The plan's first review suggested threading a
visible_session_idheader throughpromptAsyncto bind a tool call to the window the user sees. The per-conversation ownership model supersedes both that suggestion and the main-process window resolution that replaced it: there is no window to resolve anymore — the session IS the target. The ALS/request-context header route stays dropped.Review Hardening
Eleven external review rounds tightened the permission model after the initial submission; every disposition is a PR comment. The window-lease invariants rounds 6–9 converged on are retired by the ownership redesign above (the ambiguity they guarded no longer exists). What survives them: every action reads its own conversation's page URL at probe time (side-effect free, before the ask) and the permission is judged against it; "always allow" grants are origin-scoped; a configured deny is never overridden by an approval (shared permission layer, covers every permission); a user stop (abort) or tool timeout severs the CDP connection, so a canceled action can never keep driving the page; and a redirect's real landing is re-judged after load — a deny on the destination fails the action loudly. That last one is a deliberately soft contract (the document has committed by re-judge time; vetoing the load itself needs request-phase CDP interception and is a documented follow-up), pinned as such in the cross-site redirect test. The browser-defaults-to-allow ruling (design doc §9, reconfirmed mid-review) is pinned by
pawwork-defaults.test.ts.Verification
Automated, all green (as of the round-13 hardening push):
bun run snap browser-tools— the browser tool cards grid (cards, navigate link, snapshot/extract expansion, running shimmer, collapsed trow) visually verified after the subtitle unification.Live desktop regression (interactive GUI, scheduled next): six-prompt script — observe → click → type → extract on a real page, stealth takeover of an already-open document, screenshot inline rendering, conversation switching mid-drive, Home-draft adoption into a new session, and mid-action stop.
Review Focus
controller.ts/controller-automation.ts: display/hide reparenting, draft adoption atomicity, the route-change sweep insyncWindowDisplay, and target validation inipc/browser.ts.session.tsconnection lifecycle: per-conversation 1:1 mapping, the one-shot takeover reload, abort/timeout severing, connection-loss invalidation.tool-info.ts(group/member naming, durable derivation across compaction, repair hint, reminder).Human Review Status
Pending
Summary by CodeRabbit