fix(browser): extract Deep Research report from out-of-process iframe#232
fix(browser): extract Deep Research report from out-of-process iframe#232umutkeltek wants to merge 6 commits into
Conversation
ChatGPT now renders Deep Research reports inside a doubly-nested, out-of-process sandboxed iframe (connector_openai_deep_research.*. oaiusercontent.com) instead of an inline assistant turn. That OOPIF does not appear in the main page's frame tree, so the in-page isolated-world extraction (readDeepResearchFrameResult) can never find it. Because that path was preferred whenever a Page was present, the capable target-attach path (readDeepResearchTargetResult) was dead code in production: waitForDeepResearchCompletion never detected completion and ran to timeout, leaving the session stuck "running" and harvesting only the "ChatGPT said:" placeholder. Prefer the target-attach path (which reaches the OOPIF and its nested frames) and fall back to the in-page frame path for legacy inline rendering. Treat a target-confirmed completion as authoritative even when the main DOM exposes no assistant turn, since the report now lives entirely in the OOPIF and the hasActiveScopedResearch heuristic no longer holds there. Verified end-to-end: reattaching to a completed Deep Research session now detects completion in ~6s and saves the full report where it previously timed out. Adds a regression test for the production shape (both Page and client passed, scoped run, report only reachable via the target path).
|
Codex review: needs maintainer review before merge. Reviewed May 30, 2026, 7:14 AM ET / 11:14 UTC. Summary Reproducibility: yes. at source level: current main chooses the in-page frame path whenever Page is present, and the production browser/reattach callers pass both Page and client. I did not run a live ChatGPT session, but the PR body includes fresh redacted live proof for the external OOPIF behavior. Review metrics: 2 noteworthy metrics.
Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Risk before merge
Maintainer options:
Next step before merge
Security Review detailsBest possible solution: Land the page-scoped target extraction if maintainers accept the CDP boundary proof, then verify a release smoke on Deep Research before shipping. Do we have a high-confidence way to reproduce the issue? Yes at source level: current main chooses the in-page frame path whenever Page is present, and the production browser/reattach callers pass both Page and client. I did not run a live ChatGPT session, but the PR body includes fresh redacted live proof for the external OOPIF behavior. Is this the best way to solve the issue? Yes, this is the narrowest maintainable direction I found: prefer the target path that can see OOPIF reports, preserve the in-page fallback, and bind target discovery to the page session. The remaining question is maintainer acceptance of the CDP boundary risk, not a concrete code repair. AGENTS.md: found and applied where relevant. Codex review notes: model gpt-5.5, reasoning high; reviewed against 6019a199e44c. Label changesLabel justifications:
Evidence reviewedWhat I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
Address review: the target-attach OOPIF extraction enabled in this PR used readDeepResearchTargetResult, which enumerated browser-wide CDP targets (Target.getTargets) and returned the first completed Deep Research target it could read. In a shared/persistent Chrome profile with another completed Deep Research tab, that could save the other tab's report into the current Oracle session (cross-tab session-state / privacy leak). Scope discovery to the current Oracle-controlled page: `client` is connected to the conversation page target, so page-session auto-attach only surfaces THIS page's related targets (its Deep Research OOPIF). Drop the browser-wide Target.getTargets / attachToTarget enumeration (and the now-unneeded setDiscoverTargets); only auto-attached, page-scoped sessions are treated as belonging to this run. Add a regression test with two Deep Research targets — the current page's OOPIF (in progress) plus a foreign COMPLETED report visible browser-wide — asserting the foreign report is never read (the run times out instead). Verified the test fails against the pre-scoping source and passes after. Full browser suite green (400 passed); document the fix + scoping in the changelog.
|
Addressed the P1 (browser-wide target scan / cross-tab leak) in Fix
Regression testAdded Proof it's a real guard: against the pre-scoping source the test fails (
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
Re-review found the page-scoping fix only held on direct-tab clients. On the browser-WSEndpoint / remote-Chrome path, `client` is a session-bound wrapper (createSessionBoundChromeClient) whose domain methods are session-bound but whose raw `send` is the browser-level send. readDeepResearchTargetResult issues Target.setAutoAttach via raw `send`, so on that path it auto-attached browser-wide and could still read another tab's completed Deep Research report into the current session. Tag the session-bound wrapper with its page session id (oraclePageSessionId) and pass it explicitly to Target.setAutoAttach (enable + disable). For a direct tab client this is undefined and `send` already defaults to the page session, so behavior there is unchanged; on the wrapper path the auto-attach is now bound to the page session and stays scoped to this tab. Add a regression test for the session-bound wrapper path asserting every setAutoAttach call is bound to the page session (it fails against the un-bound source, where a foreign completed report leaks in). Full browser suite green (401 passed); changelog updated.
|
Both follow-up P1s resolved + fresh live proof on the latest head. Wrapper-path binding ( Fresh live multi-tab proof (latest head) — signed-in Chrome, two completed Deep Research tabs open simultaneously (both
Redacted result + commit details in the PR body. @clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
Re-review found two candidate-ordering gaps in the target-scan path: - readDeepResearchTargetResult returned the first in-progress / text-bearing session, so when a page exposed more than one Deep Research iframe target (e.g. a stale in-progress one attached before the completed report) it could return the incomplete one and miss the completed OOPIF. It now scans all page-scoped sessions, returns a completed read immediately, and only falls back to the best in-progress/text-bearing read when none completed. - waitForDeepResearchCompletion used `targetResult ?? frameResult`, so an incomplete (in-progress) target read suppressed the in-page frame fallback. A completed target read is still authoritative; otherwise the frame path runs and a completed in-page result is preferred. Add a regression test: two page targets (in-progress attached first, completed second) must return the completed report. Verified it fails fast against the prior early-return loop and passes after. Full browser suite green (402).
Adds the second regression case requested in review: when the target-attach read is only in-progress but the in-page frame path has a completed report, the incomplete target read must not suppress the frame fallback. Verified it fails against the pre-fix source and passes after. Pairs with the existing first-target-in-progress + later-target-completed test.
… helper Clean up the result-selection logic flagged in review. The nested ternary that chose between the target-attach read and the in-page frame read is replaced by a named, unit-tested helper (pickPreferredDeepResearchRead) and the local is renamed from the misleading `frameResult` to `read` (it can hold either source). Behaviour is unchanged: a completed read wins (target preferred), otherwise the best in-progress/text-bearing read is kept for progress logging; an incomplete target read never suppresses a completed in-page result (legacy/inline path). Adds direct unit tests for every selection branch, including the legacy-inline case (no target read + completed in-page read returns the in-page report). Full browser suite green (409 passed).
|
Addressed the re-review of the page-session-binding commit, plus a clean-up pass and fresh proof at the new HEAD. Correctness ( Clean-up ( Tests (each proven to fail against the pre-fix source): first-target-in-progress + later-target-completed; in-progress target + completed in-page frame (legacy inline); 6 unit tests for the selection helper incl. the Fresh live multi-tab proof at HEAD @clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
I was kinda obsessed with the rating, it would be very logical to gamify pr's like this in the future. |
|
Downstream validation from a real Hermes/Oracle autonomous Deep Research workflow:
Minimal observed before/after: This is not just a unit-test-only issue for us; it was blocking an autonomous Research Intern pipeline because completed reports were being discarded as empty extraction artifacts. The PR head fixes the real downstream failure mode. |
Problem
Deep Research runs via the browser engine stopped returning results. The session hangs until timeout (or stays stuck
running), and harvest captures only the"ChatGPT said:"placeholder instead of the report.Root cause
ChatGPT moved the Deep Research report out of the inline assistant turn and into a doubly-nested, out-of-process sandboxed iframe:
waitForDeepResearchCompletionhas two extraction paths:readDeepResearchFrameResult— reads the main page's frame tree via isolated worlds. The OOPIF does not appear in the main page's frame tree, sofindDeepResearchFrameIdreturnsnull. It can never see the report.readDeepResearchTargetResult— attaches to the iframe's own CDP target and walks its nested frames. This one can read the report.The dispatch was
Page ? frameResult : client ? targetResult : null. In production bothPageandclientare always passed (seeindex.tsandreattach.ts), so it always took the broken frame path and the working target path was effectively dead code. The completion poll never sawfinished(no assistant turn in the main DOM either), so the run timed out.A second guard (
hasActiveScopedResearch) also depended on a main-DOM assistant turn that no longer exists, which would have blocked the report even after fixing dispatch.Fix
The scoped-staleness guard is preserved for the in-page frame path (target-confirmed completions read the live connector iframe directly, so they don't need the main-DOM corroboration).
Verification
Reattaching to a real completed Deep Research session now works end-to-end through the actual CLI:
Previously this same flow extracted only
"ChatGPT said:"and ran to timeout.Tests
Pageandclientpassed, scoped run,minTurnIndex >= 0) where the report is only reachable via the target path and the main DOM has no assistant turn → returns the report instead of hanging.does not complete from an unscoped frame result during a scoped run) still passes — the guard relaxation only applies to target-confirmed completions.deepResearch.test.ts: 28 passed. Full browser suite: 399 passed / 1 skipped.Known follow-up (not in this PR)
readDeepResearchTargetResultenumerates targets browser-wide (Target.getTargets). With multiple ChatGPT tabs open in the persistent profile, each holding a completed Deep Research report, a run could in principle pick up another tab's report. A follow-up should scope target enumeration to the current page session (theclientis already connected to the page target) and drop the browser-wide scan.Follow-up: cross-tab scoping + wrapper-path binding + fresh live proof
The first review flagged that target-first extraction scanned browser-wide CDP targets (cross-tab leak). Addressed in two commits:
53b7b331— scope discovery to the current page via page-session auto-attach; drop the browser-wideTarget.getTargets/attachToTargetscan.95903b61— the page-scoping only held for direct-tab clients. On the browser-WSEndpoint path,clientis a session-bound wrapper (createSessionBoundChromeClient) whose rawsendis browser-level, soTarget.setAutoAttachviasendstill went browser-wide there. Now the wrapper is tagged with its page session id (oraclePageSessionId) and auto-attach is bound to it explicitly. The cross-tab boundary now holds on both connection paths.Regression tests (both verified to FAIL against the pre-fix source):
Target.setAutoAttachon a session-bound wrapper client is bound to the page sessionFresh live multi-tab proof (latest head
95903b61) — a signed-in Chrome with two completed Deep Research tabs open; the realwaitForDeepResearchCompletionrun scoped to each tab returns that tab's own report, no cross-contamination:{ "tabA": { "name": "v2 (Routing Cost Map)", "len": 27099, "head": "Current LLM Inference Routing Cost Map / Scope and reading notes ...", "hasOwn_RoutingMap": true, "leaked_fromB": false }, "tabB": { "name": "v1 (Provider Cost Map)", "len": 14317, "head": "LLM Inference Provider Cost Map / Scope and normalization ...", "hasOwn_ProviderMap": true, "leaked_fromA": false } } // ISOLATION PASS — each tab returned its OWN report, no cross-contaminationBoth connector OOPIFs (
connector_openai_deep_research.*) were present simultaneously; tab A extracted the Routing report and tab B the Provider report, each from its own page-scoped session.Verification:
tests/browser/deepResearch.test.ts30 passed; full browser suite 401 passed / 1 skipped; typecheck (changed files) / format / lint clean.Update — candidate-order fix, clean-up, and re-proof at HEAD (
f5428df9)Addressing the re-review of the page-session-binding commit:
Correctness (
70375b47,8a9b98c2)waitForDeepResearchCompletionno longer lets an incomplete target read suppress the in-page frame fallback; a completed target read stays authoritative, otherwise the frame path runs (preserving legacy inline/mixed rendering — the compatibility concern).Clean-up (
f5428df9)pickPreferredDeepResearchRead; renamed the misleadingframeResultlocal toread.Tests (each verified to fail against the pre-fix source):
no-target + completed-framecasetests/browser/deepResearch.test.ts: all pass; full browser suite 410 passed / 1 skipped; typecheck (changed files) / format / lint clean.Fresh live multi-tab proof at HEAD
f5428df9— signed-in Chrome, two completed Deep Research tabs open simultaneously (both connector OOPIFs present); the realwaitForDeepResearchCompletionscoped to each tab returns that tab's own report:{ "head": "f5428df9", "tabA": { "name": "v2 (Routing Cost Map)", "len": 27099, "head": "Current LLM Inference Routing Cost Map / Scope and reading notes ...", "hasOwn_RoutingMap": true, "leaked_fromB": false }, "tabB": { "name": "v1 (Provider Cost Map)", "len": 14317, "head": "LLM Inference Provider Cost Map / Scope and normalization ...", "hasOwn_ProviderMap": true, "leaked_fromA": false } } // ISOLATION PASS — each tab returned its OWN report, no cross-contamination