Fix cascading context crash under concurrent crawl load #1768

Open
Martichou wants to merge 1 commit into unclecode:develop from Martichou:fix/add_init_script_accumulation

@Martichou

Problem

When multiple URLs are crawled concurrently and share a BrowserContext (same config signature), all crawls may fail with:

BrowserContext.add_init_script: Target page, context or browser has been closed

…even though only one URL (e.g. a slow or timing-out site) was actually problematic.

Example failure logs — 10 concurrent crawls, all failing within seconds of each other:

[ERROR] Error in crawl: BrowserContext.add_init_script: Target page, context or browser has been closed
[ERROR] Error in crawl: BrowserContext.add_init_script: Target page, context or browser has been closed
[ERROR] Error in crawl: BrowserContext.add_init_script: Target page, context or browser has been closed
... (repeated for all concurrent URLs)

See detailed logs: logs.log

Root cause

_crawl_web() in async_crawler_strategy.py called context.add_init_script() on every crawl: once for the navigator overrider (when override_navigator, simulate_user, or magic is set) and once for the shadow DOM script. These same scripts are already injected once per context in BrowserManager.setup_context().

The issue is that add_init_script is append-only: Playwright has no API to deduplicate or replace scripts. This causes:

  1. Script accumulation: With N concurrent crawls on one context, 2 + 2N init scripts pile up (2 from setup_context + 2 per _crawl_web call). On context reuse across batches, this grows unbounded.
  2. Memory/CPU pressure: Every new page or frame in the context must execute all accumulated scripts. With 10+ renderer processes each running dozens of duplicate scripts, Chromium hits internal resource limits.
  3. Silent context death: Chromium kills the overloaded context without a graceful error. Any concurrent page on that context, including ones that were crawling perfectly fine, immediately gets "Target page, context or browser has been closed".
  4. Cascading failure: Because add_init_script() is the first context-level operation in _crawl_web(), it's the first call to discover the context is dead, producing the distinctive error message across all concurrent tasks.
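
The accumulation arithmetic in point 1 can be illustrated without a browser. The sketch below uses a hypothetical `FakeContext` stand-in (not a Playwright class) whose `add_init_script` only appends, mirroring the append-only behavior described above. The function names and the two-scripts-per-call count are taken from this description, not from the actual crawl4ai code.

```python
import asyncio


class FakeContext:
    """Hypothetical stand-in for a shared BrowserContext (not Playwright)."""

    def __init__(self):
        self.init_scripts = []

    async def add_init_script(self, script):
        # Append-only: nothing is deduplicated or replaced.
        self.init_scripts.append(script)


async def setup_context(context):
    # One-time injection when the context is created (2 scripts).
    await context.add_init_script("navigator_overrider")
    await context.add_init_script("shadow_dom")


async def crawl_web_old(context):
    # Pre-fix behavior: every crawl re-adds the same 2 scripts.
    await context.add_init_script("navigator_overrider")
    await context.add_init_script("shadow_dom")


async def count_scripts(n_crawls):
    context = FakeContext()
    await setup_context(context)
    await asyncio.gather(*(crawl_web_old(context) for _ in range(n_crawls)))
    return len(context.init_scripts)


def accumulated_scripts(n_crawls):
    """Number of init scripts on one context after n concurrent crawls."""
    return asyncio.run(count_scripts(n_crawls))
```

Calling `accumulated_scripts(10)` gives 22, i.e. 2 + 2N for N = 10; with the per-crawl calls removed, the count would stay at 2 regardless of N.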

Removing the per-scrape add_init_script calls fixes the issue, as tested with the same script as in #1640 (direct link to the message: #1640 (comment)).

…ng failures

Init scripts (navigator_overrider, shadow-DOM) were being appended to the
shared BrowserContext on every _crawl_web() call, despite already being
injected once in BrowserManager.setup_context(). Since add_init_script()
is append-only, scripts accumulated unboundedly under concurrent load,
causing Chromium to kill the context and cascade "Target page, context or
browser has been closed" errors to all concurrent crawls.

Signed-off-by: Martichou <m@rtin.fyi>
Copilot AI review requested due to automatic review settings February 21, 2026 01:20

Copilot AI left a comment

Pull request overview

This PR addresses a concurrency-related crash in Playwright crawling by preventing per-crawl mutation of shared BrowserContext init scripts, which can accumulate and cause the context to die under load.

Changes:

  • Removed per-crawl context.add_init_script(...) calls from _crawl_web().
  • Added an in-code note documenting why init scripts must not be appended from concurrent tasks and that scripts are intended to be injected via BrowserManager.setup_context().

Comment on lines +582 to 587
# NOTE: navigator_overrider and shadow-DOM init scripts are
# already injected once per context in BrowserManager.setup_context().
# Do NOT call context.add_init_script() here — it mutates the
# shared context from concurrent tasks and causes cascading
# "Target page, context or browser has been closed" failures.

Copilot AI Feb 21, 2026

The NOTE claims navigator_overrider/shadow-DOM init scripts are injected in BrowserManager.setup_context(), but for managed-browser default_context and persistent contexts setup_context() is called without a CrawlerRunConfig, so those scripts are not injected there. With this removal, override_navigator/simulate_user/magic (and flatten_shadow_dom) will silently stop working in those modes. Consider injecting these scripts via page.add_init_script() per crawl (page-scoped, no cross-task mutation), or ensure the default context gets a one-time, config-aware setup guarded against duplicates.

Suggested change
# NOTE: navigator_overrider and shadow-DOM init scripts are
# already injected once per context in BrowserManager.setup_context().
# Do NOT call context.add_init_script() here — it mutates the
# shared context from concurrent tasks and causes cascading
# "Target page, context or browser has been closed" failures.
# NOTE: navigator_overrider / shadow-DOM / simulate_user / magic
# scripts must be injected in a page-scoped way to avoid mutating
# shared browser contexts across concurrent tasks. We therefore use
# page.add_init_script() here instead of context.add_init_script().
scripts_to_inject = []
# Use getattr so we don't fail if some flags are missing on config
if isinstance(config, CrawlerRunConfig):
    if getattr(config, "override_navigator", False):
        scripts_to_inject.append(load_js_script("navigator_overrider.js"))
    if getattr(config, "flatten_shadow_dom", False):
        scripts_to_inject.append(load_js_script("shadow_dom_flatten.js"))
    if getattr(config, "simulate_user", False):
        scripts_to_inject.append(load_js_script("simulate_user.js"))
    if getattr(config, "magic", False):
        scripts_to_inject.append(load_js_script("magic.js"))
for script in scripts_to_inject:
    # Inject per-page to avoid cross-task context mutation
    await page.add_init_script(script=script)
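
The alternative mentioned in this comment, a one-time context setup guarded against duplicates, might look roughly like the following. This is only a sketch under stated assumptions: `inject_once`, `_injected`, `demo`, and `FakeContext` are hypothetical names, not part of crawl4ai or Playwright, and it assumes the context object is hashable and weak-referenceable.

```python
import asyncio
import weakref

# Hypothetical registry: context -> names of scripts already injected.
_injected = weakref.WeakKeyDictionary()


class FakeContext:
    """Hypothetical stand-in for a shared BrowserContext (not Playwright)."""

    def __init__(self):
        self.init_scripts = []

    async def add_init_script(self, script):
        await asyncio.sleep(0)  # yield, so concurrent tasks interleave
        self.init_scripts.append(script)


async def inject_once(context, name, script):
    """Add `script` to `context` unless `name` was already injected."""
    seen = _injected.setdefault(context, set())
    if name in seen:
        return False
    # Mark before awaiting: there is no await between the check and the
    # mark, so concurrent asyncio tasks cannot both pass the check.
    seen.add(name)
    try:
        await context.add_init_script(script)
    except Exception:
        seen.discard(name)  # allow a retry if the injection failed
        raise
    return True


async def demo(n_tasks):
    """Run n concurrent injections of the same script on one context."""
    context = FakeContext()
    results = await asyncio.gather(
        *(inject_once(context, "navigator_overrider", "/* js */")
          for _ in range(n_tasks))
    )
    return len(context.init_scripts), sum(results)
```

The guard keys on a script name rather than script content, so a caller would need to pick stable names; the weak-keyed registry lets entries disappear once a context is garbage collected.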

Comment on lines +582 to +586
# NOTE: navigator_overrider and shadow-DOM init scripts are
# already injected once per context in BrowserManager.setup_context().
# Do NOT call context.add_init_script() here — it mutates the
# shared context from concurrent tasks and causes cascading
# "Target page, context or browser has been closed" failures.

Copilot AI Feb 21, 2026

This change makes flatten_shadow_dom rely entirely on BrowserManager.setup_context() being executed for a context created with flatten_shadow_dom=True. However BrowserManager._make_config_signature() currently does not include flatten_shadow_dom, so crawls with and without flatten_shadow_dom can share the same cached context and get the wrong init-script behavior (either missing flattening or permanently enabled). To keep behavior correct, include flatten_shadow_dom in the context signature (or move shadow-root forcing to page-level init scripts).
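
To illustrate why the signature matters, here is a minimal sketch of a config signature that includes flatten_shadow_dom. It is not the actual `_make_config_signature()` implementation, whose internals are not shown in this PR; `make_signature` and `CONTEXT_AFFECTING_FIELDS` are hypothetical names, and the config is modeled as a plain dict for simplicity.

```python
import hashlib
import json

# Hypothetical list of flags that change which init scripts a context gets.
CONTEXT_AFFECTING_FIELDS = (
    "override_navigator",
    "simulate_user",
    "magic",
    "flatten_shadow_dom",
)


def make_signature(config: dict) -> str:
    """Hash only the fields that affect context-level setup."""
    relevant = {key: bool(config.get(key, False))
                for key in CONTEXT_AFFECTING_FIELDS}
    payload = json.dumps(relevant, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

With this shape, two configs that differ only in flatten_shadow_dom produce different signatures and so would not share a cached context, while fields unrelated to context setup do not fragment the cache.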

Comment on lines +582 to +586
# NOTE: navigator_overrider and shadow-DOM init scripts are
# already injected once per context in BrowserManager.setup_context().
# Do NOT call context.add_init_script() here — it mutates the
# shared context from concurrent tasks and causes cascading
# "Target page, context or browser has been closed" failures.

Copilot AI Feb 21, 2026

There doesn’t appear to be a regression test covering the specific failure mode described in the PR (many concurrent crawls sharing a context with override_navigator/magic and/or flatten_shadow_dom enabled). Adding an async test that runs concurrent arun() calls with these flags and asserts all succeed would help prevent reintroducing context-level init-script accumulation/crashes.

@Martichou

As per the Copilot comments, I'm not sure what you prefer here @unclecode; maybe it's OK to just switch this to page.add_init_script. It's up to you, you have a better understanding of the inner workings here :)
