feat: add site adapter support to browser MCP server #13482

Draft
vaayne wants to merge 11 commits into main from feat/browser-site-adapters

Conversation

vaayne (Collaborator) commented Mar 15, 2026

What this PR does

Before this PR:

The @cherry/browser MCP server only had three generic tools (open, execute, reset) — AI agents had to manually navigate to sites, figure out DOM structure or API endpoints, and write custom JS for every data extraction task.

After this PR:

Adds a site MCP tool that reuses the bb-sites adapter ecosystem — 104 pre-built JavaScript adapters across 36 platforms (Twitter, GitHub, Reddit, Bilibili, Hacker News, YouTube, etc.). AI agents can now:

  • site(action='list') — discover available adapters grouped by platform
  • site(action='search', query='twitter') — find adapters by keyword
  • site(action='info', name='twitter/search') — get adapter metadata and args
  • site(action='run', name='hackernews/top', args={count:'10'}) — execute and get structured JSON
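
As a rough illustration of how a single tool with an `action` parameter can still be type-safe, the four calls above could be modeled as a discriminated union. This is only a sketch: `SiteArgs` and `parseSiteArgs` are hypothetical names, and the PR's actual schema may differ.

```typescript
// Hypothetical argument shapes for the `site` tool, inferred from the four calls listed above.
type SiteArgs =
  | { action: 'list' }
  | { action: 'search'; query: string }
  | { action: 'info'; name: string }
  | { action: 'run'; name: string; args?: Record<string, string> }

// Validate untyped tool input and narrow it to SiteArgs, or throw a descriptive error.
function parseSiteArgs(input: Record<string, unknown>): SiteArgs {
  // Read a required, non-empty string field from the input.
  const str = (key: string): string => {
    const value = input[key]
    if (typeof value !== 'string' || value.length === 0) {
      throw new Error(`site(action='${String(input.action)}') requires a non-empty '${key}'`)
    }
    return value
  }

  switch (input.action) {
    case 'list':
      return { action: 'list' }
    case 'search':
      return { action: 'search', query: str('query') }
    case 'info':
      return { action: 'info', name: str('name') }
    case 'run': {
      // Assumption: adapter args are passed as a flat string map, as in {count:'10'}.
      const args = (input.args ?? {}) as Record<string, string>
      return { action: 'run', name: str('name'), args }
    }
    default:
      throw new Error(`Unknown site action: ${String(input.action)}`)
  }
}
```

With this shape, `parseSiteArgs({ action: 'search' })` throws because `query` is missing, while a `run` call without `args` falls back to an empty map.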

Key features:

  • Domain-aware tab reuse — automatically finds/opens tabs matching the adapter's target domain
  • Auto-clone — clones the community adapter repo on first use if not present
  • Background auto-update — pulls latest adapters every 24h (non-blocking detached process)
  • Auth error detection — detects 401/403/login-required patterns and provides login hints
  • Persistent sessions — leverages Electron's persist:default partition for logged-in access
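
Domain-aware tab reuse could look roughly like the following. This is a minimal sketch, not the PR's code: `TabInfo`, `hostMatchesDomain`, and `findTabForDomain` are hypothetical names, and matching subdomains is an assumption about the intended behavior.

```typescript
// Hypothetical tab shape; the real listTabs() result may carry more fields.
interface TabInfo {
  id: string
  url: string
}

// True when the URL's hostname equals the domain or is a subdomain of it.
function hostMatchesDomain(url: string, domain: string): boolean {
  let host: string
  try {
    host = new URL(url).hostname.toLowerCase()
  } catch {
    return false // not a parseable URL
  }
  const target = domain.toLowerCase()
  // The leading dot prevents "evilgithub.com" from matching "github.com".
  return host === target || host.endsWith('.' + target)
}

// Return the first open tab on the adapter's domain, or undefined if a new tab is needed.
function findTabForDomain(tabs: TabInfo[], domain: string): TabInfo | undefined {
  return tabs.find((tab) => hostMatchesDomain(tab.url, domain))
}
```

When `findTabForDomain` returns `undefined`, the caller would open a new tab on the adapter's domain before running it.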

Why we need it and why it was done in this way

The bb-sites adapter pattern (/* @meta { ... } */ + bare JS function) is already proven with 104 adapters. Reusing it gives Cherry Studio instant access to structured data extraction for dozens of platforms without writing custom code.

The following tradeoffs were made:

  • Single site tool with action parameter instead of 104 individual MCP tools — keeps the tool list manageable
  • Adapters are not bundled — they live in ~/.bb-browser/bb-sites/ (community) and ~/.bb-browser/sites/ (local), updating independently via git
  • showWindow defaults to false for site tool (unlike open which defaults to true) since adapter execution is programmatic, not interactive

The following alternatives were considered:

  • Running bb-browser as a separate MCP server — rejected because it requires the user to install bb-browser + Chrome extension separately, and can't leverage Cherry Studio's persistent Electron sessions
  • Bundling adapter JS in the app — rejected because adapters update frequently in the community repo

Breaking changes

None. This is purely additive — new site tool alongside existing open/execute/reset tools. No existing APIs or behavior changed.

Special notes for your reviewer

  • No changes to controller.ts — all execution flows through existing open(), execute(), listTabs() methods
  • The adapter execution path: read .js file → strip @meta block → wrap as IIFE (${jsBody})(${argsJson}) → CDP Runtime.evaluate with awaitPromise: true — identical to how bb-browser runs adapters
  • All git operations (clone/pull) use async execFile/spawn, never execSync, to avoid blocking Electron's main process
  • 80 new tests across 3 test files (registry: 24, runner: 36, tool integration: 20)
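
The execution path described above (strip the `@meta` block, then wrap the bare function as an IIFE) can be sketched as follows. `META_BLOCK` and `buildAdapterExpression` are hypothetical names and the regex is an assumption about the comment format; in the PR the resulting string is handed to CDP `Runtime.evaluate` with `awaitPromise: true` rather than evaluated locally.

```typescript
// Assumed format: a single /* @meta ... */ block comment preceding the function.
const META_BLOCK = /\/\*\s*@meta[\s\S]*?\*\/\s*/

// Build the expression string that would be sent to the page for evaluation.
function buildAdapterExpression(source: string, args: Record<string, string>): string {
  // Drop the metadata comment so only the bare function expression remains.
  const jsBody = source.replace(META_BLOCK, '').trim()
  // JSON is valid JavaScript object-literal syntax, so it can be inlined as the call argument.
  const argsJson = JSON.stringify(args)
  return `(${jsBody})(${argsJson})`
}

// Usage with a made-up adapter source:
const demoSource = `/* @meta
{ "name": "demo/echo", "domain": "example.com" }
*/
function (args) { return { count: Number(args.count) } }`

const demoExpression = buildAdapterExpression(demoSource, { count: '10' })
```

Serializing the arguments with `JSON.stringify` avoids hand-escaping quotes in user-supplied values.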

Checklist

Release note

Added `site` tool to the built-in browser MCP server, enabling AI agents to run pre-built site adapters for structured data extraction from 36+ platforms (Twitter, GitHub, Reddit, Bilibili, etc.). Supports adapter discovery, search, and execution with domain-aware tab reuse, auto-update, and auth error detection.

vaayne added 11 commits March 15, 2026 18:54
Signed-off-by: Vaayne <liu.vaayne@gmail.com>
Add sites/registry.ts with:
- parseSiteMeta() for @meta JSON and @tag fallback parsing
- scanSites() for recursive .js file discovery
- getAllSites() with mtime-based caching
- findSite() for exact name lookup
- searchSites() for fuzzy matching on name/description/domain
- ensureSitesAvailable() for async git clone
- backgroundUpdate() for async git pull when >24h stale

Signed-off-by: Vaayne <liu.vaayne@gmail.com>
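
The ">24h stale" rule for `backgroundUpdate()` amounts to a timestamp comparison. A minimal sketch, assuming the last update time is stored as epoch milliseconds and that a missing timestamp counts as stale (`isStale` and `UPDATE_INTERVAL_MS` are hypothetical names, not taken from the PR):

```typescript
// 24 hours in milliseconds.
const UPDATE_INTERVAL_MS = 24 * 60 * 60 * 1000

// Decide whether a background `git pull` is due.
function isStale(lastUpdatedMs: number | null, nowMs: number): boolean {
  // Assumption: with no recorded timestamp, treat the checkout as stale.
  if (lastUpdatedMs === null) return true
  return nowMs - lastUpdatedMs > UPDATE_INTERVAL_MS
}
```

Passing `nowMs` explicitly instead of calling `Date.now()` inside keeps the check easy to unit test.
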
Test coverage for:
- parseSiteMeta: @meta JSON, @tag fallback, malformed JSON, missing meta
- scanSites: empty dir, recursive walk, .git dir skipping
- getAllSites: local overrides community, mtime-based caching
- findSite: exact match, non-existent adapter
- searchSites: by name, description, domain, case-insensitive
- ensureSitesAvailable: already available, clone success, clone failure
- backgroundUpdate: no .git, recent update, stale update, missing timestamp

Signed-off-by: Vaayne <liu.vaayne@gmail.com>
Signed-off-by: Vaayne <liu.vaayne@gmail.com>
Signed-off-by: Vaayne <liu.vaayne@gmail.com>
Signed-off-by: Vaayne <liu.vaayne@gmail.com>
- Use [\r\n] in @meta regex for Windows CRLF compatibility
- Remove premature cache invalidation in backgroundUpdate()
  to avoid race condition with in-progress git pull

Signed-off-by: Vaayne <liu.vaayne@gmail.com>
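
To illustrate the CRLF point, here is a sketch of a `@meta` parser whose regex accepts either line ending after the `@meta` marker. The exact pattern used in the PR is not shown in this conversation, so `META_JSON`, `SiteMeta`, and `parseMetaJson` are assumptions for illustration only.

```typescript
// Hypothetical metadata shape; real adapters may declare more fields.
interface SiteMeta {
  name?: string
  description?: string
  domain?: string
  [key: string]: unknown
}

// Assumed layout: "/* @meta", a line break (LF or CRLF), the JSON body, then "*/".
const META_JSON = /\/\*\s*@meta\s*[\r\n]+([\s\S]*?)\*\//

// Return the parsed metadata object, or null when the block is missing or malformed.
function parseMetaJson(source: string): SiteMeta | null {
  const match = META_JSON.exec(source)
  if (!match) return null
  try {
    const parsed: unknown = JSON.parse(match[1] ?? '')
    const isPlainObject = typeof parsed === 'object' && parsed !== null && !Array.isArray(parsed)
    return isPlainObject ? (parsed as SiteMeta) : null
  } catch {
    return null // malformed JSON inside the @meta block
  }
}
```

A file checked out with Windows line endings, such as `"/* @meta\r\n{ \"name\": \"a/b\" }\r\n*/"`, parses the same way as its LF counterpart.
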
Signed-off-by: Vaayne <liu.vaayne@gmail.com>
Signed-off-by: Vaayne <liu.vaayne@gmail.com>
Signed-off-by: Vaayne <liu.vaayne@gmail.com>
- Change bare `auth` to `auth.?(?:failed|expired|required|error|token)`
- Add test verifying "author not found" does not trigger login hint

Signed-off-by: Vaayne <liu.vaayne@gmail.com>
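
The effect of the tightened pattern can be checked in isolation. The sketch below uses only the fragment quoted in the commit message; the case-insensitive flag and the `looksLikeAuthError` helper are assumptions, and the PR's full detector also covers 401/403 and login-required patterns not reproduced here.

```typescript
// Fragment quoted in the commit above; the `i` flag is an assumption.
const AUTH_HINT = /auth.?(?:failed|expired|required|error|token)/i

// True when the message contains an auth-specific phrase rather than any word starting with "auth".
function looksLikeAuthError(message: string): boolean {
  return AUTH_HINT.test(message)
}

console.log(looksLikeAuthError('author not found')) // false
console.log(looksLikeAuthError('Auth required')) // true
```

Taken alone, this fragment does not match "authentication failed", because `.?` allows at most one character between `auth` and the keyword.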