ghostchrome

Ultra-light browser automation CLI for LLM agents. Single Go binary, native Chrome DevTools Protocol, 3-4× fewer tokens than Playwright-MCP, no Node runtime. A modern Playwright alternative built for AI agents that drive a browser in a loop.

$ ghostchrome preview http://localhost:3000
[200] Dashboard — http://localhost:3000 (134ms)
[errors] none
[network] 12 reqs, 0 failed
[dom]
  h1 Dashboard
  @1 b Add user
  table 5 rows
  @2 a>/settings Settings

One command. ~50 ms warm. ~2,000 tokens. Refs (@1, @2) you can click and type into next.


Why ghostchrome

LLM-driven browser automation has a token problem. Playwright-MCP returns a full accessibility tree on every snapshot — typically 14,000-50,000 tokens for a real-world page — which burns the agent's context window and slows every iteration. ghostchrome was built to fix that one thing: return the smallest possible payload that an LLM still needs to act, in a single static Go binary that boots in milliseconds.

Designed for AI agents that drive a browser via Claude Code, the Anthropic Agent SDK, Aider, Cursor, OpenAI's Agents SDK, or any custom loop. Use it as a Playwright alternative for headless Chrome web scraping, as a CDP CLI for ops automation, or as the browsing tool behind a custom agent. No JSON-RPC overhead, no Node runtime, no npm install. Just ghostchrome <command> <url> and read the output.

What you get:

  • Filtered accessibility tree — only interactive elements get refs (@1, @2), 3-5× fewer nodes than a full a11y dump.
  • Three extraction levels: skeleton (minimal), content (text), full (everything named).
  • Auto-launch or attach — every command can spawn a temporary Chrome or attach to an existing session via --connect=auto.
  • CDP-native — built on Rod, so iframe handling, stealth patches, and event capture work out of the box.
  • Single 24 MB binary — no Node.js, no npm install, no Playwright browsers download.

Benchmark

Reproducible head-to-head against @playwright/mcp on 5 local HTML fixtures + real public sites. Run it yourself:

./benchmark/run-bench.sh                 # cold-spawn mode (default)
BENCH_MODE=warm ./benchmark/run-bench.sh # long-lived session (real agent loop)

Warm session — the real LLM-agent loop

Both tools keep one process alive across navigate+snapshot calls. This is what your agent actually does.

| Site | ghostchrome tokens | pw-mcp tokens | ghostchrome ms | pw-mcp ms |
|---|---|---|---|---|
| dashboard (CRUD table) | 549 | 2,746 | 50 | 64 |
| product page | 390 | 1,456 | 45 | 55 |
| news feed | 851 | 2,242 | 40 | 51 |
| search results | 1,224 | 2,421 | 60 | 73 |
| Hacker News (live) | 3,416 | 14,564 | 660 | 1,023 |
| Overall | 6,832 | 24,961 | 1,020 ms | 1,660 ms |

3.65× fewer tokens, 1.63× faster per snapshot. Full table: benchmark/results-warm.md.

Cold spawn — every invocation starts fresh

Apples-to-apples wall time of process start → Chrome attach → navigate → snapshot → exit for both tools. Chrome startup dominates and ghostchrome is ~10% slower here — which is why you should use warm session (above) for any agent workload.

3.5× fewer tokens, 0.91× as fast overall (cold). Full table: benchmark/results.md.

Binary & footprint

| | ghostchrome | Playwright-MCP |
|---|---|---|
| Runtime | Static Go binary | Node.js |
| Install size | ~24 MB | ~80 MB Node + ~250 MB Playwright + browsers |
| Cold boot | <1s | 2-5s (npx + Playwright init) |
| Dependencies | Chrome on the system or auto-downloaded by Rod | npm install + npx playwright install |
| Protocol | CLI stdin/stdout, optional MCP server | MCP (JSON-RPC over stdio) |

Token estimates assume ceil(bytes/4), the standard rule-of-thumb for BPE tokenizers. Numbers above are medians of 2-3 trials on Linux x86_64, Chromium 131, May 2026.
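
To make the estimate concrete, here is a minimal Python sketch of the ceil(bytes/4) rule and of the ratio reported for the warm-session totals. The function names `estimate_tokens` and `reduction_ratio` are illustrative and are not part of ghostchrome or its benchmark scripts.

```python
import math


def estimate_tokens(text: str) -> int:
    """Estimate token count as ceil(UTF-8 bytes / 4), the rule of thumb quoted above."""
    return math.ceil(len(text.encode("utf-8")) / 4)


def reduction_ratio(baseline_tokens: int, tokens: int) -> float:
    """Return how many times smaller `tokens` is than `baseline_tokens`."""
    return baseline_tokens / tokens


if __name__ == "__main__":
    # Any captured snapshot text can be measured the same way.
    print(estimate_tokens("@1 b Add user"))
    # Overall warm-session totals from the benchmark table (pw-mcp vs ghostchrome).
    print(round(reduction_ratio(24_961, 6_832), 2))  # prints 3.65
```

Because the rule counts bytes rather than characters, non-ASCII text is weighted by its UTF-8 length. A real tokenizer will give somewhat different absolute numbers, but the ratio between two tools on the same page should move much less.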


Install

Quick install (macOS & Linux)

curl -fsSL https://raw.githubusercontent.com/MakFly/ghostchrome/main/install.sh | sh

Go

go install github.com/MakFly/ghostchrome@latest

Manual

Prebuilt binaries for macOS (Intel/ARM), Linux (amd64/arm64), and Windows are available on the Releases page.

Requirements

  • Chrome or Chromium installed. If none is found, Rod auto-downloads a compatible Chromium to ~/.cache/rod/ on first run.

Quickstart

See a page

ghostchrome preview https://example.com

A single command returns the status code, page title, console and network errors, request count, and a compact DOM with refs. This is typically the first call an agent makes to a new URL.

Extract a clickable DOM

ghostchrome extract https://news.ycombinator.com --level content

Compact accessibility tree with refs (@1, @2, …). Three levels: skeleton (interactive only), content (adds text), full (everything named).

Drive the page

# Each command can navigate first, then act, then return the new snapshot.
ghostchrome click @3 https://example.com/login
ghostchrome type  @1 "alice@example.com" https://example.com/login
ghostchrome press Enter https://example.com/login

Refs come from the previous snapshot. The browser session is preserved when you use --connect=auto (recommended).
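
If you drive these commands from code rather than a terminal, it can help to build the argument list in one place. The sketch below is a hypothetical helper (`build_command` is not part of ghostchrome) that only assembles argv lists from the commands and flags shown in this README; it does not execute anything.

```python
import shlex
from typing import List, Optional


def build_command(
    action: str,
    *args: str,
    url: Optional[str] = None,
    connect_auto: bool = False,
    level: Optional[str] = None,
) -> List[str]:
    """Build an argv list for one ghostchrome call (nothing is executed here)."""
    argv = ["ghostchrome", action, *args]
    if url is not None:
        argv.append(url)
    if connect_auto:
        argv.append("--connect=auto")  # reuse a running `ghostchrome serve` session
    if level is not None:
        argv += ["--level", level]  # skeleton, content, or full
    return argv


if __name__ == "__main__":
    login = "https://example.com/login"
    steps = [
        build_command("click", "@3", url=login),
        build_command("type", "@1", "alice@example.com", url=login),
        build_command("press", "Enter", url=login),
    ]
    for argv in steps:
        print(shlex.join(argv))  # shell-safe rendering of each call
```

Passing a list to `subprocess.run` instead of a shell string avoids quoting problems when the typed text contains spaces or shell metacharacters.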

Long-lived session

# Terminal 1
ghostchrome serve --port 9222
# → ws://127.0.0.1:9222/devtools/browser/<uuid>

# Terminal 2 (or your agent)
ghostchrome preview https://example.com --connect=auto
ghostchrome click   @1                  --connect=auto
ghostchrome extract                     --connect=auto --level content

--connect=auto discovers a serve instance on 127.0.0.1:9222-9229 automatically. Per-call latency drops to ~50 ms.
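
For intuition, port-range discovery can be sketched in a few lines of Python. This is not ghostchrome's actual implementation, which is not shown here. It assumes the served Chrome exposes the standard DevTools HTTP endpoint `/json/version`, whose JSON reply carries a `webSocketDebuggerUrl` field; the name `find_debugger_url` is illustrative.

```python
import http.client
import json
import urllib.request
from typing import Iterable, Optional


def find_debugger_url(
    ports: Iterable[int] = range(9222, 9230),  # 9222-9229 inclusive
    host: str = "127.0.0.1",
    timeout: float = 0.2,
) -> Optional[str]:
    """Return the first webSocketDebuggerUrl found on the given ports, else None."""
    # Bypass any configured HTTP proxy: the probe targets localhost only.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    for port in ports:
        url = f"http://{host}:{port}/json/version"
        try:
            with opener.open(url, timeout=timeout) as resp:
                info = json.load(resp)
        except (OSError, ValueError, http.client.HTTPException):
            # Nothing listening, a timeout, or a non-JSON reply: try the next port.
            continue
        if isinstance(info, dict) and info.get("webSocketDebuggerUrl"):
            return info["webSocketDebuggerUrl"]
    return None


if __name__ == "__main__":
    print(find_debugger_url() or "no debuggable Chrome found on 9222-9229")
```

Probing sequentially with a short timeout keeps the worst case bounded at roughly eight timeouts when nothing is running.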

Debug a page

ghostchrome errors https://your-site.test --level all

Captures Runtime.consoleAPICalled + Runtime.exceptionThrown + Log.entryAdded (CORS, CSP, mixed content, network ERR_*) + every HTTP 4xx/5xx — all in one snapshot.


How it works

your agent → ghostchrome CLI → Rod (Go) → Chrome DevTools Protocol → Chrome
  1. CDP Accessibility tree is fetched and filtered: only nodes that are interactive (or named ancestors) are kept. Everything is compressed into one indented text format with @N refs.
  2. Three extraction levels let an agent ask for exactly the granularity it needs. Most agent loops stay at content.
  3. Refs are stable within a snapshot and are replayed on the next command via an element-state cache, so click @3 works without a new selector.
  4. Output is text first — no JSON wrapping unless you ask for --json. The agent reads what a human would read in DevTools.
  5. Background tab mode (--connect=auto) reuses an existing Chrome session in an isolated tab, so multiple agents can share one browser without colliding.
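
On the agent side, the text format is easy to consume. The sketch below pulls the @N refs out of a snapshot, assuming lines follow the shape seen in the preview output at the top of this README (`@N tag[>target] label`). That shape is inferred from the sample rather than from a format specification, and `parse_refs` and `Ref` are illustrative names.

```python
import re
from typing import Dict, NamedTuple, Optional


class Ref(NamedTuple):
    ref: str               # e.g. "@2"
    tag: str               # short element tag as printed, e.g. "a" or "b"
    target: Optional[str]  # text after ">" (e.g. a link target), if present
    label: str             # remaining text on the line


# Optional indentation, "@N", a tag, an optional ">target", then an optional label.
_REF_LINE = re.compile(r"^\s*(@\d+)\s+([^\s>]+)(?:>(\S+))?(?:\s+(.*))?$")


def parse_refs(snapshot: str) -> Dict[str, Ref]:
    """Map each @N ref found in a text snapshot to its parsed fields."""
    refs: Dict[str, Ref] = {}
    for line in snapshot.splitlines():
        match = _REF_LINE.match(line)
        if match:
            ref, tag, target, label = match.groups()
            refs[ref] = Ref(ref, tag, target, (label or "").strip())
    return refs


if __name__ == "__main__":
    sample = "[dom]\n  h1 Dashboard\n  @1 b Add user\n  table 5 rows\n  @2 a>/settings Settings\n"
    for ref in parse_refs(sample).values():
        print(ref)
```

Lines without a ref (headings, tables, status lines) are skipped, so the result contains only the elements an agent can act on.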

Architecture deep dive: docs/architecture.md. Full CLI reference: docs/cli.md. MCP server (11 tools): docs/mcp.md. Anti-bot story: docs/anti-bot.md. Fast HTTP path: docs/fast-path.md.


Comparison

| | ghostchrome | Playwright-MCP | Playwright (raw) | Puppeteer | chromedp |
|---|---|---|---|---|---|
| Target | LLM agents | LLM agents (MCP) | Devs / QA | Devs | Devs (Go) |
| Runtime | Go binary | Node.js | Node.js | Node.js | Go binary |
| Install size | ~24 MB | ~330 MB | ~330 MB | ~280 MB | ~20 MB |
| Snapshot tokens (median) | ~1,500 | ~5,500 | n/a (raw HTML) | n/a | n/a |
| Snapshot latency (warm) | ~50 ms | ~80 ms | n/a | n/a | n/a |
| Multi-browser | Chrome only | Chrome / FF / WebKit | Chrome / FF / WebKit | Chrome / FF | Chrome only |
| Refs for click/type | @1, @2 | aria-ref strings | CSS / XPath | CSS / XPath | CSS / XPath |
| Auto-wait | yes (4 conditions) | yes | yes (battle-tested) | yes | partial |
| Trace viewer | format-compatible (planned) | yes | yes | no | no |
| Stealth | built-in patches | external plugin | external plugin | external plugin | manual |

Pick ghostchrome if you're piloting a browser from an LLM and tokens / latency / footprint matter. Pick Playwright if you're writing E2E test suites or need WebKit/Firefox parity.


Using it with LLM agents

ghostchrome exposes its 11 essential browser tools as an MCP stdio server (ghostchrome mcp) and as the regular CLI (allowlist ghostchrome for shell-tool agents). One binary, two surfaces, same engine.

Claude Code (Anthropic)

claude mcp add ghostchrome -- ghostchrome mcp --stealth

That's it. Claude Code will spawn ghostchrome mcp in stdio mode on demand and route the 11 tools to the model.

Codex (OpenAI)

codex mcp add ghostchrome -- ghostchrome mcp --stealth

MCP tool surface (v1.0)

Deliberately small — 11 tools, no fat. Each one was kept because it's on the hot path of a browser-driving loop.

| Tool | Purpose |
|---|---|
| snapshot | Status + errors + network + DOM with refs (canonical first call) |
| navigate | Go to URL without snapshot |
| click | Click @ref |
| type | Type into @ref (submit:true to press Enter after) |
| select | Pick option in <select> by @ref |
| press | Send key (Enter, Tab, Escape, ArrowDown, ...) |
| wait_for | Wait for selector / text / timeout |
| eval | Run JS; escape hatch for anything else |
| screenshot | WebP/JPEG/PNG of viewport, full page, or element |
| back / forward | Browser history |

Niche workflows (cookies, storage, tabs, viewport, network sniff/replay, tracing) live in the CLI only. Reach them via eval or shell out when needed.
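
If you talk to the MCP server from your own client rather than through Claude Code or Codex, tool invocations are JSON-RPC 2.0 requests with the method `tools/call`, as defined by the MCP specification. The sketch below only builds such a message. The argument keys `ref` and `text` are assumptions made for illustration (only `submit` appears in the tool table), so check docs/mcp.md for the real schema.

```python
import json
from typing import Any, Dict


def tool_call(request_id: int, name: str, **arguments: Any) -> Dict[str, Any]:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


if __name__ == "__main__":
    # "ref" and "text" are hypothetical argument names; "submit" comes from the tool table.
    message = tool_call(1, "type", ref="@1", text="alice@example.com", submit=True)
    # The MCP stdio transport sends one JSON message per line.
    print(json.dumps(message))
```

A real client must complete the MCP `initialize` handshake before sending tool calls; an MCP client library handles that for you.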

Custom Python loop

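import json
import subprocess


def snapshot(url):
    """Run `ghostchrome preview` against a running serve session and parse its JSON output."""
    r = subprocess.run(
        ["ghostchrome", "preview", url, "--connect=auto", "--json"],
        capture_output=True, text=True, check=True,  # check=True raises on a non-zero exit
    )
    return json.loads(r.stdout)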

Aider / Cursor / any agent with shell access

Use ghostchrome as a regular shell command. Run ghostchrome serve once per session, then append --connect=auto to each call.

Recipes: docs/recipes/ — Algolia, AutoScout24, bulk scrape, registry sweep, agent JSONL mode.


Command reference

Page inspection
  preview <url>                 Page health: status, errors, network, DOM
  navigate <url>                Navigate; optionally extract
  extract  <url>                Compact accessibility tree with refs
  screenshot <url>              PNG of viewport, full page, or element
  eval "<expr>" <url>           Run JS, await async, return value
  errors <url>                  Console + Log + network 4xx/5xx
  perf <url>                    Lighthouse-lite timing summary

Interaction (refs from the last snapshot)
  click @N <url>
  type @N "text" <url>
  select @N "option" <url>
  hover @N <url>
  press <key> [--on @N] <url>

Browser & session
  serve [--port N]              Long-lived Chrome; prints ws:// URL
  back / forward
  waitfor "selector" <url>
  import-profile / export-profile

Scraping & bulk
  batch <jsonl>                 Run agent ops from a JSONL file
  fastfetch <url>               HTML-only fast path, no JS render
  collect <url>                 Observer stream (NDJSON of net+console+page events)

Agents
  agent <jsonl>                 Drive the browser from a JSONL recipe
  mcp [--stdio]                 Run as an MCP server (stdio or socket)

Full details: docs/cli.md.


Status & roadmap

Stable — preview, navigate, extract, click/type/select/hover/press, errors, screenshot, eval, serve, --connect=auto, MCP server.

Experimental — stealth patches, agent JSONL mode, AI extractors. Tracked behind flags; APIs may change.

Not in scope (yet) — Firefox/WebKit support (would arrive via a playwright-core subprocess fallback, not native), GUI test runner, visual regression diff.

Versioning follows SemVer; see .claude/rules/versioning.md.


Contributing

PRs welcome. The codebase is small and laid out in engine/ (CDP logic) and cmd/ (one Cobra command per file). Run tests with go test ./.... Bench changes should include a re-run of ./benchmark/run-bench.sh so reviewers can verify the numbers don't regress.

When the CLI surface changes, mirror it in the sibling SDK at ../ghostchrome-sdk (Node, Python, PHP) — see CLAUDE.md.


License

MIT © 2026 MakFly.
