diff --git a/README.md b/README.md index 8ede376c9..7360c15f8 100644 --- a/README.md +++ b/README.md @@ -1,1322 +1,92 @@ -# agent-browser - -Browser automation CLI for AI agents. Fast native Rust CLI. - -## Installation - -### Global Installation (recommended) - -Installs the native Rust binary: - -```bash -npm install -g agent-browser -agent-browser install # Download Chrome from Chrome for Testing (first time only) -``` - -### Project Installation (local dependency) - -For projects that want to pin the version in `package.json`: - -```bash -npm install agent-browser -agent-browser install -``` - -Then use via `package.json` scripts or by invoking `agent-browser` directly. - -### Homebrew (macOS) - -```bash -brew install agent-browser -agent-browser install # Download Chrome from Chrome for Testing (first time only) -``` - -### Cargo (Rust) - -```bash -cargo install agent-browser -agent-browser install # Download Chrome from Chrome for Testing (first time only) -``` - -### From Source - -```bash -git clone https://github.com/vercel-labs/agent-browser -cd agent-browser -pnpm install -pnpm build -pnpm build:native # Requires Rust (https://rustup.rs) -pnpm link --global # Makes agent-browser available globally -agent-browser install -``` - -### Linux Dependencies - -On Linux, install system dependencies: - -```bash -agent-browser install --with-deps -``` - -### Updating - -Upgrade to the latest version: - -```bash -agent-browser upgrade -``` - -Detects your installation method (npm, Homebrew, or Cargo) and runs the appropriate update command automatically. - -### Requirements - -- **Chrome** - Run `agent-browser install` to download Chrome from [Chrome for Testing](https://developer.chrome.com/blog/chrome-for-testing/) (Google's official automation channel). Existing Chrome, Brave, Playwright, and Puppeteer installations are detected automatically. No Playwright or Node.js required for the daemon. -- **Rust** - Only needed when building from source (see From Source above). - -## Quick Start - -```bash -agent-browser open example.com -agent-browser snapshot # Get accessibility tree with refs -agent-browser click @e2 # Click by ref from snapshot -agent-browser fill @e3 "test@example.com" # Fill by ref -agent-browser get text @e1 # Get text by ref -agent-browser screenshot page.png -agent-browser close -``` - -### Traditional Selectors (also supported) - -```bash -agent-browser click "#submit" -agent-browser fill "#email" "test@example.com" -agent-browser find role button click --name "Submit" -``` - -## Commands - -### Core Commands - -```bash -agent-browser open # Navigate to URL (aliases: goto, navigate) -agent-browser click # Click element (--new-tab to open in new tab) -agent-browser dblclick # Double-click element -agent-browser focus # Focus element -agent-browser type # Type into element -agent-browser fill # Clear and fill -agent-browser press # Press key (Enter, Tab, Control+a) (alias: key) -agent-browser keyboard type # Type with real keystrokes (no selector, current focus) -agent-browser keyboard inserttext # Insert text without key events (no selector) -agent-browser keydown # Hold key down -agent-browser keyup # Release key -agent-browser hover # Hover element -agent-browser select # Select dropdown option -agent-browser check # Check checkbox -agent-browser uncheck # Uncheck checkbox -agent-browser scroll [px] # Scroll (up/down/left/right, --selector ) -agent-browser scrollintoview # Scroll element into view (alias: scrollinto) -agent-browser drag # Drag and drop -agent-browser upload # Upload files -agent-browser screenshot [path] # Take screenshot (--full for full page, saves to a temporary directory if no path) -agent-browser screenshot --annotate # Annotated screenshot with numbered element labels -agent-browser screenshot --screenshot-dir ./shots # Save to custom directory -agent-browser screenshot --screenshot-format jpeg --screenshot-quality 80 -agent-browser pdf # Save as PDF -agent-browser snapshot # Accessibility tree with refs (best for AI) -agent-browser eval # Run JavaScript (-b for base64, --stdin for piped input) -agent-browser connect # Connect to browser via CDP -agent-browser stream enable [--port ] # Start runtime WebSocket streaming -agent-browser stream status # Show runtime streaming state and bound port -agent-browser stream disable # Stop runtime WebSocket streaming -agent-browser close # Close browser (aliases: quit, exit) -agent-browser close --all # Close all active sessions -``` - -### Get Info - -```bash -agent-browser get text # Get text content -agent-browser get html # Get innerHTML -agent-browser get value # Get input value -agent-browser get attr # Get attribute -agent-browser get title # Get page title -agent-browser get url # Get current URL -agent-browser get cdp-url # Get CDP WebSocket URL (for DevTools, debugging) -agent-browser get count # Count matching elements -agent-browser get box # Get bounding box -agent-browser get styles # Get computed styles -``` - -### Check State - -```bash -agent-browser is visible # Check if visible -agent-browser is enabled # Check if enabled -agent-browser is checked # Check if checked -``` - -### Find Elements (Semantic Locators) - -```bash -agent-browser find role [value] # By ARIA role -agent-browser find text # By text content -agent-browser find label