Hybrid Browser MCP

A lightweight MCP server that exports CAMEL framework's HybridBrowserToolkit as MCP-compatible tools.

Overview

This project provides an MCP (Model Context Protocol) interface for CAMEL's HybridBrowserToolkit, exposing browser automation through a standardized protocol. It allows LLM-based applications to control web browsers, navigate pages, interact with elements, and capture screenshots.

Key features:

Full browser automation capabilities (click, type, navigate, etc.)
Screenshot capture with visual element identification
Multi-tab management
JavaScript execution in browser console
Async operation support

Installation

You can install the package directly from source:

git clone git@github.com:camel-ai/browser_agent.git
cd browser_agent
pip install -e .

Claude Desktop Configuration

To use this MCP server with Claude Desktop, add it to your configuration file.

Configuration File Location

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json
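
The three locations above can be resolved programmatically. The following is a minimal sketch, not part of this package: the helper name claude_config_path is hypothetical, and the fallback used when APPDATA is unset is an assumption.

```python
import os
import sys
from pathlib import Path

CONFIG_NAME = "claude_desktop_config.json"


def claude_config_path(platform: str = sys.platform, env=None) -> Path:
    """Return the expected Claude Desktop config path for a platform.

    Hypothetical helper; the paths mirror the list above.
    """
    env = os.environ if env is None else env
    if platform == "darwin":
        return Path.home() / "Library" / "Application Support" / "Claude" / CONFIG_NAME
    if platform.startswith("win"):
        # Assumption: if %APPDATA% is unset, use the usual roaming profile folder.
        appdata = env.get("APPDATA") or str(Path.home() / "AppData" / "Roaming")
        return Path(appdata) / "Claude" / CONFIG_NAME
    return Path.home() / ".config" / "Claude" / CONFIG_NAME


print(claude_config_path())
```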

Configuration

Add the following to your claude_desktop_config.json:

{
  "mcpServers": {
    "hybrid-browser": {
      "command": "python",
      "args": [
        "-m",
        "hybrid_browser_mcp.server"
      ]
    }
  }
}

Make sure to:

Use the correct path to your Python interpreter (you can find it with which python)
Ensure the package is installed in that Python environment
Restart Claude Desktop completely after updating the configuration
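
If the configuration file already lists other servers, the entry has to be merged in rather than pasted over them. The sketch below shows one way to do that on the parsed JSON; the function name add_hybrid_browser and the "other-server" entry are hypothetical, and sys.executable is used on the assumption that the package is installed in the interpreter running the script.

```python
import copy
import json
import sys


def add_hybrid_browser(config: dict, python_path: str = sys.executable) -> dict:
    """Return a copy of config with the hybrid-browser entry added or replaced."""
    updated = copy.deepcopy(config)
    servers = updated.setdefault("mcpServers", {})
    servers["hybrid-browser"] = {
        "command": python_path,  # full interpreter path avoids PATH lookups
        "args": ["-m", "hybrid_browser_mcp.server"],
    }
    return updated


# Hypothetical existing configuration with one unrelated server.
existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}
print(json.dumps(add_hybrid_browser(existing), indent=2))
```

Write the result back to claude_desktop_config.json only after backing up the original file.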

Verify Connection

After restarting Claude Desktop:

Click the 🔌 (plug icon) in the conversation interface
You should see "hybrid-browser" listed among available tools
The browser automation tools will be available (browser_open, browser_click, etc.)

[Screenshot: claude_desktop_config.json with the hybrid-browser MCP server configured]

[Screenshot: browser automation tools used in Claude Desktop to interact with web pages]

Browser Configuration

The browser behavior is configured through hybrid_browser_mcp/config.py. You can modify this file to customize the browser settings:

BROWSER_CONFIG = {
    "headless": False,              # False shows the browser window; True runs headless
    "stealth": True,                # Enable stealth mode
    "viewport_limit": False,        # False includes all elements in snapshots, not only the viewport
    "cache_dir": "tmp/",            # Cache directory for screenshots
    "enabled_tools": [             # List of enabled browser tools
        "browser_open", "browser_close", "browser_visit_page",
        "browser_back", "browser_forward", "browser_get_som_screenshot",
        "browser_click", "browser_type", "browser_select",
        "browser_scroll", "browser_enter", "browser_mouse_control",
        "browser_mouse_drag", "browser_press_key", "browser_switch_tab",
        # Uncomment to enable additional tools:
        # "browser_get_page_snapshot",
        # "browser_close_tab",
        # "browser_console_view",
        # "browser_console_exec",
    ],
}

Configuration Options

| Option | Description | Default | Type |
| --- | --- | --- | --- |
| `headless` | Run browser in headless mode (no window) | `False` | `bool` |
| `stealth` | Enable stealth mode to avoid detection | `False` | `bool` |
| `viewport_limit` | Only include elements in current viewport in snapshots | `False` | `bool` |
| `cache_dir` | Directory for storing cache files | `"tmp/"` | `str` |
| `enabled_tools` | List of enabled tools | `None`* | `list` or `None` |

*When enabled_tools is None, these default tools are enabled: browser_open, browser_close, browser_visit_page, browser_back, browser_forward, browser_click, browser_type, browser_switch_tab
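
The defaults and the None fallback described above can be sketched as follows. This is an illustration, not the toolkit's actual code: the names TABLE_DEFAULTS, DEFAULT_TOOLS, and effective_config are hypothetical, and merging user values over the defaults is an assumption about how overrides are applied.

```python
# Defaults copied from the Configuration Options table.
TABLE_DEFAULTS = {
    "headless": False,
    "stealth": False,
    "viewport_limit": False,
    "cache_dir": "tmp/",
    "enabled_tools": None,
}

# Tools listed in the footnote as enabled when enabled_tools is None.
DEFAULT_TOOLS = [
    "browser_open", "browser_close", "browser_visit_page",
    "browser_back", "browser_forward", "browser_click",
    "browser_type", "browser_switch_tab",
]


def effective_config(user_config: dict) -> dict:
    """Overlay user values on the defaults and expand a None tool list."""
    merged = {**TABLE_DEFAULTS, **user_config}
    if merged["enabled_tools"] is None:
        merged["enabled_tools"] = list(DEFAULT_TOOLS)
    return merged


print(effective_config({"headless": True}))
```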

Example Configurations

1. Headless mode for automation:

USER_BROWSER_CONFIG = {
    "headless": True,
}

2. Stealth mode with visible browser:

USER_BROWSER_CONFIG = {
    "headless": False,
    "stealth": True,
}

3. Limited tools for safety:

USER_BROWSER_CONFIG = {
    "enabled_tools": [
        "browser_open",
        "browser_visit_page",
        "browser_get_page_snapshot",
        "browser_close",
    ],
}

4. Enable all available tools:

USER_BROWSER_CONFIG = {
    "enabled_tools": [
        "browser_open", "browser_close", "browser_visit_page",
        "browser_back", "browser_forward", "browser_get_page_snapshot",
        "browser_get_som_screenshot", "browser_click", "browser_type",
        "browser_select", "browser_scroll", "browser_enter",
        "browser_switch_tab", "browser_close_tab", "browser_get_tab_info",
        "browser_mouse_control", "browser_mouse_drag", "browser_press_key",
        "browser_wait_user", "browser_console_view", "browser_console_exec",
    ],
}
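
A typo in enabled_tools is easy to miss. The sketch below checks a configuration against the tool names listed in example 4; the helper unknown_tools is hypothetical and is not provided by this package.

```python
# Tool names taken from the "Enable all available tools" example.
KNOWN_TOOLS = {
    "browser_open", "browser_close", "browser_visit_page",
    "browser_back", "browser_forward", "browser_get_page_snapshot",
    "browser_get_som_screenshot", "browser_click", "browser_type",
    "browser_select", "browser_scroll", "browser_enter",
    "browser_switch_tab", "browser_close_tab", "browser_get_tab_info",
    "browser_mouse_control", "browser_mouse_drag", "browser_press_key",
    "browser_wait_user", "browser_console_view", "browser_console_exec",
}


def unknown_tools(config: dict) -> list:
    """Return enabled_tools entries that are not in KNOWN_TOOLS, in order."""
    tools = config.get("enabled_tools") or []
    return [name for name in tools if name not in KNOWN_TOOLS]


candidate = {"enabled_tools": ["browser_open", "browser_clik", "browser_close"]}
print(unknown_tools(candidate))
```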

Available Tools

The server exposes the following browser control tools:

Core Navigation

browser_open(): Opens a new browser session
browser_close(): Closes the browser session
browser_visit_page(url): Navigates to a specific URL
browser_back(): Goes back in browser history
browser_forward(): Goes forward in browser history

Page Interaction

browser_click(ref): Clicks on an element by its reference ID
browser_type(ref, text, inputs): Types text into input fields
browser_select(ref, value): Selects an option in a dropdown
browser_scroll(direction, amount): Scrolls the page
browser_enter(): Presses the Enter key
browser_press_key(keys): Presses specific keyboard keys

Page Analysis

browser_get_page_snapshot(): Gets a textual snapshot of interactive elements
browser_get_som_screenshot(read_image, instruction): Captures a screenshot with element annotations
list_browser_functions(): Lists all available browser functions

Tab Management

browser_switch_tab(tab_id): Switches to a different browser tab
browser_close_tab(tab_id): Closes a specific tab
browser_get_tab_info(): Gets information about all open tabs

Advanced Features

browser_console_view(): Views console logs
browser_console_exec(code): Executes JavaScript in the browser console
browser_mouse_control(control, x, y): Controls mouse actions at coordinates
browser_mouse_drag(from_ref, to_ref): Drags elements
browser_wait_user(timeout_sec): Waits for user input

Example Usage

# Open browser and navigate
await browser_open()
await browser_visit_page("https://www.google.com")

# Get page snapshot to see available elements
snapshot = await browser_get_page_snapshot()
print(snapshot)

# Interact with elements
await browser_type(ref="search-input", text="CAMEL AI framework")
await browser_enter()

# Take a screenshot
await browser_get_som_screenshot()

# Close browser
await browser_close()
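
If a step in the flow above raises, the browser is never closed. A minimal sketch of the same flow with try/finally follows. It assumes the browser_* coroutines are reachable as attributes of a tools object; RecordingTools is a hypothetical stand-in that records calls instead of driving a browser, so the snippet runs without the server.

```python
import asyncio


async def run_search(tools, query: str) -> None:
    """Run the search flow and always close the browser afterwards."""
    await tools.browser_open()
    try:
        await tools.browser_visit_page("https://www.google.com")
        await tools.browser_type(ref="search-input", text=query)
        await tools.browser_enter()
    finally:
        # Runs whether the steps above succeed or raise.
        await tools.browser_close()


class RecordingTools:
    """Hypothetical stand-in: records tool names, optionally failing on one."""

    def __init__(self, fail_on=None):
        self.calls = []
        self.fail_on = fail_on

    def __getattr__(self, name):
        async def call(*args, **kwargs):
            self.calls.append(name)
            if name == self.fail_on:
                raise RuntimeError(f"{name} failed")
        return call


demo = RecordingTools()
asyncio.run(run_search(demo, "CAMEL AI framework"))
print(demo.calls)
```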

Architecture

The server works by:

Wrapping CAMEL's HybridBrowserToolkit with async support
Exposing toolkit methods as MCP-compatible tools
Managing a singleton browser instance per session
Handling WebSocket communication for real-time browser control
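
The singleton-per-session idea can be illustrated with a lazily created instance guarded by an asyncio lock. This is a sketch of the general pattern, not the server's actual implementation; ToolkitHolder and FakeToolkit are hypothetical names, and FakeToolkit stands in for the real toolkit.

```python
import asyncio


class FakeToolkit:
    """Hypothetical stand-in for the real toolkit; counts constructions."""

    created = 0

    def __init__(self):
        FakeToolkit.created += 1


class ToolkitHolder:
    """Creates the toolkit on first use and reuses it afterwards."""

    def __init__(self, factory):
        self._factory = factory
        self._instance = None
        self._lock = asyncio.Lock()

    async def get(self):
        # The lock prevents two concurrent callers from both constructing.
        async with self._lock:
            if self._instance is None:
                self._instance = self._factory()
            return self._instance

    async def reset(self):
        # Drop the instance so the next get() builds a fresh one.
        async with self._lock:
            self._instance = None


async def demo_singleton() -> bool:
    holder = ToolkitHolder(FakeToolkit)
    first, second = await asyncio.gather(holder.get(), holder.get())
    return first is second


print(asyncio.run(demo_singleton()))
```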

Development

To set up a development environment:

pip install -e ".[dev]"

Run tests:

pytest

Troubleshooting

Server Not Appearing in Claude Desktop

Check if the package is installed correctly:

# Should output the path to the executable
which hybrid-browser-mcp

Test the server manually:

hybrid-browser-mcp
# Should start without errors
# Press Ctrl+C to stop

Check Claude Desktop logs for errors:

# macOS
tail -f ~/Library/Logs/Claude/mcp*.log

# Windows (PowerShell)
Get-Content "$env:APPDATA\Claude\logs\mcp*.log" -Tail 20 -Wait

Verify the configuration file:

# macOS
cat ~/Library/Application\ Support/Claude/claude_desktop_config.json

# Windows (Command Prompt)
type %APPDATA%\Claude\claude_desktop_config.json
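
Reading the file by eye can miss syntax problems. The sketch below checks the file contents for valid JSON and for the hybrid-browser entry; check_config_text is a hypothetical helper, and the checks cover only the fields shown in this README.

```python
import json


def check_config_text(text: str) -> list:
    """Return a list of problems found in the config text (empty if none)."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON (line {exc.lineno}): {exc.msg}"]
    if not isinstance(config, dict):
        return ["top-level value must be a JSON object"]
    servers = config.get("mcpServers")
    if not isinstance(servers, dict) or "hybrid-browser" not in servers:
        return ['no "hybrid-browser" entry under "mcpServers"']
    entry = servers["hybrid-browser"]
    if not isinstance(entry, dict):
        return ['"hybrid-browser" must be a JSON object']
    problems = []
    if not isinstance(entry.get("command"), str):
        problems.append('"command" must be a string')
    if not isinstance(entry.get("args", []), list):
        problems.append('"args" must be a list')
    return problems


sample = '{"mcpServers": {"hybrid-browser": {"command": "python", "args": ["-m", "hybrid_browser_mcp.server"]}}}'
print(check_config_text(sample))
```

Pass the contents of claude_desktop_config.json as text; a comment or trailing comma in the file is reported as invalid JSON.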

Common Issues

Issue: "Command not found" error

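Solution: Use the full path to your Python interpreter in your configuration. Replace /usr/bin/python3 below with your own path. JSON does not allow comments, so do not add any to the file:

{
  "mcpServers": {
    "hybrid-browser": {
      "command": "/usr/bin/python3",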
      "args": ["-m", "hybrid_browser_mcp.server"]
    }
  }
}

Issue: Browser doesn't open or shows errors

Solution: The HybridBrowserToolkit uses a TypeScript-based browser controller that runs on Node.js. It will automatically download and manage browser binaries. If you encounter issues:

Ensure Node.js is installed on your system
The TypeScript server starts automatically when needed, so you do not need to launch it yourself
Browser binaries are downloaded on first use, so the first launch needs network access
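
To confirm that Node.js is reachable from the Python environment, a quick check like the following can help. It is a sketch: node_version is a hypothetical helper, and it only tests that a node executable is on PATH, not that its version is one the toolkit supports.

```python
import shutil
import subprocess


def node_version():
    """Return the output of `node --version`, or None if node is not on PATH."""
    node = shutil.which("node")
    if node is None:
        return None
    result = subprocess.run(
        [node, "--version"], capture_output=True, text=True, timeout=10
    )
    return result.stdout.strip() or None


version = node_version()
print(version if version else "Node.js not found on PATH")
```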

Debug Mode

To capture the server's log output, run it manually and redirect stderr to a file:

python -m hybrid_browser_mcp.server 2> debug.log

Then check debug.log for any error messages.

License

This project is licensed under the MIT License - see the LICENSE file for details.
