A lightweight MCP server that exposes the CAMEL framework's HybridBrowserToolkit as MCP-compatible tools.
This project provides an MCP (Model Context Protocol) interface for CAMEL's HybridBrowserToolkit, enabling browser automation through a standardized protocol. It allows LLM-based applications to control web browsers, navigate pages, interact with elements, and capture screenshots.
Key features:
- Full browser automation capabilities (click, type, navigate, etc.)
- Screenshot capture with visual element identification
- Multi-tab management
- JavaScript execution in browser console
- Async operation support
You can install the package directly from source:
git clone git@github.com:camel-ai/browser_agent.git
cd browser_agent
pip install -e .
To use this MCP server with Claude Desktop, add it to your configuration file.
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
- Linux: ~/.config/Claude/claude_desktop_config.json
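The locations above can also be resolved programmatically. The following is a minimal sketch, not part of this package: `claude_config_path` is a hypothetical helper name, and the Windows fallback to `AppData/Roaming` is an assumption for the case where `APPDATA` is unset.

```python
import os
import sys
from pathlib import Path

CONFIG_NAME = "claude_desktop_config.json"


def claude_config_path(platform: str = sys.platform, env=os.environ) -> Path:
    """Return the expected location of the Claude Desktop config file.

    Hypothetical helper: it mirrors the three per-OS paths listed above.
    """
    if platform == "darwin":
        return Path.home() / "Library" / "Application Support" / "Claude" / CONFIG_NAME
    if platform.startswith("win"):
        # %APPDATA% is normally set on Windows; the fallback is an assumption.
        default_appdata = Path.home() / "AppData" / "Roaming"
        return Path(env.get("APPDATA", str(default_appdata))) / "Claude" / CONFIG_NAME
    # Everything else is treated as Linux.
    return Path.home() / ".config" / "Claude" / CONFIG_NAME


if __name__ == "__main__":
    print(claude_config_path())
```

Passing `platform` and `env` explicitly makes it easy to check the path for an operating system other than the one you are running on.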
Add the following to your claude_desktop_config.json:
{
  "mcpServers": {
    "hybrid-browser": {
      "command": "python",
      "args": [
        "-m",
        "hybrid_browser_mcp.server"
      ]
    }
  }
}
Make sure to:
- Use the correct path to your Python interpreter (you can find it with which python)
- Ensure the package is installed in that Python environment
- Restart Claude Desktop completely after updating the configuration
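If you already have other MCP servers configured, the entry has to be merged into the existing file rather than replacing it. The sketch below shows one way to build the merged structure in Python, using `sys.executable` so that the interpreter path matches the environment where the package is installed. The function name `add_hybrid_browser` and the `other-server` entry are illustrative assumptions.

```python
import json
import sys


def add_hybrid_browser(config: dict, python_path: str = sys.executable) -> dict:
    """Return a copy of config with the hybrid-browser entry added.

    Existing entries under "mcpServers" are preserved.
    """
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))
    servers["hybrid-browser"] = {
        "command": python_path,
        "args": ["-m", "hybrid_browser_mcp.server"],
    }
    updated["mcpServers"] = servers
    return updated


# Hypothetical existing configuration with one unrelated server.
existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}
print(json.dumps(add_hybrid_browser(existing), indent=2))
```

The printed JSON can be pasted into claude_desktop_config.json; the original dictionary is left unmodified.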
After restarting Claude Desktop:
- Click the 🔌 (plug icon) in the conversation interface
- You should see "hybrid-browser" listed among available tools
- The browser automation tools will be available (browser_open, browser_click, etc.)
Configuration success example (screenshot): claude_desktop_config.json with the hybrid-browser MCP server configured.
Browser tools in action (screenshot): using browser automation tools in Claude Desktop to interact with web pages.
The browser behavior is configured through hybrid_browser_mcp/config.py. You can modify this file to customize the browser settings:
BROWSER_CONFIG = {
    "headless": False,              # Run browser in headless mode
    "stealth": True,                # Enable stealth mode
    "viewport_limit": False,        # Include all elements in snapshots
    "cache_dir": "tmp/",           # Cache directory for screenshots
    "enabled_tools": [             # List of enabled browser tools
        "browser_open", "browser_close", "browser_visit_page",
        "browser_back", "browser_forward", "browser_get_som_screenshot",
        "browser_click", "browser_type", "browser_select",
        "browser_scroll", "browser_enter", "browser_mouse_control",
        "browser_mouse_drag", "browser_press_key", "browser_switch_tab",
        # Uncomment to enable additional tools:
        # "browser_get_page_snapshot",
        # "browser_close_tab",
        # "browser_console_view",
        # "browser_console_exec",
    ],
}
| Option | Description | Default | Type |
|---|---|---|---|
| headless | Run browser in headless mode (no window) | False | bool |
| stealth | Enable stealth mode to avoid detection | False | bool |
| viewport_limit | Only include elements in current viewport in snapshots | False | bool |
| cache_dir | Directory for storing cache files | "tmp/" | str |
| enabled_tools | List of enabled tools | None* | list or None |
*When enabled_tools is None, these default tools are enabled: browser_open, browser_close, browser_visit_page, browser_back, browser_forward, browser_click, browser_type, browser_switch_tab
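The fallback described in the footnote can be pictured as follows. This is only an illustrative sketch: `resolve_enabled_tools` and `DEFAULT_TOOLS` are hypothetical names, and the toolkit's actual implementation may differ.

```python
# Default tool names, copied from the footnote above.
DEFAULT_TOOLS = [
    "browser_open", "browser_close", "browser_visit_page",
    "browser_back", "browser_forward", "browser_click",
    "browser_type", "browser_switch_tab",
]


def resolve_enabled_tools(enabled_tools=None):
    """Return the tools to enable, falling back to the defaults for None."""
    if enabled_tools is None:
        return list(DEFAULT_TOOLS)
    # An explicit list (even an empty one) is used as given.
    return list(enabled_tools)
```

Note that in this sketch only `None` triggers the defaults; an empty list is treated as "enable nothing".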
1. Headless mode for automation:
USER_BROWSER_CONFIG = {
    "headless": True,
}
2. Stealth mode with visible browser:
USER_BROWSER_CONFIG = {
    "headless": False,
    "stealth": True,
}
3. Limited tools for safety:
USER_BROWSER_CONFIG = {
    "enabled_tools": [
        "browser_open",
        "browser_visit_page",
        "browser_get_page_snapshot",
        "browser_close",
    ],
}
4. Enable all available tools:
USER_BROWSER_CONFIG = {
    "enabled_tools": [
        "browser_open", "browser_close", "browser_visit_page",
        "browser_back", "browser_forward", "browser_get_page_snapshot",
        "browser_get_som_screenshot", "browser_click", "browser_type",
        "browser_select", "browser_scroll", "browser_enter",
        "browser_switch_tab", "browser_close_tab", "browser_get_tab_info",
        "browser_mouse_control", "browser_mouse_drag", "browser_press_key",
        "browser_wait_user", "browser_console_view", "browser_console_exec",
    ],
}
The server exposes the following browser control tools:
- browser_open(): Opens a new browser session
- browser_close(): Closes the browser session
- browser_visit_page(url): Navigates to a specific URL
- browser_back(): Goes back in browser history
- browser_forward(): Goes forward in browser history
- browser_click(ref): Clicks on an element by its reference ID
- browser_type(ref, text, inputs): Types text into input fields
- browser_select(ref, value): Selects an option in a dropdown
- browser_scroll(direction, amount): Scrolls the page
- browser_enter(): Presses the Enter key
- browser_press_key(keys): Presses specific keyboard keys
- browser_get_page_snapshot(): Gets a textual snapshot of interactive elements
- browser_get_som_screenshot(read_image, instruction): Captures a screenshot with element annotations
- list_browser_functions(): Lists all available browser functions
- browser_switch_tab(tab_id): Switches to a different browser tab
- browser_close_tab(tab_id): Closes a specific tab
- browser_get_tab_info(): Gets information about all open tabs
- browser_console_view(): Views console logs
- browser_console_exec(code): Executes JavaScript in the browser console
- browser_mouse_control(control, x, y): Controls mouse actions at coordinates
- browser_mouse_drag(from_ref, to_ref): Drags elements
- browser_wait_user(timeout_sec): Waits for user input
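Because enabled_tools is a list of plain strings, a misspelled name is easy to miss. The sketch below checks a list against the browser_* tool names documented above; `KNOWN_TOOLS` and `unknown_tools` are hypothetical names for illustration, not part of this package.

```python
# The browser_* tool names documented in the list above.
KNOWN_TOOLS = {
    "browser_open", "browser_close", "browser_visit_page",
    "browser_back", "browser_forward", "browser_click",
    "browser_type", "browser_select", "browser_scroll",
    "browser_enter", "browser_press_key", "browser_get_page_snapshot",
    "browser_get_som_screenshot", "browser_switch_tab", "browser_close_tab",
    "browser_get_tab_info", "browser_console_view", "browser_console_exec",
    "browser_mouse_control", "browser_mouse_drag", "browser_wait_user",
}


def unknown_tools(enabled_tools):
    """Return the sorted names in enabled_tools that are not documented tools."""
    return sorted(set(enabled_tools) - KNOWN_TOOLS)


# The second name is deliberately misspelled.
print(unknown_tools(["browser_open", "browser_clik"]))
```

An empty result means every name in the list matches a documented tool.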
# Open browser and navigate
await browser_open()
await browser_visit_page("https://www.google.com")
# Get page snapshot to see available elements
snapshot = await browser_get_page_snapshot()
print(snapshot)
# Interact with elements
await browser_type(ref="search-input", text="CAMEL AI framework")
await browser_enter()
# Take a screenshot
await browser_get_som_screenshot()
# Close browser
await browser_close()
The server works by:
- Wrapping CAMEL's HybridBrowserToolkit with async support
- Exposing toolkit methods as MCP-compatible tools
- Managing a singleton browser instance per session
- Handling WebSocket communication for real-time browser control
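The "singleton browser instance per session" idea can be sketched with a lazily created instance guarded by an asyncio lock, so that concurrent tool calls share one browser. This is an assumption about the general pattern, not the server's actual code: `SingletonHolder` and `make_fake_toolkit` are hypothetical, and a plain object stands in for the real toolkit.

```python
import asyncio


class SingletonHolder:
    """Creates one shared instance lazily, even with concurrent callers."""

    def __init__(self, factory):
        self._factory = factory   # async callable that builds the instance
        self._instance = None
        self._lock = None         # created lazily inside the running event loop

    async def get(self):
        if self._lock is None:
            self._lock = asyncio.Lock()
        async with self._lock:
            # Only the first caller to hold the lock builds the instance.
            if self._instance is None:
                self._instance = await self._factory()
        return self._instance


created = []


async def make_fake_toolkit():
    # Hypothetical stand-in for starting the real browser toolkit.
    await asyncio.sleep(0)
    toolkit = object()
    created.append(toolkit)
    return toolkit


async def demo():
    holder = SingletonHolder(make_fake_toolkit)
    # Five concurrent requests should all receive the same instance.
    return await asyncio.gather(*(holder.get() for _ in range(5)))
```

Running `asyncio.run(demo())` returns five references to the same object, and the factory is invoked only once.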
To set up a development environment:
pip install -e ".[dev]"
Run tests:
pytest
- Check if the package is installed correctly:
  which hybrid-browser-mcp    # Should output the path to the executable
- Test the server manually:
  hybrid-browser-mcp    # Should start without errors; press Ctrl+C to stop
- Check Claude Desktop logs for errors:
  # macOS
  tail -f ~/Library/Logs/Claude/mcp*.log
  # Windows
  Get-Content "$env:APPDATA\Claude\logs\mcp*.log" -Tail 20 -Wait
- Verify the configuration file:
  # macOS
  cat ~/Library/Application\ Support/Claude/claude_desktop_config.json
  # Windows
  type %APPDATA%\Claude\claude_desktop_config.json
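The first of these checks can also be run from Python, which helps confirm that you are testing the same interpreter Claude Desktop is configured to use. The sketch below relies only on the standard library; `check_install` is a hypothetical helper name, and the defaults are the module and executable names used in this README.

```python
import importlib.util
import shutil


def check_install(module="hybrid_browser_mcp", executable="hybrid-browser-mcp"):
    """Report whether the module is importable and the executable is on PATH."""
    return {
        "module_importable": importlib.util.find_spec(module) is not None,
        "executable_path": shutil.which(executable),
    }


if __name__ == "__main__":
    print(check_install())
```

If module_importable is False, the package is probably installed in a different Python environment from the one running the script.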
Solution: Use the full Python path in your configuration, replacing /usr/bin/python3 with the path to your own interpreter:
{
  "mcpServers": {
    "hybrid-browser": {
      "command": "/usr/bin/python3",
      "args": ["-m", "hybrid_browser_mcp.server"]
    }
  }
}
Solution: The HybridBrowserToolkit uses a TypeScript-based browser controller that runs on Node.js. It will automatically download and manage browser binaries. If you encounter issues:
- Ensure Node.js is installed on your system
- The TypeScript server will start automatically when needed
- Browser binaries will be downloaded on first use
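To confirm that Node.js is visible to the Python environment running the server, a quick check along these lines may help. It is a minimal sketch using only the standard library; `node_version` is a hypothetical helper and not part of this package.

```python
import shutil
import subprocess


def node_version(executable="node"):
    """Return the output of `node --version`, or None if Node.js is not found."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip() or None


if __name__ == "__main__":
    print(node_version() or "Node.js not found on PATH")
```

A result of None suggests Node.js is missing from PATH for this environment, even if it is available in your interactive shell.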
To see detailed logs, you can run the server with debug output:
python -m hybrid_browser_mcp.server 2> debug.log
Then check debug.log for any error messages.
This project is licensed under the MIT License - see the LICENSE file for details.