A lightweight MCP server that exports CAMEL framework's HybridBrowserToolkit as MCP-compatible tools.
This project provides an MCP (Model Context Protocol) interface for CAMEL's HybridBrowserToolkit, enabling browser automation through a standardized protocol. It allows LLM-based applications to control web browsers, navigate pages, interact with elements, and capture screenshots.
Key features:
- Full browser automation capabilities (click, type, navigate, etc.)
- Screenshot capture with visual element identification
- Multi-tab management
- JavaScript execution in browser console
- Async operation support
You can install the package directly from source:
git clone git@github.com:camel-ai/browser_agent.git
cd browser_agent
pip install -e .
To use this MCP server with Claude Desktop, add it to your configuration file.
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
- Linux: ~/.config/Claude/claude_desktop_config.json
Add the following to your claude_desktop_config.json:
{
"mcpServers": {
"hybrid-browser": {
"command": "python",
"args": [
"-m",
"hybrid_browser_mcp.server"
]
}
}
}
Make sure to:
- Use the correct path to your Python interpreter (you can find it with which python)
- Ensure the package is installed in that Python environment
- Restart Claude Desktop completely after updating the configuration
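The manual edit described above can also be scripted. The following is a minimal sketch and not part of this project: the helper name `add_server_entry` is hypothetical, and it assumes the config file is plain JSON with an optional top-level `mcpServers` object. By default it uses `sys.executable`, so the entry points at the interpreter running the script, which should be the environment where the package is installed.

```python
import json
import sys
from pathlib import Path


def add_server_entry(config_path, python_path=sys.executable):
    """Add or overwrite the "hybrid-browser" entry in a Claude Desktop config.

    Hypothetical helper for illustration; other entries are left untouched.
    """
    path = Path(config_path)
    # Start from the existing config if there is one, else from an empty object.
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["hybrid-browser"] = {
        "command": str(python_path),
        "args": ["-m", "hybrid_browser_mcp.server"],
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Call it with the config path for your platform, for example `add_server_entry("/path/to/claude_desktop_config.json")`, then restart Claude Desktop as described above.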
After restarting Claude Desktop:
- Click the 🔌 (plug icon) in the conversation interface
- You should see "hybrid-browser" listed among available tools
- The browser automation tools will be available (browser_open, browser_click, etc.)
[Figure: Configuration success example. claude_desktop_config.json with the hybrid-browser MCP server configured.]
[Figure: Browser tools in action. Using browser automation tools in Claude Desktop to interact with web pages.]
The browser behavior is configured through hybrid_browser_mcp/config.py. You can modify this file to customize the browser settings:
BROWSER_CONFIG = {
"headless": False, # Run browser in headless mode
"stealth": True, # Enable stealth mode
"viewport_limit": False, # Include all elements in snapshots
"cache_dir": "tmp/", # Cache directory for screenshots
"enabled_tools": [ # List of enabled browser tools
"browser_open", "browser_close", "browser_visit_page",
"browser_back", "browser_forward", "browser_get_som_screenshot",
"browser_click", "browser_type", "browser_select",
"browser_scroll", "browser_enter", "browser_mouse_control",
"browser_mouse_drag", "browser_press_key", "browser_switch_tab",
# Uncomment to enable additional tools:
# "browser_get_page_snapshot",
# "browser_close_tab",
# "browser_console_view",
# "browser_console_exec",
],
}
| Option | Description | Default | Type |
|---|---|---|---|
| headless | Run browser in headless mode (no window) | False | bool |
| stealth | Enable stealth mode to avoid detection | False | bool |
| viewport_limit | Only include elements in current viewport in snapshots | False | bool |
| cache_dir | Directory for storing cache files | "tmp/" | str |
| enabled_tools | List of enabled tools | None* | list or None |
*When enabled_tools is None, these default tools are enabled: browser_open, browser_close, browser_visit_page, browser_back, browser_forward, browser_click, browser_type, browser_switch_tab
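To make the fallback in the footnote concrete, here is a minimal sketch of how user overrides could be merged with the documented defaults. The names `DEFAULTS`, `DEFAULT_TOOLS`, and `resolve_config` are hypothetical and are not taken from hybrid_browser_mcp/config.py; the values are copied from the options table and the footnote.

```python
# Defaults as documented in the options table (illustrative copy).
DEFAULTS = {
    "headless": False,
    "stealth": False,
    "viewport_limit": False,
    "cache_dir": "tmp/",
    "enabled_tools": None,
}

# Tools enabled when "enabled_tools" is None, per the footnote.
DEFAULT_TOOLS = [
    "browser_open", "browser_close", "browser_visit_page",
    "browser_back", "browser_forward", "browser_click",
    "browser_type", "browser_switch_tab",
]


def resolve_config(overrides=None):
    """Merge user overrides onto the defaults and expand a None tool list."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown option(s): {sorted(unknown)}")
    config = {**DEFAULTS, **overrides}
    if config["enabled_tools"] is None:
        # Copy the list so callers cannot mutate the shared default.
        config["enabled_tools"] = list(DEFAULT_TOOLS)
    return config
```

With this sketch, `resolve_config({"headless": True})` keeps the other defaults and yields the eight default tools, while a misspelled option name raises a `KeyError` instead of being silently ignored.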
1. Headless mode for automation:
USER_BROWSER_CONFIG = {
"headless": True,
}
2. Stealth mode with visible browser:
USER_BROWSER_CONFIG = {
"headless": False,
"stealth": True,
}
3. Limited tools for safety:
USER_BROWSER_CONFIG = {
"enabled_tools": [
"browser_open",
"browser_visit_page",
"browser_get_page_snapshot",
"browser_close",
],
}
4. Enable all available tools:
USER_BROWSER_CONFIG = {
"enabled_tools": [
"browser_open", "browser_close", "browser_visit_page",
"browser_back", "browser_forward", "browser_get_page_snapshot",
"browser_get_som_screenshot", "browser_click", "browser_type",
"browser_select", "browser_scroll", "browser_enter",
"browser_switch_tab", "browser_close_tab", "browser_get_tab_info",
"browser_mouse_control", "browser_mouse_drag", "browser_press_key",
"browser_wait_user", "browser_console_view", "browser_console_exec",
],
}
The server exposes the following browser control tools:
- browser_open(): Opens a new browser session
- browser_close(): Closes the browser session
- browser_visit_page(url): Navigates to a specific URL
- browser_back(): Goes back in browser history
- browser_forward(): Goes forward in browser history
- browser_click(ref): Clicks on an element by its reference ID
- browser_type(ref, text, inputs): Types text into input fields
- browser_select(ref, value): Selects an option in a dropdown
- browser_scroll(direction, amount): Scrolls the page
- browser_enter(): Presses the Enter key
- browser_press_key(keys): Presses specific keyboard keys
- browser_get_page_snapshot(): Gets a textual snapshot of interactive elements
- browser_get_som_screenshot(read_image, instruction): Captures a screenshot with element annotations
- list_browser_functions(): Lists all available browser functions
- browser_switch_tab(tab_id): Switches to a different browser tab
- browser_close_tab(tab_id): Closes a specific tab
- browser_get_tab_info(): Gets information about all open tabs
- browser_console_view(): Views console logs
- browser_console_exec(code): Executes JavaScript in the browser console
- browser_mouse_control(control, x, y): Controls mouse actions at coordinates
- browser_mouse_drag(from_ref, to_ref): Drags elements
- browser_wait_user(timeout_sec): Waits for user input
# Open browser and navigate
await browser_open()
await browser_visit_page("https://www.google.com")
# Get page snapshot to see available elements
snapshot = await browser_get_page_snapshot()
print(snapshot)
# Interact with elements
await browser_type(ref="search-input", text="CAMEL AI framework")
await browser_enter()
# Take a screenshot
await browser_get_som_screenshot()
# Close browser
await browser_close()
The server works by:
- Wrapping CAMEL's HybridBrowserToolkit with async support
- Exposing toolkit methods as MCP-compatible tools
- Managing a singleton browser instance per session
- Handling WebSocket communication for real-time browser control
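The singleton-per-session point can be illustrated with a short sketch. This is not the project's code: `FakeToolkit` is a stand-in for HybridBrowserToolkit, and `get_toolkit` is a hypothetical accessor. It shows one common way to create a shared instance lazily under an `asyncio.Lock`, so that concurrent tool calls reuse the same browser instead of each starting their own.

```python
import asyncio


class FakeToolkit:
    """Stand-in for the real toolkit; counts how often it is constructed."""

    instances = 0

    def __init__(self):
        FakeToolkit.instances += 1

    async def browser_open(self):
        return "opened"


_toolkit = None
_lock = None


async def get_toolkit():
    """Return the shared toolkit, creating it on first use."""
    global _toolkit, _lock
    if _lock is None:
        # Create the lock lazily so it belongs to the running event loop.
        _lock = asyncio.Lock()
    async with _lock:
        if _toolkit is None:
            await asyncio.sleep(0)  # simulate asynchronous browser startup
            _toolkit = FakeToolkit()
    return _toolkit


async def main():
    # Ten concurrent callers should all receive the same object.
    toolkits = await asyncio.gather(*(get_toolkit() for _ in range(10)))
    return all(t is toolkits[0] for t in toolkits)
```

Running `asyncio.run(main())` constructs the stand-in toolkit once even though ten callers request it at the same time. Without the lock, callers that arrive while startup is still awaiting would each construct their own instance.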
To set up a development environment:
pip install -e ".[dev]"
Run tests:
pytest
1. Check if the package is installed correctly:
# Should output the path to the executable
which hybrid-browser-mcp
2. Test the server manually:
hybrid-browser-mcp
# Should start without errors
# Press Ctrl+C to stop
3. Check Claude Desktop logs for errors:
# macOS
tail -f ~/Library/Logs/Claude/mcp*.log
# Windows
Get-Content "$env:APPDATA\Claude\logs\mcp*.log" -Tail 20 -Wait
4. Verify the configuration file:
# macOS
cat ~/Library/Application\ Support/Claude/claude_desktop_config.json
# Windows
type %APPDATA%\Claude\claude_desktop_config.json
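The configuration check can also be automated rather than read by eye. The sketch below is an illustration only: `check_config` is a hypothetical helper, and it assumes the file layout shown in the configuration section. It reports invalid JSON (for example a stray comment or trailing comma) and a `command` that cannot be resolved on the current `PATH`.

```python
import json
import shutil
from pathlib import Path


def check_config(config_path, server_name="hybrid-browser"):
    """Return a list of human-readable problems; an empty list means OK."""
    path = Path(config_path)
    if not path.exists():
        return [f"config file not found: {path}"]
    try:
        config = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"]
    entry = config.get("mcpServers", {}).get(server_name)
    if entry is None:
        return [f'no "{server_name}" entry under "mcpServers"']
    problems = []
    command = entry.get("command", "")
    # shutil.which resolves bare names via PATH and also accepts full paths.
    if not command or shutil.which(command) is None:
        problems.append(f"command not found or not executable: {command!r}")
    return problems
```

Note that the `PATH` seen by this script may differ from the one Claude Desktop uses when it launches the server, so a bare `python` that resolves here is not a guarantee; a full interpreter path avoids that ambiguity.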
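Solution: Use the full path to your Python interpreter in your configuration (replace /usr/bin/python3 with your own path; JSON does not allow comments, so keep the file free of them):
{
"mcpServers": {
"hybrid-browser": {
"command": "/usr/bin/python3",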
"args": ["-m", "hybrid_browser_mcp.server"]
}
}
}
Solution: The HybridBrowserToolkit uses a TypeScript-based browser controller that runs on Node.js. It will automatically download and manage browser binaries. If you encounter issues:
- Ensure Node.js is installed on your system
- The TypeScript server will start automatically when needed
- Browser binaries will be downloaded on first use
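A quick way to confirm the first point is to look for a `node` executable on the `PATH`. This sketch uses only the standard library; `find_node` is a hypothetical helper name, and because this document does not state a minimum Node.js version, none is checked here.

```python
import shutil
import subprocess


def find_node(path=None):
    """Return (executable, version) for Node.js, or None if it is not found.

    `path` overrides the PATH search string, mainly for testing.
    """
    node = shutil.which("node", path=path)
    if node is None:
        return None
    # `node --version` prints a version string prefixed with "v".
    result = subprocess.run(
        [node, "--version"], capture_output=True, text=True, check=False
    )
    return node, result.stdout.strip()
```

If `find_node()` returns `None`, install Node.js or make sure its directory is on the `PATH` of the environment that launches the server.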
To see detailed logs, you can run the server with debug output:
python -m hybrid_browser_mcp.server 2> debug.log
Then check debug.log for any error messages.
This project is licensed under the MIT License - see the LICENSE file for details.