| 1 | +# Firecrawl |
| 2 | + |
| 3 | +The [Firecrawl MCP Server](https://github.com/firecrawl/firecrawl-mcp-server) |
| 4 | +connects your ADK agent to the [Firecrawl](https://www.firecrawl.dev/) API, a |
| 5 | +service that crawls websites and converts their content into clean, structured |
| 6 | +markdown. This allows your agent to ingest, search, and reason over web data |
| 7 | +from a given URL and its subpages. |
| 8 | + |
| 9 | +## Features |
| 10 | + |
| 11 | +- **Agent-based Web Research**: Deploy an agent that can take a topic, use the |
| 12 | + search tool to find relevant URLs, and then use the scrape tool to extract the |
| 13 | + full content of each page for analysis or summarization. |
| 14 | + |
| 15 | +- **Structured Data Extraction**: Use the extract tool to pull specific, |
| 16 | + structured information (like product names, prices, or contact info) from a |
| 17 | + list of URLs, powered by LLM extraction. |
| 18 | + |
| 19 | +- **Large-Scale Content Ingestion**: Automate the scraping of entire websites or |
| 20 | + large batches of URLs using the batch scrape and crawl tools. This is ideal |
| 21 | + for populating a vector database for a RAG (Retrieval-Augmented Generation) |
| 22 | + pipeline. |
| 23 | + |
| 24 | +## Prerequisites |
| 25 | + |
| 26 | +- [Sign up for Firecrawl](https://www.firecrawl.dev/signin) and [get an API key](https://firecrawl.dev/app/api-keys). |
| 27 | + |
| 28 | +## Usage with ADK |
| 29 | + |
| 30 | +=== "Local MCP Server" |
| 31 | + |
| 32 | +    ```python |
| 33 | +    from google.adk.agents.llm_agent import Agent |
| 34 | +    from google.adk.tools.mcp_tool.mcp_session_manager import StdioConnectionParams |
| 35 | +    from google.adk.tools.mcp_tool.mcp_toolset import MCPToolset |
| 36 | +    from mcp import StdioServerParameters |
| 37 | + |
| 38 | +    FIRECRAWL_API_KEY = "YOUR_FIRECRAWL_API_KEY" |
| 39 | + |
| 40 | +    root_agent = Agent( |
| 41 | +        model="gemini-2.5-pro", |
| 42 | +        name="firecrawl_agent", |
| 43 | +        description="A helpful assistant for scraping websites with Firecrawl", |
| 44 | +        instruction="Help the user search for website content", |
| 45 | +        tools=[ |
| 46 | +            MCPToolset( |
| 47 | +                connection_params=StdioConnectionParams( |
| 48 | +                    server_params=StdioServerParameters( |
| 49 | +                        command="npx", |
| 50 | +                        args=[ |
| 51 | +                            "-y", |
| 52 | +                            "firecrawl-mcp", |
| 53 | +                        ], |
| 54 | +                        env={ |
| 55 | +                            "FIRECRAWL_API_KEY": FIRECRAWL_API_KEY, |
| 56 | +                        } |
| 57 | +                    ), |
| 58 | +                    timeout=30, |
| 59 | +                ), |
| 60 | +            ) |
| 61 | +        ], |
| 62 | +    ) |
| 63 | +    ``` |
| 64 | + |
| 65 | +=== "Remote MCP Server" |
| 66 | + |
| 67 | +    ```python |
| 68 | +    from google.adk.agents.llm_agent import Agent |
| 69 | +    from google.adk.tools.mcp_tool.mcp_session_manager import StreamableHTTPServerParams |
| 70 | +    from google.adk.tools.mcp_tool.mcp_toolset import MCPToolset |
| 71 | + |
| 72 | +    FIRECRAWL_API_KEY = "YOUR_FIRECRAWL_API_KEY" |
| 73 | + |
| 74 | +    root_agent = Agent( |
| 75 | +        model="gemini-2.5-pro", |
| 76 | +        name="firecrawl_agent", |
| 77 | +        description="A helpful assistant for scraping websites with Firecrawl", |
| 78 | +        instruction="Help the user search for website content", |
| 79 | +        tools=[ |
| 80 | +            MCPToolset( |
| 81 | +                connection_params=StreamableHTTPServerParams( |
| 82 | +                    url=f"https://mcp.firecrawl.dev/{FIRECRAWL_API_KEY}/v2/mcp", |
| 83 | +                ), |
| 84 | +            ) |
| 85 | +        ], |
| 86 | +    ) |
| 87 | +    ``` |
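
Because the remote endpoint embeds the API key in the URL path, it may be safer to read the key from the environment than to hard-code it. The following is a minimal sketch under that assumption; the helper names `load_api_key` and `firecrawl_remote_url` are hypothetical and not part of ADK or Firecrawl. Only the URL format is taken from the remote example above.

```python
import os


def load_api_key(env_var: str = "FIRECRAWL_API_KEY") -> str:
    """Read the API key from the process environment (hypothetical helper)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set")
    return key


def firecrawl_remote_url(api_key: str) -> str:
    """Build the remote MCP endpoint URL in the format used in the example above."""
    if not api_key or api_key == "YOUR_FIRECRAWL_API_KEY":
        raise ValueError("Provide a real Firecrawl API key")
    return f"https://mcp.firecrawl.dev/{api_key}/v2/mcp"
```

The result of `firecrawl_remote_url(load_api_key())` could then be passed as the `url` argument of `StreamableHTTPServerParams`. Since the key is part of the URL, avoid writing the full URL to logs.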
| 88 | + |
| 89 | +## Available tools |
| 90 | + |
| 91 | +The Firecrawl MCP server exposes the following tools for crawling, scraping, |
| 92 | +and searching the web: |
| 93 | + |
| 94 | +Tool | Name | Description |
| 95 | +---- | ---- | ----------- |
| 96 | +Scrape Tool | `firecrawl_scrape` | Scrape content from a single URL with advanced options |
| 97 | +Batch Scrape Tool | `firecrawl_batch_scrape` | Scrape multiple URLs efficiently with built-in rate limiting and parallel processing |
| 98 | +Check Batch Status | `firecrawl_check_batch_status` | Check the status of a batch operation |
| 99 | +Map Tool | `firecrawl_map` | Map a website to discover all indexed URLs on the site |
| 100 | +Search Tool | `firecrawl_search` | Search the web and optionally extract content from search results |
| 101 | +Crawl Tool | `firecrawl_crawl` | Start an asynchronous crawl with advanced options |
| 102 | +Check Crawl Status | `firecrawl_check_crawl_status` | Check the status of a crawl job |
| 103 | +Extract Tool | `firecrawl_extract` | Extract structured information from web pages using LLM capabilities. Supports both cloud AI and self-hosted LLM extraction |
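
An agent rarely needs all eight tools. As a hedged sketch, the snippet below picks a subset of the tool names from the table and rejects typos before the list is handed to the toolset. The `select_tools` helper and the `RESEARCH_TOOLS` constant are hypothetical names introduced for illustration.

```python
# Tool names copied from the table above.
FIRECRAWL_TOOLS = frozenset({
    "firecrawl_scrape",
    "firecrawl_batch_scrape",
    "firecrawl_check_batch_status",
    "firecrawl_map",
    "firecrawl_search",
    "firecrawl_crawl",
    "firecrawl_check_crawl_status",
    "firecrawl_extract",
})


def select_tools(*names: str) -> list[str]:
    """Return the requested tool names, failing fast on unknown ones."""
    unknown = sorted(set(names) - FIRECRAWL_TOOLS)
    if unknown:
        raise ValueError(f"Unknown Firecrawl tool(s): {', '.join(unknown)}")
    return list(names)


# A search-and-read subset for a research agent (illustrative choice).
RESEARCH_TOOLS = select_tools("firecrawl_search", "firecrawl_scrape")
```

In ADK, a list like `RESEARCH_TOOLS` would typically be passed to `MCPToolset` through its `tool_filter` argument. Verify that parameter against the ADK version you have installed.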
| 104 | + |
| 105 | +## Configuration |
| 106 | + |
| 107 | +The Firecrawl MCP server can be configured using environment variables: |
| 108 | + |
| 109 | +**API key (required for the cloud API)**: |
| 110 | + |
| 111 | +- `FIRECRAWL_API_KEY`: Your Firecrawl API key |
| 112 | +  - Required when using the cloud API (default) |
| 113 | +  - Optional when using a self-hosted instance with `FIRECRAWL_API_URL` |
| 114 | + |
| 115 | +**Firecrawl API URL (optional)**: |
| 116 | + |
| 117 | +- `FIRECRAWL_API_URL`: Custom API endpoint for self-hosted instances |
| 118 | +  - Example: `https://firecrawl.your-domain.com` |
| 119 | +  - If not provided, the cloud API is used (requires an API key) |
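
The rule described above (the key is required for the cloud API and optional once a self-hosted URL is set) can be checked before the server process starts. This is a minimal sketch. `build_firecrawl_env` is a hypothetical helper, and only the two environment variable names come from this page.

```python
from typing import Optional


def build_firecrawl_env(
    api_key: Optional[str] = None,
    api_url: Optional[str] = None,
) -> dict[str, str]:
    """Build the env mapping for the local MCP server process."""
    # Without a self-hosted URL the server talks to the cloud API,
    # which needs an API key.
    if not api_key and not api_url:
        raise ValueError(
            "FIRECRAWL_API_KEY is required unless FIRECRAWL_API_URL is set"
        )
    env: dict[str, str] = {}
    if api_key:
        env["FIRECRAWL_API_KEY"] = api_key
    if api_url:
        env["FIRECRAWL_API_URL"] = api_url
    return env
```

The returned mapping can be passed as the `env` argument of `StdioServerParameters` in the local server example.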
| 120 | + |
| 121 | +**Retry configuration (optional)**: |
| 122 | + |
| 123 | +- `FIRECRAWL_RETRY_MAX_ATTEMPTS`: Maximum number of retry attempts (default: 3) |
| 124 | +- `FIRECRAWL_RETRY_INITIAL_DELAY`: Initial delay in milliseconds before first retry (default: 1000) |
| 125 | +- `FIRECRAWL_RETRY_MAX_DELAY`: Maximum delay in milliseconds between retries (default: 10000) |
| 126 | +- `FIRECRAWL_RETRY_BACKOFF_FACTOR`: Exponential backoff multiplier (default: 2) |
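
To see how the four retry settings interact, the sketch below computes a capped exponential backoff schedule using the defaults listed above. This is the conventional formula and is an assumption here. The server's exact implementation (for example, whether it adds jitter) is not described on this page, and `retry_delays_ms` is a hypothetical helper.

```python
def retry_delays_ms(
    max_attempts: int = 3,
    initial_delay_ms: int = 1000,
    max_delay_ms: int = 10000,
    backoff_factor: float = 2.0,
) -> list[int]:
    """Return the wait before each retry, in milliseconds."""
    delays = []
    delay = float(initial_delay_ms)
    for _ in range(max_attempts):
        # Cap every wait at the configured maximum.
        delays.append(int(min(delay, max_delay_ms)))
        delay *= backoff_factor
    return delays


print(retry_delays_ms())                # [1000, 2000, 4000]
print(retry_delays_ms(max_attempts=6))  # [1000, 2000, 4000, 8000, 10000, 10000]
```

Under this assumption, the default settings never reach the 10000 ms cap. The cap only takes effect from the fifth retry onward.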
| 127 | + |
| 128 | +**Credit usage monitoring (optional)**: |
| 129 | + |
| 130 | +- `FIRECRAWL_CREDIT_WARNING_THRESHOLD`: Credit usage warning threshold (default: 1000) |
| 131 | +- `FIRECRAWL_CREDIT_CRITICAL_THRESHOLD`: Credit usage critical threshold (default: 100) |
| 132 | + |
| 133 | +## Additional resources |
| 134 | + |
| 135 | +- [Firecrawl MCP Server Documentation](https://docs.firecrawl.dev/mcp-server) |
| 136 | +- [Firecrawl MCP Server Repository](https://github.com/firecrawl/firecrawl-mcp-server) |
| 137 | +- [Firecrawl Use Cases](https://docs.firecrawl.dev/use-cases/overview) |
| 138 | +- [Firecrawl Advanced Scraping Guide](https://docs.firecrawl.dev/advanced-scraping-guide) |