Skip to content

Latest commit

 

History

History
202 lines (141 loc) · 4.9 KB

File metadata and controls

202 lines (141 loc) · 4.9 KB

Configuration Guide

Complete configuration reference for Scraper MCP.

Environment Variables

Server Settings

Variable Default Description
TRANSPORT streamable-http Transport type: streamable-http or sse
HOST 0.0.0.0 Host to bind to
PORT 8000 Port to bind to
CACHE_DIR /app/cache Cache directory path
ENABLE_CACHE_TOOLS false Expose cache management tools
DISABLE_RESOURCES false Disable MCP resources
DISABLE_PROMPTS false Disable MCP prompts

Feature Flags

Resources and prompts are enabled by default. To reduce context overhead for LLM clients:

# Via environment variables
DISABLE_RESOURCES=true
DISABLE_PROMPTS=true

# Via CLI flags
python -m scraper_mcp --disable-resources
python -m scraper_mcp --disable-prompts
python -m scraper_mcp --disable-resources --disable-prompts

Perplexity AI

Variable Default Description
PERPLEXITY_API_KEY - API key (enables AI tools when set)
PERPLEXITY_MODEL sonar Default model
PERPLEXITY_TEMPERATURE 0.3 Default temperature
PERPLEXITY_MAX_TOKENS 4000 Default max tokens

Get your API key from Perplexity AI.


Proxy Configuration

The scraper supports HTTP/HTTPS proxies through standard environment variables.

Basic Setup

Create a .env file:

# HTTP proxy
HTTP_PROXY=http://proxy.example.com:8080
http_proxy=http://proxy.example.com:8080

# HTTPS proxy
HTTPS_PROXY=http://proxy.example.com:8080
https_proxy=http://proxy.example.com:8080

# Bypass proxy for specific hosts
NO_PROXY=localhost,127.0.0.1,.local
no_proxy=localhost,127.0.0.1,.local

With Docker Compose

Docker Compose automatically reads .env files:

docker-compose up -d

With Docker Run

docker run -p 8000:8000 \
  -e HTTP_PROXY=http://proxy.example.com:8080 \
  -e HTTPS_PROXY=http://proxy.example.com:8080 \
  -e NO_PROXY=localhost,127.0.0.1,.local \
  cotdp/scraper-mcp:latest

Proxy with Authentication

HTTP_PROXY=http://username:password@proxy.example.com:8080
HTTPS_PROXY=http://username:password@proxy.example.com:8080

Build-Time vs Runtime

Proxy configuration works at two stages:

  1. Build time: Package installation (apt, uv, pip)
  2. Runtime: HTTP requests from the scraper

Both uppercase and lowercase variable names are supported.


ScrapeOps Proxy Integration

ScrapeOps is a premium proxy service for bypassing anti-bot measures, rendering JavaScript, and geo-targeting.

Features

  • JavaScript rendering for SPAs
  • Residential proxies (less likely to be blocked)
  • Geo-targeting by country
  • Automatic header rotation
  • High success rate with smart retries

Quick Setup

Add your API key to .env:

SCRAPEOPS_API_KEY=your_api_key_here

All requests automatically route through ScrapeOps when enabled.

Configuration Options

Variable Default Description
SCRAPEOPS_API_KEY - API key (required to enable)
SCRAPEOPS_RENDER_JS false Enable JavaScript rendering
SCRAPEOPS_RESIDENTIAL false Use residential proxies
SCRAPEOPS_COUNTRY - Target country (us, gb, de, etc.)
SCRAPEOPS_KEEP_HEADERS false Keep original headers
SCRAPEOPS_DEVICE desktop Device type: desktop, mobile, tablet

Full Example

# .env file
SCRAPEOPS_API_KEY=your_api_key_here
SCRAPEOPS_RENDER_JS=true
SCRAPEOPS_RESIDENTIAL=true
SCRAPEOPS_COUNTRY=us
SCRAPEOPS_DEVICE=desktop

Runtime Configuration

The dashboard Config tab allows runtime adjustments without restart.

Available Settings

Setting Default Description
Concurrency 8 Max parallel requests (1-50)
Default Timeout 30 Request timeout in seconds
Max Retries 3 Retry attempts on failure
Cache TTL - Default 3600 Default cache duration (1 hour)
Cache TTL - Realtime 300 API/live data cache (5 minutes)
Cache TTL - Static 86400 Static content cache (24 hours)

Important Notes

  • Changes apply immediately without restart
  • Settings reset when the server restarts
  • Use .env file for permanent configuration

DNS Rebinding Protection

MCP 1.24+ enables DNS rebinding protection by default.

Default Allowed Hosts

127.0.0.1:*
localhost:*
[::1]:*

Adding Custom Hosts

For external access (e.g., OrbStack, reverse proxy):

# In .env file
FASTMCP_ALLOWED_HOSTS=["127.0.0.1:*","localhost:*","[::1]:*","your-domain.local","your-domain.local:*"]
FASTMCP_ALLOWED_ORIGINS=["http://127.0.0.1:*","http://localhost:*","http://[::1]:*","https://your-domain.local"]

Disabling Protection

Not recommended for production:

FASTMCP_TRANSPORT_SECURITY__ENABLE_DNS_REBINDING_PROTECTION=false