This Docker image provides @mdream/crawl with Playwright Chrome pre-installed for website crawling and llms.txt generation in containerized environments.
# Basic crawling
docker run harlanzw/mdream:latest https://example.com
# Interactive mode
docker run -it harlanzw/mdream:latest
# Show help
docker run harlanzw/mdream:latest --help

- Docker Hub: harlanzw/mdream:latest, harlanzw/mdream:v0.8.5
- GitHub Container Registry: ghcr.io/harlan-zw/mdream:latest
- Platform: Supports linux/amd64
# Crawl a website with depth limit
docker run harlanzw/mdream:latest https://example.com --depth 2
# Crawl with exclusions and limits
docker run harlanzw/mdream:latest https://large-site.com \
--exclude "*/admin/*" --exclude "*/api/*" --max-pages 50
# Crawl using Playwright for JavaScript sites
docker run harlanzw/mdream:latest https://spa-site.com --driver playwright

To save crawled content to your local machine:
# Mount output directory
docker run -v "$(pwd)/output:/app/output" harlanzw/mdream:latest \
https://example.com --output /app/output

# Build and run the image locally
docker build -t mdream-local .
docker run mdream-local https://example.com

The Docker container is configured with ENTRYPOINT to act directly as the mdream-crawl command:
- All arguments that follow the image name in docker run are forwarded to mdream-crawl
- No need to specify command names - just pass your crawl options
- Clean, intuitive interface that feels like using the CLI directly
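
Because the entrypoint forwards everything after the image name, a small shell function can make the container feel like a locally installed command. The following is a minimal sketch and not part of mdream itself: the function name mdream and the override variables MDREAM_DOCKER and MDREAM_IMAGE are hypothetical names chosen for illustration.

```shell
# Hypothetical helper: run the mdream image as if it were a local command.
# MDREAM_DOCKER and MDREAM_IMAGE are illustrative overrides, not mdream settings.
mdream() {
  # --rm removes the stopped container once the crawl finishes.
  # "$@" passes every argument through unchanged, so quoted patterns
  # such as "*/admin/*" reach mdream-crawl intact.
  "${MDREAM_DOCKER:-docker}" run --rm "${MDREAM_IMAGE:-harlanzw/mdream:latest}" "$@"
}
```

With this defined in a shell profile, `mdream https://example.com --depth 2` should behave like the equivalent docker run command shown earlier. Setting MDREAM_IMAGE=mdream-local would point the same function at a locally built image.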
- PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=1 - Already set (browsers pre-installed)
- PLAYWRIGHT_BROWSERS_PATH=/ms-playwright - Browser location
- DISPLAY=:99 - Virtual display for headless browsing
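
To confirm these values in a given image, one option is to override the entrypoint with env and filter its output. This is a sketch under assumptions: filter_playwright_env is a hypothetical helper name, and the usage line assumes the image has an env binary on its PATH.

```shell
# Hypothetical helper: keep only the Playwright-related variables
# from `env`-style output read on stdin, preserving input order.
filter_playwright_env() {
  grep -E '^(PLAYWRIGHT_[A-Z_]+|DISPLAY)='
}

# Usage (assumes `env` exists inside the image):
#   docker run --rm --entrypoint env harlanzw/mdream:latest | filter_playwright_env
```

Overriding the entrypoint is needed here because, by default, anything after the image name is handed to mdream-crawl rather than run as a separate command.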
The crawler generates these artifacts in your output directory:
- llms.txt - Consolidated text file optimized for LLM consumption
- llms-full.txt - Extended format with comprehensive metadata
- md/ - Individual Markdown files for each crawled page
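
When the output directory is mounted from the host, a quick check after a crawl can confirm that the three artifacts listed above were written. The helper below is a hypothetical sketch (check_mdream_output is not part of mdream) and assumes the mounted directory is ./output unless another path is given.

```shell
# Hypothetical helper: report which expected artifacts are missing
# from an output directory. Returns 0 only if all three are present.
check_mdream_output() {
  local dir="${1:-output}"
  local status=0
  [ -f "$dir/llms.txt" ]      || { echo "missing: llms.txt"; status=1; }
  [ -f "$dir/llms-full.txt" ] || { echo "missing: llms-full.txt"; status=1; }
  [ -d "$dir/md" ]            || { echo "missing: md/"; status=1; }
  return "$status"
}
```

For example, running `check_mdream_output "$(pwd)/output"` after the volume-mounted crawl prints one line per missing artifact and exits non-zero if any are absent.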
Uses the apify/actor-node-playwright-chrome:20 base image, which includes:
- Node.js 20 with pnpm
- Playwright with Chrome browser pre-installed
- XVFB for headless browsing support
- Optimized for web crawling and automation