☁️ The fastest HTML to markdown converter on GitHub. Optimized for LLMs and supports streaming.
Tip
🎉 Mdream v1 is here! Read the v1 release notes.
Made possible by my Sponsor Program 💖 • Follow me @harlan_zw 🐦 • Join Discord for help
- 🧠 #1 Token Optimizer: Up to 2x fewer tokens than Turndown, node-html-markdown, and html-to-markdown. 70-99% fewer tokens than raw HTML.
- 🚀 #1 Fastest: Fastest pure JS & native Rust converter. Up to 37x faster than Turndown (Rust NAPI vs JS), up to 4.6x faster than Turndown (JS vs JS), and 5x faster than htmd (Rust vs Rust). Converts 1.8MB HTML in ~5.2ms (Rust).
- 🔍 Generates Minimal GitHub Flavored Markdown: Frontmatter, Nested & HTML markup support.
- 🌊 Streamable: Memory efficient streaming for large documents and real-time pipelines.
- ⚡ Tiny: 10kB gzip JS core, 60kB gzip with Rust WASM engine. Zero dependencies.
- ⚙️ Run anywhere: CLI Crawler, Docker, GitHub Actions, Vite, & more.
A zero-dependency, LLM-optimized HTML to Markdown converter. Faster and leaner than Turndown, node-html-markdown, and html-to-markdown, with output tuned for token efficiency and readability.
On top of the core converter, Mdream ships packages to generate LLM artifacts like llms.txt for your own sites or produce LLM context for any project.
Mdream is built to run anywhere and is available in the following packages:
| Package | Description |
|---|---|
| | Rust NAPI engine + WASM for edge. Performance-first, declarative config. Includes CLI. |
| | Pure JS engine. Full hook access, zero native deps. Subpaths: /plugins, /splitter, /parse, /llms-txt, /negotiate. |
| | Site-wide crawler to generate llms.txt artifacts from entire websites |
| | Pre-built Docker image with Playwright Chrome for containerized website crawling |
| | Generate automatic .md for your own Vite sites |
| | Generate automatic .md and llms.txt artifacts for Nuxt sites |
| | Generate .md and llms.txt artifacts from your static .html output |
| | Native Rust crate with CLI. Zero dependencies, streaming support. Available on crates.io |
| | Use mdream directly in browsers via unpkg/jsDelivr without any build step |
📥 URL to Markdown
Fetches the Wikipedia article on Markdown and converts it to Markdown, preserving the original links and images.
```bash
curl -s https://en.wikipedia.org/wiki/Markdown \
| npx mdream@beta --origin https://en.wikipedia.org --preset minimal \
| tee streaming.md
```

Tip: The --origin flag fixes relative image and link paths.
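To illustrate the kind of fix --origin makes, here is a minimal sketch that resolves root-relative and protocol-relative paths against an origin with the standard WHATWG `URL` constructor. This is not mdream's internal code, and `resolveAgainstOrigin` is a hypothetical helper. Note that resolving against the origin alone treats page-relative paths such as `images/a.png` as root-relative.

```javascript
// Sketch (hypothetical helper): resolve a link or image path against an origin.
// Uses the standard URL API; absolute URLs pass through unchanged.
function resolveAgainstOrigin(path, origin) {
  return new URL(path, origin).href
}

const origin = 'https://en.wikipedia.org'
console.log(resolveAgainstOrigin('/wiki/HTML', origin))
// https://en.wikipedia.org/wiki/HTML
console.log(resolveAgainstOrigin('//upload.wikimedia.org/logo.png', origin))
// https://upload.wikimedia.org/logo.png
console.log(resolveAgainstOrigin('https://example.com/a', origin))
// https://example.com/a
```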
Want to make it look nice? Use glow.
```bash
curl -s https://en.wikipedia.org/wiki/Markdown \
| npx mdream@beta --origin https://en.wikipedia.org --preset minimal \
| glow
```

📄 Local HTML to Markdown
Converts a local HTML file to a Markdown file, using tee to write the output to a file and display it in the terminal.
```bash
cat index.html \
| npx mdream@beta --preset minimal \
| tee streaming.md
```

Want to make it look nice? Use glow.
```bash
cat index.html \
| npx mdream@beta --preset minimal \
| glow
```

🧠 Feed Any Website to an LLM
Pipe web content straight into Claude, GPT, or any LLM CLI:
```bash
# Single page → Claude
curl -s https://react.dev/learn | npx mdream@beta --origin https://react.dev --preset minimal \
| claude -p "explain the key concepts on this page"

# Crawl entire docs → summarize
npx @mdream/crawl@beta "https://nuxt.com/docs/getting-started/**"
cat output/llms-full.txt | claude -p "write a getting started guide from these docs"

# Compare two frameworks
diff <(curl -s https://vuejs.org/guide/introduction | npx mdream@beta --preset minimal) \
<(curl -s https://react.dev/learn | npx mdream@beta --preset minimal) \
| claude -p "compare these two frameworks based on their intro docs"

# JavaScript/SPA sites (React, Vue, Angular)
npx -p playwright -p @mdream/crawl@beta crawl https://spa-site.com --driver playwright
cat output/llms-full.txt | claude -p "what features does this app have"
```

🌐 Make Your Site AI-Discoverable
Generate llms.txt to help AI tools understand your site:
```bash
# Static sites
npx @mdream/crawl@beta https://yoursite.com

# JavaScript/SPA sites (React, Vue, Angular)
npx -p playwright -p @mdream/crawl@beta crawl https://spa-site.com --driver playwright
```

Outputs:
- output/llms.txt - Optimized for LLM consumption
- output/llms-full.txt - Complete content with metadata
- output/md/ - Individual markdown files per page
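For orientation, the llms.txt proposal describes a Markdown file with an H1 title, a blockquote summary, and H2 sections containing lists of links with optional notes. The sketch below assembles that shape from page metadata; `buildLlmsTxt` and its field names (`title`, `url`, `description`) are hypothetical, and the file @mdream/crawl writes may be laid out differently.

```javascript
// Sketch (hypothetical helper): build an llms.txt-style index from page metadata.
// Not @mdream/crawl's implementation; field names are assumptions for illustration.
function buildLlmsTxt(siteName, summary, pages) {
  const lines = [`# ${siteName}`, '', `> ${summary}`, '', '## Pages', '']
  for (const { title, url, description } of pages) {
    // Each entry is a Markdown link, optionally followed by ": description"
    lines.push(description ? `- [${title}](${url}): ${description}` : `- [${title}](${url})`)
  }
  return lines.join('\n') + '\n'
}

const llmsTxt = buildLlmsTxt('Example', 'Docs for the Example project.', [
  { title: 'Getting Started', url: 'https://example.com/docs/start.md', description: 'Install and first steps' },
  { title: 'API', url: 'https://example.com/docs/api.md' },
])
console.log(llmsTxt)
```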
🗄️ Build RAG Systems from Websites
Crawl websites and generate embeddings for vector databases:
```js
import { crawlAndGenerate } from '@mdream/crawl'
import { withMinimalPreset } from '@mdream/js/preset/minimal'
import { htmlToMarkdownSplitChunks } from '@mdream/js/splitter'
import { embed } from 'ai'

const { createTransformersJS } = await import('@built-in-ai/transformers-js')
const embeddingModel = createTransformersJS().textEmbeddingModel('Xenova/bge-base-en-v1.5')

const embeddings = []
await crawlAndGenerate({
  urls: ['https://example.com'],
  onPage: async ({ url, html, title, origin }) => {
    const chunks = htmlToMarkdownSplitChunks(html, withMinimalPreset({
      chunkSize: 1000,
      chunkOverlap: 200,
      origin,
    }))
    for (const chunk of chunks) {
      const { embedding } = await embed({ model: embeddingModel, value: chunk.content })
      embeddings.push({ url, title, content: chunk.content, embedding })
    }
  },
})
// Save to vector database: await saveToVectorDB(embeddings)
```

✂️ Extract Specific Content from Pages
Pull headers, images, or other elements during conversion:
```js
import { htmlToMarkdown } from 'mdream'

const headers = []
const images = []
htmlToMarkdown(html, {
  extraction: {
    'h1, h2, h3': el => headers.push(el.textContent),
    'img[src]': el => images.push({ src: el.attributes.src, alt: el.attributes.alt }),
  },
})
```

⚡ Optimize Token Usage With Clean Mode
Use clean: true (enabled by default with minimal: true) to automatically reduce token costs:
```js
import { htmlToMarkdown } from 'mdream'

// All clean features enabled
htmlToMarkdown(html, { clean: true })

// Or selective features
htmlToMarkdown(html, {
  clean: {
    emptyLinks: true, // Strip #, javascript: links
    emptyLinkText: true, // Drop [](url) links with no text
    emptyImages: true, // Strip ![](url) images with no alt text
    redundantLinks: true, // [url](url) → url
    selfLinkHeadings: true, // ## [Title](#title) → ## Title
    fragments: true, // Strip broken #anchor links
    urls: true, // Strip utm_*, fbclid tracking params
  }
})
```

```bash
pnpm add mdream@beta
```

The mdream package uses native Node.js bindings (NAPI-RS), which cannot be statically bundled. If your bundler fails to resolve mdream, mark it as external:
Next.js / Turbopack:
```js
// next.config.js
const nextConfig = {
  serverExternalPackages: ['mdream'],
}

module.exports = nextConfig
```

Webpack / other bundlers:

```js
// In your bundler config
externals: ['mdream']
```

Tip
@mdream/js has zero native dependencies and works with all bundlers without configuration.
Tip
Using Vite? @mdream/vite handles this automatically.
```js
import { htmlToMarkdown } from 'mdream'

// Rust NAPI engine in Node.js, WASM in edge/browser runtimes
const markdown = htmlToMarkdown('<h1>Hello World</h1>')
console.log(markdown) // # Hello World
```

```js
import { streamHtmlToMarkdown } from 'mdream'

const response = await fetch('https://en.wikipedia.org/wiki/Markdown')
for await (const chunk of streamHtmlToMarkdown(response.body, {
  origin: 'https://en.wikipedia.org',
  minimal: true,
})) {
  process.stdout.write(chunk)
}
```

See the mdream docs for complete details.
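If you need the whole Markdown string rather than writing chunks as they arrive, the same for await pattern can accumulate them. In this sketch, `markdownChunks()` is a hypothetical stand-in for the async iterable returned by streamHtmlToMarkdown(), so it runs without mdream installed; `collect` works with any async iterable of strings.

```javascript
// Sketch: accumulate streamed Markdown chunks into one string.
// markdownChunks() is a hypothetical stand-in for streamHtmlToMarkdown()'s output.
async function* markdownChunks() {
  yield '# Markdown\n\n'
  yield 'Markdown is a lightweight markup language.\n'
}

// Consumes any async iterable of strings with for await
async function collect(chunks) {
  let out = ''
  for await (const chunk of chunks) {
    out += chunk // chunks arrive in document order, so appending rebuilds the output
  }
  return out
}

collect(markdownChunks()).then(markdown => console.log(markdown))
```

Buffering like this gives up the memory savings of streaming, so it suits moderately sized pages; for large documents, keep writing chunks as they arrive.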
Need something that works in the browser or an edge runtime? Use Mdream.
The @mdream/crawl package crawls an entire site generating LLM artifacts using mdream for Markdown conversion.
- llms.txt: A consolidated text file optimized for LLM consumption.
- llms-full.txt: An extended format with comprehensive metadata and full content.
- Individual Markdown Files: Each crawled page is saved as a separate Markdown file in the md/ directory.
```bash
# Interactive
npx @mdream/crawl@beta

# Simple
npx @mdream/crawl@beta https://harlanzw.com

# Glob patterns
npx @mdream/crawl@beta "https://nuxt.com/docs/getting-started/**"

# Get help
npx @mdream/crawl@beta -h
```

Run @mdream/crawl with Playwright Chrome pre-installed for website crawling in containerized environments.
```bash
# Quick start (quote the glob so your shell doesn't expand it)
docker run harlanzw/mdream:latest "site.com/docs/**"

# Interactive mode
docker run -it harlanzw/mdream:latest

# Using Playwright for JavaScript sites
docker run harlanzw/mdream:latest spa-site.com --driver playwright
```

Available Images:
- harlanzw/mdream:latest - Latest stable release
- ghcr.io/harlan-zw/mdream:latest - GitHub Container Registry
See DOCKER.md for complete usage, configuration, and building instructions.
```bash
pnpm add @mdream/action@beta
```

See the GitHub Actions README for usage and configuration.

```bash
pnpm install @mdream/vite@beta
```

See the Vite README for usage and configuration.

```bash
pnpm add @mdream/nuxt@beta
```

See the Nuxt Module README for usage and configuration.
Use mdream directly via CDN with no build step. Call init() once to load the WASM binary, then use htmlToMarkdown() synchronously:
```html
<script src="https://unpkg.com/mdream/dist/iife.js"></script>
<!-- type="module" is required for top-level await -->
<script type="module">
  await window.mdream.init()
  const markdown = window.mdream.htmlToMarkdown('<h1>Hello</h1><p>World</p>')
  console.log(markdown) // # Hello\n\nWorld
</script>
```

CDN Options:
- unpkg: https://unpkg.com/mdream/dist/iife.js
- jsDelivr: https://cdn.jsdelivr.net/npm/mdream/dist/iife.js
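When several scripts or components on one page may need mdream, one way to honor "call init() once" is to cache the first init() promise and reuse it. This is a generic sketch, not part of mdream: `fakeMdream` is a hypothetical stand-in for window.mdream so the pattern runs anywhere, and `ensureMdream` is an illustrative helper name.

```javascript
// Sketch: share a single init() call across every caller on the page.
// fakeMdream is a hypothetical stand-in for window.mdream.
let initCalls = 0
const fakeMdream = {
  init: async () => { initCalls++ }, // the real init() loads the WASM binary
  htmlToMarkdown: html => html, // placeholder only; the real function converts HTML
}

let ready = null // promise from the first init() call, reused afterwards
function ensureMdream(lib) {
  if (!ready) ready = lib.init()
  return ready.then(() => lib)
}

// Two callers, one initialization
ensureMdream(fakeMdream)
ensureMdream(fakeMdream)
console.log(initCalls) // 1
```

In a page, each caller would run `const mdream = await ensureMdream(window.mdream)` before calling `mdream.htmlToMarkdown(html)`.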
Pure JS comparison: mdream runs without plugins, while Turndown uses its GFM plugin for equivalent table and strikethrough support.
| Input | mdream | Turndown | node-html-markdown | rehype-remark |
|---|---|---|---|---|
| 166 KB | 3.26ms | 11.26ms (3.5x) | 14.31ms (4.4x) | 35.19ms (10.8x) |
| 420 KB | 6.38ms | 13.63ms (2.1x) | 17.11ms (2.7x) | 62.10ms (9.7x) |
| 1.8 MB | 57.2ms | 264.3ms (4.6x) | 26,072ms (456x) | 826.7ms (14.5x) |
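The parenthesized multipliers read as each tool's time divided by mdream's time on the same input; that interpretation is inferred from the numbers rather than stated. For the 1.8 MB Turndown row:

```math
\text{slowdown} = \frac{t_{\text{tool}}}{t_{\text{mdream}}} = \frac{264.3\,\text{ms}}{57.2\,\text{ms}} \approx 4.6\times
```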
All crates were compiled with opt-level=3, LTO, and a single codegen unit.
| Input | mdream | htmd | html2md | html2md-rs | mdka | html_to_markdown |
|---|---|---|---|---|---|---|
| 166 KB | 0.34ms | 2.13ms (6.3x) | 2.71ms (8.0x) | panicked | 2.65ms (7.8x) | 1.72ms (5.1x) |
| 420 KB | 0.41ms | 3.50ms (8.6x) | 4.25ms (10.4x) | 1.54ms (3.8x) | 3.56ms (8.7x) | 2.72ms (6.7x) |
| 1.8 MB | 5.20ms | 34.4ms (6.6x) | >30s | 35.5ms (6.8x) | 37.6ms (7.2x) | 28.5ms (5.5x) |
For Node.js apps that need native speed. Timings include N-API overhead.
| Input | mdream (rust) | html-to-markdown (rust) |
|---|---|---|
| 166 KB | 0.52ms | 3.94ms (7.6x) |
| 420 KB | 0.76ms | 7.48ms (9.8x) |
| 1.8 MB | 7.14ms | 82.9ms (11.6x) |
End-to-end `cat file | tool > /dev/null`, measured with hyperfine. Timings include process startup overhead (~20ms for Node.js, ~1ms for Go/Rust).
| Input | mdream (Rust) | mdream (Node.js) | html2markdown (Go) |
|---|---|---|---|
| 166 KB | 1.4ms | 26.9ms | 4.9ms |
| 420 KB | 2.1ms | 24.3ms | 5.6ms |
| 1.8 MB | 10.1ms | 34.8ms | 75.2ms (7.5x) |
mdream's Rust CLI is 2.6-7.5x faster than Go html2markdown. On the 1.8MB file, even the Node.js CLI (with ~20ms startup tax) beats Go by 2.2x. For raw conversion speed without startup overhead, see the JS and Rust tables above.
mdream is the only JavaScript HTML-to-markdown converter with streaming support. In the Go ecosystem, JohannesKaufmann/html-to-markdown supports streaming via io.Reader. No other JS, Rust, or Python converter supports streaming HTML input.
With minimal: true, mdream produces up to 92% fewer tokens than raw HTML and up to 2x fewer tokens than competing libraries.
| Page (HTML tokens) | mdream minimal | Turndown | node-html-markdown |
|---|---|---|---|
| Wikipedia (21K) | 6,101 (-71%) | 10,435 (-50%) | 10,176 (-52%) |
| GitHub Docs (62K) | 5,006 (-92%) | 43,983 (-30%) | 8,758 (-86%) |
| Wikipedia XL (194K) | 152,425 (-21%) | 195,978 (+1%) | 283,136 (+46%) |
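The percentages read as the change in token count relative to the source HTML (inferred from the table, not stated): negative values are savings, and positive values mean the Markdown used more tokens than the HTML. For mdream on the Wikipedia page:

```math
\Delta = \frac{T_{\text{md}} - T_{\text{html}}}{T_{\text{html}}} \times 100\% = \frac{6{,}101 - 21{,}000}{21{,}000} \times 100\% \approx -71\%
```

Because the HTML counts are rounded to the nearest thousand, recomputing other cells this way can differ from the table by about a percentage point.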
Benchmarks run on real-world HTML using Vitest bench. See full methodology and reproduction steps.
- ultrahtml: HTML parsing inspiration
Licensed under the MIT license.
