mdream

☁️ The fastest HTML to markdown converter on GitHub. Optimized for LLMs and supports streaming.

Tip

🎉 Mdream v1 is here! Read the v1 release notes.

_{Made possible by my Sponsor Program 💖
Follow me @harlan_zw 🐦 • Join Discord for help}

Features

🧠 #1 Token Optimizer: Up to 2x fewer tokens than Turndown, node-html-markdown, and html-to-markdown. 70-99% fewer tokens than raw HTML.
🚀 #1 Fastest: Fastest pure JS & native Rust converter. Up to 37x faster than Turndown (Rust NAPI vs JS), 4.6x, 5x faster than htmd (Rust vs Rust). Converts 1.8MB HTML in ~5.2ms (Rust).
🔍 Generates Minimal GitHub Flavored Markdown: Frontmatter, Nested & HTML markup support.
🌊 Streamable: Memory efficient streaming for large documents and real-time pipelines.
⚡ Tiny: 10kB gzip JS core, 60kB gzip with Rust WASM engine. Zero dependencies.
⚙️ Run anywhere: CLI Crawler, Docker, GitHub Actions, Vite, & more.

What is Mdream?

A zero-dependency, LLM-optimized HTML to Markdown converter. Faster and leaner than Turndown, node-html-markdown, and html-to-markdown, with output tuned for token efficiency and readability.

On top of the core converter, Mdream ships packages to generate LLM artifacts like llms.txt for your own sites or produce LLM context for any project.

Mdream Packages

Mdream is built to run anywhere for all projects and use cases and is available in the following packages:

Package	Description
mdream	Rust NAPI engine + WASM for edge. Performance-first, declarative config. Includes CLI.
@mdream/js	Pure JS engine. Full hook access, zero native deps. Subpaths: `/plugins`, `/splitter`, `/parse`, `/llms-txt`, `/negotiate`.
@mdream/crawl	Site-wide crawler to generate `llms.txt` artifacts from entire websites
Docker	Pre-built Docker image with Playwright Chrome for containerized website crawling
@mdream/vite	Generate automatic `.md` for your own Vite sites
@mdream/nuxt	Generate automatic `.md` and `llms.txt` artifacts generation for Nuxt Sites
@mdream/action	Generate `.md` and `llms.txt` artifacts from your static `.html` output
mdream (crate)	Native Rust crate with CLI. Zero dependencies, streaming support. Available on crates.io
Browser CDN	Use mdream directly in browsers via unpkg/jsDelivr without any build step

What can Mdream do?

📥 URL to Markdown

Fetches the Markdown Wikipedia page and converts it to Markdown preserving the original links and images.

curl -s https://en.wikipedia.org/wiki/Markdown \
 | npx mdream@beta --origin https://en.wikipedia.org --preset minimal \
  | tee streaming.md

Tip: The --origin flag will fix relative image and link paths

Want to make it look nice? Use glow.

curl -s https://en.wikipedia.org/wiki/Markdown \
 | npx mdream@beta --origin https://en.wikipedia.org --preset minimal \
   | glow

📄 Local HTML to Markdown

Converts a local HTML file to a Markdown file, using tee to write the output to a file and display it in the terminal.

cat index.html \
 | npx mdream@beta --preset minimal \
  | tee streaming.md

Want to make it look nice? Use glow.

cat index.html \
 | npx mdream@beta --preset minimal \
  | glow

🧠 Feed Any Website to an LLM

Pipe web content straight into Claude, GPT, or any LLM CLI:

# Single page → Claude
curl -s https://react.dev/learn | npx mdream@beta --origin https://react.dev --preset minimal \
  | claude -p "explain the key concepts on this page"

# Crawl entire docs → summarize
npx @mdream/crawl@beta "https://nuxt.com/docs/getting-started/**"
cat output/llms-full.txt | claude -p "write a getting started guide from these docs"

# Compare two frameworks
diff <(curl -s https://vuejs.org/guide/introduction | npx mdream@beta --preset minimal) \
     <(curl -s https://react.dev/learn | npx mdream@beta --preset minimal) \
  | claude -p "compare these two frameworks based on their intro docs"

# JavaScript/SPA sites (React, Vue, Angular)
npx @mdream/crawl@beta https://spa-site.com --driver playwright
cat output/llms-full.txt | claude -p "what features does this app have"

🌐 Make Your Site AI-Discoverable

Generate llms.txt to help AI tools understand your site:

# Static sites
npx @mdream/crawl@beta https://yoursite.com

# JavaScript/SPA sites (React, Vue, Angular)
npx -p playwright -p @mdream/crawl@beta crawl https://spa-site.com --driver playwright

Outputs:

output/llms.txt - Optimized for LLM consumption
output/llms-full.txt - Complete content with metadata
output/md/ - Individual markdown files per page

🗄️ Build RAG Systems from Websites

Crawl websites and generate embeddings for vector databases:

import { crawlAndGenerate } from '@mdream/crawl'
import { withMinimalPreset } from '@mdream/js/preset/minimal'
import { htmlToMarkdownSplitChunks } from '@mdream/js/splitter'
import { embed } from 'ai'

const { createTransformersJS } = await import('@built-in-ai/transformers-js')
const embeddingModel = createTransformersJS().textEmbeddingModel('Xenova/bge-base-en-v1.5')

const embeddings = []

await crawlAndGenerate({
  urls: ['https://example.com'],
  onPage: async ({ url, html, title, origin }) => {
    const chunks = htmlToMarkdownSplitChunks(html, withMinimalPreset({
      chunkSize: 1000,
      chunkOverlap: 200,
      origin,
    }))

    for (const chunk of chunks) {
      const { embedding } = await embed({ model: embeddingModel, value: chunk.content })
      embeddings.push({ url, title, content: chunk.content, embedding })
    }
  },
})

// Save to vector database: await saveToVectorDB(embeddings)

✂️ Extract Specific Content from Pages

Pull headers, images, or other elements during conversion:

import { htmlToMarkdown } from 'mdream'

const headers = []
const images = []

htmlToMarkdown(html, {
  extraction: {
    'h1, h2, h3': el => headers.push(el.textContent),
    'img[src]': el => images.push({ src: el.attributes.src, alt: el.attributes.alt }),
  },
})

⚡ Optimize Token Usage With Clean Mode

Use clean: true (enabled by default with minimal: true) to automatically reduce token costs:

import { htmlToMarkdown } from 'mdream'

// All clean features enabled
htmlToMarkdown(html, { clean: true })

// Or selective features
htmlToMarkdown(html, {
  clean: {
    emptyLinks: true, // Strip #, javascript: links
    emptyLinkText: true, // Drop [](url) links with no text
    emptyImages: true, // Strip ![](url) with no alt text
    redundantLinks: true, // [url](url) → url
    selfLinkHeadings: true, // ## [Title](#title) → ## Title
    fragments: true, // Strip broken #anchor links
    urls: true, // Strip utm_*, fbclid tracking params
  }
})

Mdream Usage

Installation

pnpm add mdream@beta

Tip

Generate an Agent Skill for this package using skilld:

npx skilld add mdream

Bundler Compatibility

The mdream package uses native Node.js bindings (NAPI-RS) which cannot be statically bundled. If your bundler fails to resolve mdream, mark it as external:

Next.js / Turbopack:

// next.config.js
const nextConfig = {
  serverExternalPackages: ['mdream'],
}

Webpack / other bundlers:

externals: ['mdream']

Tip

@mdream/js has zero native dependencies and works with all bundlers without configuration.

Tip

Using Vite? @mdream/vite handles this automatically.

Basic Usage

import { htmlToMarkdown } from 'mdream'

// Rust NAPI engine in Node.js, WASM in edge/browser runtimes
const markdown = htmlToMarkdown('<h1>Hello World</h1>')
console.log(markdown) // # Hello World

import { streamHtmlToMarkdown } from 'mdream'

const response = await fetch('https://en.wikipedia.org/wiki/Markdown')
for await (const chunk of streamHtmlToMarkdown(response.body, {
  origin: 'https://en.wikipedia.org',
  minimal: true,
})) {
  process.stdout.write(chunk)
}

See the mdream docs for complete details.

Mdream Crawl

Need something that works in the browser or an edge runtime? Use Mdream.

The @mdream/crawl package crawls an entire site generating LLM artifacts using mdream for Markdown conversion.

llms.txt: A consolidated text file optimized for LLM consumption.
llms-full.txt: An extended format with comprehensive metadata and full content.
Individual Markdown Files: Each crawled page is saved as a separate Markdown file in the md/ directory.

Usage

# Interactive
npx @mdream/crawl@beta
# Simple
npx @mdream/crawl@beta https://harlanzw.com
# Glob patterns
npx @mdream/crawl@beta "https://nuxt.com/docs/getting-started/**"
# Get help
npx @mdream/crawl@beta -h

Docker

Run @mdream/crawl with Playwright Chrome pre-installed for website crawling in containerized environments.

# Quick start
docker run harlanzw/mdream:latest site.com/docs/**

# Interactive mode
docker run -it harlanzw/mdream:latest

# Using Playwright for JavaScript sites
docker run harlanzw/mdream:latest spa-site.com --driver playwright

Available Images:

harlanzw/mdream:latest - Latest stable release
ghcr.io/harlan-zw/mdream:latest - GitHub Container Registry

See DOCKER.md for complete usage, configuration, and building instructions.

GitHub Actions Integration

Installation

pnpm add @mdream/action@beta

See the GitHub Actions README for usage and configuration.

Vite Integration

Installation

pnpm install @mdream/vite@beta

See the Vite README for usage and configuration.

Nuxt Integration

Installation

pnpm add @mdream/nuxt@beta

See the Nuxt Module README for usage and configuration.

Browser CDN Usage

Use mdream directly via CDN with no build step. Call init() once to load the WASM binary, then use htmlToMarkdown() synchronously:

<script src="https://unpkg.com/mdream/dist/iife.js"></script>
<script>
  await window.mdream.init()
  const markdown = window.mdream.htmlToMarkdown('<h1>Hello</h1><p>World</p>')
  console.log(markdown) // # Hello\n\nWorld
</script>

CDN Options:

unpkg: https://unpkg.com/mdream/dist/iife.js
jsDelivr: https://cdn.jsdelivr.net/npm/mdream/dist/iife.js

Benchmarks

JavaScript (Node.js)

Pure JS comparison. mdream uses no plugins, Turndown uses GFM plugin for equivalent table/strikethrough support.

Input	mdream	Turndown	node-html-markdown	rehype-remark
166 KB	3.26ms	11.26ms (3.5x)	14.31ms (4.4x)	35.19ms (10.8x)
420 KB	6.38ms	13.63ms (2.1x)	17.11ms (2.7x)	62.10ms (9.7x)
1.8 MB	57.2ms	264.3ms (4.6x)	26,072ms (456x)	826.7ms (14.5x)

Rust (native, release + LTO)

All crates compiled with opt-level=3, LTO, and single codegen unit.

Input	mdream	htmd	html2md	html2md-rs	mdka	html_to_markdown
166 KB	0.34ms	2.13ms (6.3x)	2.71ms (8.0x)	panicked	2.65ms (7.8x)	1.72ms (5.1x)
420 KB	0.41ms	3.50ms (8.6x)	4.25ms (10.4x)	1.54ms (3.8x)	3.56ms (8.7x)	2.72ms (6.7x)
1.8 MB	5.20ms	34.4ms (6.6x)	>30s	35.5ms (6.8x)	37.6ms (7.2x)	28.5ms (5.5x)

Rust NAPI (Node.js bindings)

For Node.js apps that need native speed. Includes N-API overhead.

Input	mdream (rust)	html-to-markdown (rust)
166 KB	0.52ms	3.94ms (7.6x)
420 KB	0.76ms	7.48ms (9.8x)
1.8 MB	7.14ms	82.9ms (11.6x)

CLI (cross-language, includes process startup)

End-to-end cat file | tool > /dev/null via hyperfine. Includes process startup overhead (~20ms for Node.js, ~1ms for Go/Rust).

Input	mdream (Rust)	mdream (Node.js)	html2markdown (Go)
166 KB	1.4ms	26.9ms	4.9ms
420 KB	2.1ms	24.3ms	5.6ms
1.8 MB	10.1ms	34.8ms	75.2ms (7.5x)

mdream's Rust CLI is 2.6-7.5x faster than Go html2markdown. On the 1.8MB file, even the Node.js CLI (with ~20ms startup tax) beats Go by 2.2x. For raw conversion speed without startup overhead, see the JS and Rust tables above.

Streaming

mdream is the only JavaScript HTML-to-markdown converter with streaming support. In the Go ecosystem, JohannesKaufmann/html-to-markdown supports streaming via io.Reader. No other JS, Rust, or Python converter supports streaming HTML input.

Token Efficiency

With minimal: true, mdream produces up to 92% fewer tokens than raw HTML and up to 2x fewer tokens than competing libraries.

Page (HTML tokens)	mdream minimal	Turndown	node-html-markdown
Wikipedia (21K)	6,101 (-71%)	10,435 (-50%)	10,176 (-52%)
GitHub Docs (62K)	5,006 (-92%)	43,983 (-30%)	8,758 (-86%)
Wikipedia XL (194K)	152,425 (-21%)	195,978 (+1%)	283,136 (+46%)

Benchmarks run on real-world HTML using Vitest bench. See full methodology and reproduction steps.

Credits

ultrahtml: HTML parsing inspiration

License

Licensed under the MIT license.

Name		Name	Last commit message	Last commit date
Latest commit History 431 Commits
.claude/skills		.claude/skills
.github		.github
bench		bench
bin		bin
crates		crates
examples		examples
packages		packages
scripts		scripts
.dockerignore		.dockerignore
.editorconfig		.editorconfig
.gitattributes		.gitattributes
.gitignore		.gitignore
.npmrc		.npmrc
CLAUDE.md		CLAUDE.md
DOCKER.md		DOCKER.md
Dockerfile		Dockerfile
LICENSE.md		LICENSE.md
README.md		README.md
action.yml		action.yml
eslint.config.js		eslint.config.js
knip.json		knip.json
package.json		package.json
pnpm-lock.yaml		pnpm-lock.yaml
pnpm-workspace.yaml		pnpm-workspace.yaml
tsconfig.json		tsconfig.json
vitest.config.ts		vitest.config.ts

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

mdream

Features

What is Mdream?

Mdream Packages

What can Mdream do?

Mdream Usage

Installation

Bundler Compatibility

Basic Usage

Mdream Crawl

Usage

Docker

GitHub Actions Integration

Installation

Vite Integration

Installation

Nuxt Integration

Installation

Browser CDN Usage

Benchmarks

JavaScript (Node.js)

Rust (native, release + LTO)

Rust NAPI (Node.js bindings)

CLI (cross-language, includes process startup)

Streaming

Token Efficiency

Credits

License

About

Resources

License

Code of conduct

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 61

Sponsor this project

Uh oh!

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages