mdream

☁️ The fastest HTML to markdown converter on GitHub. Optimized for LLMs and supports streaming.

Tip

🎉 Mdream v1 is here! Read the v1 release notes.

Made possible by my Sponsor Program 💖
Follow me @harlan_zw 🐦 • Join Discord for help

Features

  • 🧠 #1 Token Optimizer: Up to 2x fewer tokens than Turndown, node-html-markdown, and html-to-markdown. 70-99% fewer tokens than raw HTML.
  • 🚀 #1 Fastest: Fastest pure JS & native Rust converter. Up to 37x faster than Turndown (Rust NAPI vs JS) and 4.6x faster in pure JS; 5x faster than htmd (Rust vs Rust). Converts 1.8MB HTML in ~5.2ms (Rust).
  • 🔍 Generates Minimal GitHub Flavored Markdown: Frontmatter, Nested & HTML markup support.
  • 🌊 Streamable: Memory efficient streaming for large documents and real-time pipelines.
  • ⚡ Tiny: 10kB gzip JS core, 60kB gzip with Rust WASM engine. Zero dependencies.
  • ⚙️ Run anywhere: CLI Crawler, Docker, GitHub Actions, Vite, & more.

What is Mdream?

A zero-dependency, LLM-optimized HTML to Markdown converter. Faster and leaner than Turndown, node-html-markdown, and html-to-markdown, with output tuned for token efficiency and readability.

On top of the core converter, Mdream ships packages to generate LLM artifacts like llms.txt for your own sites or produce LLM context for any project.

Mdream Packages

Mdream is built to run anywhere and is available in the following packages:

| Package | Description |
| --- | --- |
| mdream | Rust NAPI engine + WASM for edge. Performance-first, declarative config. Includes CLI. |
| @mdream/js | Pure JS engine. Full hook access, zero native deps. Subpaths: /plugins, /splitter, /parse, /llms-txt, /negotiate. |
| @mdream/crawl | Site-wide crawler to generate llms.txt artifacts from entire websites. |
| Docker | Pre-built Docker image with Playwright Chrome for containerized website crawling. |
| @mdream/vite | Automatically generate .md files for your own Vite sites. |
| @mdream/nuxt | Automatically generate .md and llms.txt artifacts for Nuxt sites. |
| @mdream/action | Generate .md and llms.txt artifacts from your static .html output. |
| mdream (crate) | Native Rust crate with CLI. Zero dependencies, streaming support. Available on crates.io. |
| Browser CDN | Use mdream directly in browsers via unpkg/jsDelivr without any build step. |

What can Mdream do?

📥 URL to Markdown

Fetches the Wikipedia page for Markdown and converts it to Markdown, preserving the original links and images.

curl -s https://en.wikipedia.org/wiki/Markdown \
  | npx mdream@beta --origin https://en.wikipedia.org --preset minimal \
  | tee streaming.md

Tip: The --origin flag will fix relative image and link paths
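
As a rough illustration of what resolving paths against an origin involves, the sketch below uses the standard URL constructor to turn relative paths into absolute ones. It is not mdream's implementation, and resolveAgainstOrigin is a hypothetical helper introduced only for this example.

```javascript
// Illustration only: resolveAgainstOrigin is a hypothetical helper,
// not part of mdream's API.
function resolveAgainstOrigin(path, origin) {
  // Absolute URLs pass through unchanged; relative ones are joined to origin
  return new URL(path, origin).href
}

const origin = 'https://en.wikipedia.org'
console.log(resolveAgainstOrigin('/wiki/HTML', origin))
console.log(resolveAgainstOrigin('https://example.com/a.png', origin))
```

Without an origin, a link such as /wiki/HTML would point nowhere once the Markdown leaves the original site.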

Want to make it look nice? Use glow.

curl -s https://en.wikipedia.org/wiki/Markdown \
  | npx mdream@beta --origin https://en.wikipedia.org --preset minimal \
  | glow

📄 Local HTML to Markdown

Converts a local HTML file to a Markdown file, using tee to write the output to a file and display it in the terminal.

cat index.html \
  | npx mdream@beta --preset minimal \
  | tee streaming.md

Want to make it look nice? Use glow.

cat index.html \
  | npx mdream@beta --preset minimal \
  | glow

🧠 Feed Any Website to an LLM

Pipe web content straight into Claude, GPT, or any LLM CLI:

# Single page → Claude
curl -s https://react.dev/learn | npx mdream@beta --origin https://react.dev --preset minimal \
  | claude -p "explain the key concepts on this page"

# Crawl entire docs → summarize
npx @mdream/crawl@beta "https://nuxt.com/docs/getting-started/**"
cat output/llms-full.txt | claude -p "write a getting started guide from these docs"

# Compare two frameworks
diff <(curl -s https://vuejs.org/guide/introduction | npx mdream@beta --preset minimal) \
     <(curl -s https://react.dev/learn | npx mdream@beta --preset minimal) \
  | claude -p "compare these two frameworks based on their intro docs"

# JavaScript/SPA sites (React, Vue, Angular)
npx @mdream/crawl@beta https://spa-site.com --driver playwright
cat output/llms-full.txt | claude -p "what features does this app have"

🌐 Make Your Site AI-Discoverable

Generate llms.txt to help AI tools understand your site:

# Static sites
npx @mdream/crawl@beta https://yoursite.com

# JavaScript/SPA sites (React, Vue, Angular)
npx -p playwright -p @mdream/crawl@beta crawl https://spa-site.com --driver playwright

Outputs:

  • output/llms.txt - Optimized for LLM consumption
  • output/llms-full.txt - Complete content with metadata
  • output/md/ - Individual markdown files per page

🗄️ Build RAG Systems from Websites

Crawl websites and generate embeddings for vector databases:

import { crawlAndGenerate } from '@mdream/crawl'
import { withMinimalPreset } from '@mdream/js/preset/minimal'
import { htmlToMarkdownSplitChunks } from '@mdream/js/splitter'
import { embed } from 'ai'

const { createTransformersJS } = await import('@built-in-ai/transformers-js')
const embeddingModel = createTransformersJS().textEmbeddingModel('Xenova/bge-base-en-v1.5')

const embeddings = []

await crawlAndGenerate({
  urls: ['https://example.com'],
  onPage: async ({ url, html, title, origin }) => {
    const chunks = htmlToMarkdownSplitChunks(html, withMinimalPreset({
      chunkSize: 1000,
      chunkOverlap: 200,
      origin,
    }))

    for (const chunk of chunks) {
      const { embedding } = await embed({ model: embeddingModel, value: chunk.content })
      embeddings.push({ url, title, content: chunk.content, embedding })
    }
  },
})

// Save to vector database: await saveToVectorDB(embeddings)
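
Once embeddings are collected, retrieval usually means ranking them against a query embedding. The sketch below shows one common approach, cosine similarity, over entries shaped like those pushed in the example above. cosineSimilarity and topK are hypothetical helpers, not part of mdream or the ai package, and the 2-dimensional vectors are toy data standing in for real model output.

```javascript
// Hypothetical helpers for illustration; not part of mdream or the ai package.
function cosineSimilarity(a, b) {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Returns the k entries whose embedding is closest to queryEmbedding
function topK(entries, queryEmbedding, k = 3) {
  return entries
    .map(entry => ({ ...entry, score: cosineSimilarity(entry.embedding, queryEmbedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
}

// Toy 2-dimensional vectors stand in for real embeddings
const sample = [
  { url: 'https://example.com/a', content: 'A', embedding: [1, 0] },
  { url: 'https://example.com/b', content: 'B', embedding: [0, 1] },
]
console.log(topK(sample, [0.9, 0.1], 1)[0].url)
```

A dedicated vector database would replace the in-memory scan for anything beyond small collections.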

✂️ Extract Specific Content from Pages

Pull headers, images, or other elements during conversion:

import { htmlToMarkdown } from 'mdream'

const headers = []
const images = []

htmlToMarkdown(html, {
  extraction: {
    'h1, h2, h3': el => headers.push(el.textContent),
    'img[src]': el => images.push({ src: el.attributes.src, alt: el.attributes.alt }),
  },
})

⚡ Optimize Token Usage With Clean Mode

Use clean: true (enabled by default with minimal: true) to automatically reduce token costs:

import { htmlToMarkdown } from 'mdream'

// All clean features enabled
htmlToMarkdown(html, { clean: true })

// Or selective features
htmlToMarkdown(html, {
  clean: {
    emptyLinks: true, // Strip #, javascript: links
    emptyLinkText: true, // Drop [](url) links with no text
    emptyImages: true, // Strip ![](url) with no alt text
    redundantLinks: true, // [url](url) → url
    selfLinkHeadings: true, // ## [Title](#title) → ## Title
    fragments: true, // Strip broken #anchor links
    urls: true, // Strip utm_*, fbclid tracking params
  }
})
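
To make two of these options concrete, the sketch below shows roughly what the urls and redundantLinks clean-ups do to a string. It is an illustration under simplifying assumptions, not mdream's implementation; stripTrackingParams and collapseRedundantLinks are hypothetical names.

```javascript
// Illustration only: hypothetical helpers, not mdream's implementation.

// Drops utm_* and fbclid query parameters from a URL
function stripTrackingParams(url) {
  const parsed = new URL(url)
  // Copy the keys first so deleting does not disturb iteration
  for (const key of [...parsed.searchParams.keys()]) {
    if (key.startsWith('utm_') || key === 'fbclid')
      parsed.searchParams.delete(key)
  }
  return parsed.href
}

// Collapses [url](url) to a bare url when the text and target are identical
function collapseRedundantLinks(markdown) {
  return markdown.replace(/\[([^\]]+)\]\(([^)]+)\)/g, (match, text, href) =>
    text === href ? href : match)
}

console.log(stripTrackingParams('https://example.com/post?id=7&utm_source=news&fbclid=abc'))
console.log(collapseRedundantLinks('See [https://example.com](https://example.com) and [docs](https://example.com/docs)'))
```

Both transformations shorten the output without removing anything a reader or model would need.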

Mdream Usage

Installation

pnpm add mdream@beta

Tip

Generate an Agent Skill for this package using skilld:

npx skilld add mdream

Bundler Compatibility

The mdream package uses native Node.js bindings (NAPI-RS) which cannot be statically bundled. If your bundler fails to resolve mdream, mark it as external:

Next.js / Turbopack:

// next.config.js
const nextConfig = {
  serverExternalPackages: ['mdream'],
}

module.exports = nextConfig

Webpack / other bundlers:

externals: ['mdream']

Tip

@mdream/js has zero native dependencies and works with all bundlers without configuration.

Tip

Using Vite? @mdream/vite handles this automatically.

Basic Usage

import { htmlToMarkdown } from 'mdream'

// Rust NAPI engine in Node.js, WASM in edge/browser runtimes
const markdown = htmlToMarkdown('<h1>Hello World</h1>')
console.log(markdown) // # Hello World

import { streamHtmlToMarkdown } from 'mdream'

const response = await fetch('https://en.wikipedia.org/wiki/Markdown')
for await (const chunk of streamHtmlToMarkdown(response.body, {
  origin: 'https://en.wikipedia.org',
  minimal: true,
})) {
  process.stdout.write(chunk)
}
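
When the full document is needed rather than incremental output, the chunks can be accumulated into one string. The sketch below assumes the stream yields string chunks, as the process.stdout.write(chunk) call above suggests. collectChunks is a hypothetical helper, and fakeMarkdownStream is a stand-in generator so the example runs without mdream or network access.

```javascript
// collectChunks is a hypothetical helper; it accepts any async iterable of strings
async function collectChunks(chunks) {
  let markdown = ''
  for await (const chunk of chunks) markdown += chunk
  return markdown
}

// Stand-in generator so the sketch runs without mdream or a network request
async function* fakeMarkdownStream() {
  yield '# Hello'
  yield '\n\n'
  yield 'World'
}

collectChunks(fakeMarkdownStream()).then(markdown => console.log(markdown))
```

Buffering gives up the memory benefit of streaming, so it is best kept for documents known to be small.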

See the mdream docs for complete details.

Mdream Crawl

Need something that works in the browser or an edge runtime? Use the core mdream package instead.

The @mdream/crawl package crawls an entire site and generates LLM artifacts, using mdream for Markdown conversion:

  • llms.txt: A consolidated text file optimized for LLM consumption.
  • llms-full.txt: An extended format with comprehensive metadata and full content.
  • Individual Markdown Files: Each crawled page is saved as a separate Markdown file in the md/ directory.

Usage

# Interactive
npx @mdream/crawl@beta
# Simple
npx @mdream/crawl@beta https://harlanzw.com
# Glob patterns
npx @mdream/crawl@beta "https://nuxt.com/docs/getting-started/**"
# Get help
npx @mdream/crawl@beta -h

Docker

Run @mdream/crawl with Playwright Chrome pre-installed for website crawling in containerized environments.

# Quick start
docker run harlanzw/mdream:latest "site.com/docs/**"

# Interactive mode
docker run -it harlanzw/mdream:latest

# Using Playwright for JavaScript sites
docker run harlanzw/mdream:latest spa-site.com --driver playwright

Available Images:

  • harlanzw/mdream:latest - Latest stable release
  • ghcr.io/harlan-zw/mdream:latest - GitHub Container Registry

See DOCKER.md for complete usage, configuration, and building instructions.

GitHub Actions Integration

Installation

pnpm add @mdream/action@beta

See the GitHub Actions README for usage and configuration.

Vite Integration

Installation

pnpm add @mdream/vite@beta

See the Vite README for usage and configuration.

Nuxt Integration

Installation

pnpm add @mdream/nuxt@beta

See the Nuxt Module README for usage and configuration.

Browser CDN Usage

Use mdream directly via CDN with no build step. Call init() once to load the WASM binary, then use htmlToMarkdown() synchronously:

<script src="https://unpkg.com/mdream/dist/iife.js"></script>
<script>
  // Classic scripts do not allow top-level await, so wait on the promise instead
  window.mdream.init().then(() => {
    const markdown = window.mdream.htmlToMarkdown('<h1>Hello</h1><p>World</p>')
    console.log(markdown) // # Hello\n\nWorld
  })
</script>

CDN Options:

  • unpkg: https://unpkg.com/mdream/dist/iife.js
  • jsDelivr: https://cdn.jsdelivr.net/npm/mdream/dist/iife.js

Benchmarks

JavaScript (Node.js)

Pure JS comparison. mdream uses no plugins; Turndown uses the GFM plugin for equivalent table/strikethrough support.

| Input | mdream | Turndown | node-html-markdown | rehype-remark |
| --- | --- | --- | --- | --- |
| 166 KB | 3.26ms | 11.26ms (3.5x) | 14.31ms (4.4x) | 35.19ms (10.8x) |
| 420 KB | 6.38ms | 13.63ms (2.1x) | 17.11ms (2.7x) | 62.10ms (9.7x) |
| 1.8 MB | 57.2ms | 264.3ms (4.6x) | 26,072ms (456x) | 826.7ms (14.5x) |

Rust (native, release + LTO)

All crates compiled with opt-level=3, LTO, and single codegen unit.

| Input | mdream | htmd | html2md | html2md-rs | mdka | html_to_markdown |
| --- | --- | --- | --- | --- | --- | --- |
| 166 KB | 0.34ms | 2.13ms (6.3x) | 2.71ms (8.0x) | panicked | 2.65ms (7.8x) | 1.72ms (5.1x) |
| 420 KB | 0.41ms | 3.50ms (8.6x) | 4.25ms (10.4x) | 1.54ms (3.8x) | 3.56ms (8.7x) | 2.72ms (6.7x) |
| 1.8 MB | 5.20ms | 34.4ms (6.6x) | >30s | 35.5ms (6.8x) | 37.6ms (7.2x) | 28.5ms (5.5x) |

Rust NAPI (Node.js bindings)

For Node.js apps that need native speed. Includes N-API overhead.

| Input | mdream (rust) | html-to-markdown (rust) |
| --- | --- | --- |
| 166 KB | 0.52ms | 3.94ms (7.6x) |
| 420 KB | 0.76ms | 7.48ms (9.8x) |
| 1.8 MB | 7.14ms | 82.9ms (11.6x) |

CLI (cross-language, includes process startup)

End-to-end cat file | tool > /dev/null via hyperfine. Includes process startup overhead (~20ms for Node.js, ~1ms for Go/Rust).

| Input | mdream (Rust) | mdream (Node.js) | html2markdown (Go) |
| --- | --- | --- | --- |
| 166 KB | 1.4ms | 26.9ms | 4.9ms |
| 420 KB | 2.1ms | 24.3ms | 5.6ms |
| 1.8 MB | 10.1ms | 34.8ms | 75.2ms (7.5x) |

mdream's Rust CLI is 2.6-7.5x faster than Go html2markdown. On the 1.8MB file, even the Node.js CLI (with ~20ms startup tax) beats Go by 2.2x. For raw conversion speed without startup overhead, see the JS and Rust tables above.
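
The multipliers in parentheses throughout these tables read as the competitor's time divided by mdream's time. As a quick check of that reading, the sketch below recomputes the Turndown column of the JavaScript table from the published timings. The slowdown helper is introduced here for illustration only, and results derived from rounded timings may differ in the last digit elsewhere.

```javascript
// Timings in milliseconds, copied from the JavaScript (Node.js) table
const rows = [
  { input: '166 KB', mdream: 3.26, turndown: 11.26 },
  { input: '420 KB', mdream: 6.38, turndown: 13.63 },
  { input: '1.8 MB', mdream: 57.2, turndown: 264.3 },
]

// A multiplier of N means the other library took N times as long as mdream
function slowdown(otherMs, mdreamMs) {
  return (otherMs / mdreamMs).toFixed(1) + 'x'
}

for (const row of rows) {
  console.log(`${row.input}: Turndown ${slowdown(row.turndown, row.mdream)}`)
}
```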

Streaming

mdream is the only JavaScript HTML-to-markdown converter with streaming support. In the Go ecosystem, JohannesKaufmann/html-to-markdown supports streaming via io.Reader. No other JS, Rust, or Python converter supports streaming HTML input.

Token Efficiency

With minimal: true, mdream produces up to 92% fewer tokens than raw HTML and up to 2x fewer tokens than competing libraries.

| Page (HTML tokens) | mdream minimal | Turndown | node-html-markdown |
| --- | --- | --- | --- |
| Wikipedia (21K) | 6,101 (-71%) | 10,435 (-50%) | 10,176 (-52%) |
| GitHub Docs (62K) | 5,006 (-92%) | 43,983 (-30%) | 8,758 (-86%) |
| Wikipedia XL (194K) | 152,425 (-21%) | 195,978 (+1%) | 283,136 (+46%) |

Benchmarks run on real-world HTML using Vitest bench. See full methodology and reproduction steps.
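
The percentages in the token table read as the change in token count relative to the raw HTML. The sketch below recomputes the Wikipedia row under that assumption. The HTML total is taken as 21,000 because the table rounds it to 21K, so the result is approximate, and percentChange is a helper introduced only for this example.

```javascript
// Token counts copied from the Wikipedia row; 21K is the rounded HTML total
const htmlTokens = 21000
const outputs = { 'mdream minimal': 6101, 'Turndown': 10435, 'node-html-markdown': 10176 }

// Signed percentage change relative to the raw HTML token count
function percentChange(tokens, baseline) {
  const pct = Math.round((tokens / baseline - 1) * 100)
  return (pct > 0 ? '+' : '') + pct + '%'
}

for (const [name, tokens] of Object.entries(outputs)) {
  console.log(`${name}: ${tokens} (${percentChange(tokens, htmlTokens)})`)
}
```

A positive value, as in the Wikipedia XL row, means the converter produced more tokens than the original HTML.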

Credits

License

Licensed under the MIT license.
