avensolutions/url2md

url2md

Convert web pages to Markdown using the Cloudflare Workers AI toMarkdown API.

Features:

  • Fetches any URL and converts the content to a clean .md file
  • Recursive crawling: follows links to a configurable depth
  • Smart link filtering: skips navigation and footer links, login pages, file downloads, and similar non-content targets
  • Post-processing: strips boilerplate lines ("Skip to main content", "On this page") and collapses excess blank lines
  • Filename generation from page titles with collision avoidance
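
As an illustration of the post-processing step listed above, here is a minimal sketch using only the standard library. It is not the project's actual code: the function name `clean_markdown`, the `BOILERPLATE` constant, and the matching rule (a whole trimmed line, compared case-insensitively) are assumptions. Only the two boilerplate phrases come from the feature list.

```rust
/// Boilerplate lines to drop. The two phrases come from the feature list;
/// matching on the whole trimmed line, case-insensitively, is an assumption.
const BOILERPLATE: [&str; 2] = ["skip to main content", "on this page"];

/// Removes boilerplate lines and collapses each run of blank lines to one.
/// Leading and trailing blank lines are dropped; output ends with one newline.
fn clean_markdown(input: &str) -> String {
    let mut kept: Vec<&str> = Vec::new();
    // Start as "blank" so that blank lines at the top are discarded.
    let mut previous_blank = true;

    for line in input.lines() {
        let trimmed = line.trim();
        if BOILERPLATE.iter().any(|b| trimmed.eq_ignore_ascii_case(b)) {
            continue; // boilerplate line: skip it entirely
        }
        let blank = trimmed.is_empty();
        if blank && previous_blank {
            continue; // second or later blank line in a run
        }
        kept.push(line);
        previous_blank = blank;
    }

    // Removing a boilerplate line can leave a dangling blank line at the end.
    while kept.last().is_some_and(|l| l.trim().is_empty()) {
        kept.pop();
    }

    let mut cleaned = kept.join("\n");
    cleaned.push('\n');
    cleaned
}

fn main() {
    let raw = "Skip to main content\n\n# Title\n\n\n\nBody text.\nOn this page\n\n";
    print!("{}", clean_markdown(raw));
}
```

Matching whole lines rather than substrings keeps a sentence that merely mentions "on this page" intact. A real implementation may use a longer phrase list or different rules.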

Prerequisites

  • Rust toolchain (1.85+, edition 2024)
  • A Cloudflare account with access to the Workers AI toMarkdown API
  • Cloudflare Account ID and API Token

Installation

git clone <repo-url> && cd url2md
cargo build --release

The binary is produced at ./target/release/url2md.

Usage

Authentication

Provide your Cloudflare credentials via environment variables or CLI flags:

# environment variables (recommended)
export CLOUDFLARE_ACCOUNT_ID="your-account-id"
export CLOUDFLARE_API_TOKEN="your-api-token"

# or pass as flags
url2md --account-id <ID> --api-token <TOKEN> <URL>
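
The two ways of supplying credentials can be pictured with the following sketch. It assumes that a flag takes precedence over the environment variable, which the README does not state. The helper `resolve_credential` is hypothetical and not taken from the project. The environment value is passed in as a parameter so the function can be exercised without modifying the process environment.

```rust
use std::env;

/// Returns the flag value if it is non-empty, otherwise the environment value.
/// Flag-over-environment precedence is an assumption made for this sketch.
fn resolve_credential(
    flag: Option<String>,
    env_value: Option<String>,
    env_name: &str,
) -> Result<String, String> {
    fn non_empty(value: &String) -> bool {
        !value.trim().is_empty()
    }

    flag.filter(non_empty)
        .or_else(|| env_value.filter(non_empty))
        .ok_or_else(|| format!("missing credential: pass the flag or set {env_name}"))
}

fn main() {
    // Hypothetical: in a real program these would come from the argument parser.
    let account_flag: Option<String> = None;
    let token_flag: Option<String> = None;

    let account = resolve_credential(
        account_flag,
        env::var("CLOUDFLARE_ACCOUNT_ID").ok(),
        "CLOUDFLARE_ACCOUNT_ID",
    );
    let token = resolve_credential(
        token_flag,
        env::var("CLOUDFLARE_API_TOKEN").ok(),
        "CLOUDFLARE_API_TOKEN",
    );

    match (account, token) {
        (Ok(_), Ok(_)) => println!("credentials resolved"),
        (Err(e), _) | (_, Err(e)) => eprintln!("{e}"),
    }
}
```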

Convert a single page

url2md https://example.com -o ./output
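
Each converted page is written to the output directory under a name derived from its title, with collision avoidance as noted under Features. The sketch below shows one plausible way to do that. The helpers `slugify` and `unique_filename`, the numeric `-2`, `-3` suffix scheme, and the `untitled` fallback are all assumptions, not the tool's documented behaviour.

```rust
use std::collections::HashSet;

/// Turns a page title into a lowercase, hyphen-separated file stem.
/// Non-alphanumeric runs become a single hyphen (an assumed convention).
fn slugify(title: &str) -> String {
    let mut slug = String::new();
    for ch in title.chars() {
        if ch.is_ascii_alphanumeric() {
            slug.push(ch.to_ascii_lowercase());
        } else if !slug.is_empty() && !slug.ends_with('-') {
            slug.push('-');
        }
    }
    let slug = slug.trim_end_matches('-');
    if slug.is_empty() {
        "untitled".to_string()
    } else {
        slug.to_string()
    }
}

/// Returns a `.md` filename not yet present in `used`,
/// appending -2, -3, ... to the stem when the name is already taken.
fn unique_filename(title: &str, used: &mut HashSet<String>) -> String {
    let stem = slugify(title);
    let mut candidate = format!("{stem}.md");
    let mut n = 2;
    // `insert` returns false when the name was already recorded.
    while !used.insert(candidate.clone()) {
        candidate = format!("{stem}-{n}.md");
        n += 1;
    }
    candidate
}

fn main() {
    let mut used = HashSet::new();
    for title in ["Getting Started", "Getting started!", "API / Reference"] {
        println!("{}", unique_filename(title, &mut used));
    }
}
```

This version only tracks names generated during the current run. A real tool might also check for files that already exist in the output directory.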

Convert with recursive crawling

# immediate links (depth 1)
url2md https://example.com -o ./output -r 1

# two levels deep
url2md https://example.com -o ./output -r 2
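
To make the depth semantics concrete (`-r 0` converts only the start page, `-r 1` adds its immediate links, and so on), here is a sketch of a depth-limited breadth-first crawl. It is an illustration only. The `links_of` callback stands in for fetching a page and extracting its links, and `should_follow` is a hypothetical URL-based filter covering just login pages and file downloads. Skipping nav and footer links would need the page structure and is not shown.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical link filter: rejects login pages and a few download extensions.
fn should_follow(url: &str) -> bool {
    let lower = url.to_ascii_lowercase();
    let is_download = [".pdf", ".zip", ".png"].iter().any(|ext| lower.ends_with(ext));
    let is_login = lower.contains("/login");
    !(is_download || is_login)
}

/// Visits `start` and every accepted link reachable within `max_depth` hops,
/// returning the URLs in breadth-first order. Each URL is visited once.
fn crawl<F>(start: &str, max_depth: usize, links_of: F) -> Vec<String>
where
    F: Fn(&str) -> Vec<String>,
{
    let mut visited: HashSet<String> = HashSet::from([start.to_string()]);
    let mut queue = VecDeque::from([(start.to_string(), 0usize)]);
    let mut order = Vec::new();

    while let Some((url, depth)) = queue.pop_front() {
        // Only expand links while we are still above the depth limit.
        if depth < max_depth {
            for link in links_of(&url) {
                if should_follow(&link) && visited.insert(link.clone()) {
                    queue.push_back((link, depth + 1));
                }
            }
        }
        order.push(url);
    }
    order
}

fn main() {
    // In-memory stand-in for a small site; a real crawler would fetch pages.
    let site: HashMap<&str, Vec<&str>> = HashMap::from([
        (
            "https://example.com",
            vec![
                "https://example.com/docs",
                "https://example.com/login",
                "https://example.com/file.pdf",
            ],
        ),
        (
            "https://example.com/docs",
            vec!["https://example.com/docs/api", "https://example.com"],
        ),
    ]);
    let links_of = |url: &str| -> Vec<String> {
        site.get(url)
            .map(|links| links.iter().map(|s| s.to_string()).collect())
            .unwrap_or_default()
    };

    for depth in 0..=2 {
        println!("-r {depth}: {:?}", crawl("https://example.com", depth, &links_of));
    }
}
```

The visited set prevents loops when pages link back to each other. A breadth-first queue is used so that pages closer to the start URL are processed first.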

Run from source with cargo run

cargo run -- https://example.com -o ./output
cargo run -- https://example.com -o ./output -r 2

CLI reference

url2md [OPTIONS] <URL>

Arguments:
  <URL>  URL to fetch and convert

Options:
  -o, --output <PATH>        Output directory [default: .]
  -r, --recursive <DEPTH>    Recursively convert linked pages [default: 0]
      --account-id <ID>      Cloudflare account ID (or CLOUDFLARE_ACCOUNT_ID)
      --api-token <TOKEN>    Cloudflare API token (or CLOUDFLARE_API_TOKEN)
  -h, --help                 Print help
  -V, --version              Print version

Development

Build

cargo build              # debug build
cargo build --release    # optimised release build
cargo check              # fast compile check (no binary produced)

Test

cargo test

Lint

cargo clippy                          # check for lint warnings
cargo clippy -- -D warnings           # treat warnings as errors

Format

cargo fmt                             # auto-format code
cargo fmt -- --check                  # check only; does not modify files

Pre-commit check (all-in-one)

Run formatting, linting, and tests together:

cargo fmt -- --check && cargo clippy -- -D warnings && cargo test

Project structure

url2md/
├── Cargo.toml       # package manifest and dependencies
├── Cargo.lock       # locked dependency versions
├── README.md
└── src/
    ├── main.rs      # CLI, HTTP fetching, link extraction, crawl loop
    └── config.rs    # markdown post-processing and unit tests

License

See repository for license details.
