Skip to content

pdf2md-ollama is a lightweight, privacy-first tool that converts PDF files — including scanned documents — into clean, structured Markdown using a local LLM (e.g. Gemma 3 via Ollama). No cloud APIs, no tokens, and no privacy tradeoffs.

Notifications You must be signed in to change notification settings

gwangjinkim/pdf2md_ollama

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Convert PDFs to Markdown with Local LLMs

Fast, private, and free PDF-to-Markdown conversion using a local LLM (e.g., Gemma 3 via Ollama). No cloud APIs, no tokens, no privacy concerns — just elegant Python.

Article: Convert PDFs to Markdown using Local LLMs — Fast, Private, and Free


Installation (Option 1: Using uv)

curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv
uv pip install -r requirements.txt

Then run:

uv run src/pdf2md.py

Installation (Option 2: Using pip)

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Then run:

python src/pdf2md.py

Dependencies

  • PyMuPDF (for PDF rendering) — not free for commercial use
  • Pillow
  • Ollama
  • A local model like gemma3:12b or gemma3:4b must be installed and pulled

Alternative: Fully Open Source Version

Use pdf2image + Poppler instead of pymupdf.

brew install poppler  # or sudo apt install poppler-utils
pip install pdf2image pillow ollama

Then run:

python src/pdf2md_poppler.py

Features

  • Handles scanned PDFs via image input
  • Converts to clean, structured Markdown
  • Works offline with local models

Convert pdf to image

pip install pdf2image
brew install poppler    # or in Linux: sudo apt install poppler-utils

About

pdf2md-ollama is a lightweight, privacy-first tool that converts PDF files — including scanned documents — into clean, structured Markdown using a local LLM (e.g. Gemma 3 via Ollama). No cloud APIs, no tokens, and no privacy tradeoffs.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages