Convert scanned and regular PDFs into structured Markdown using local LLMs (via Ollama) — 100% Julia, no cloud, no APIs, no nonsense.
OCR, layout parsing, and Markdown generation in one clean Julia pipeline.
- Converts each page of a PDF to a high-resolution PNG image using Poppler_jll
- Sends the images to a local Ollama instance (e.g., gemma3:12b)
- 100% local: no internet required, your data stays on your machine
- Outputs structured, clean Markdown
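
The first two steps above can be sketched roughly as follows. This is a minimal illustration, not the package's actual internals: the helper names `render_command` and `build_payload`, the 300 DPI default, and the prompt text are all assumptions made for the example. It relies on the `pdftoppm` tool that Poppler provides and on the `images` field of Ollama's `/api/generate` endpoint, which takes Base64-encoded image data.

```julia
using Base64

# Hypothetical helper: build (but do not run) the pdftoppm command that
# renders every page of `pdf` to PNG files named after `prefix`.
# With Poppler_jll, `pdftoppm` would be replaced by `$(Poppler_jll.pdftoppm())`.
function render_command(pdf::AbstractString, prefix::AbstractString; dpi::Integer=300)
    return `pdftoppm -png -r $dpi $pdf $prefix`
end

# Hypothetical helper: build the request body for Ollama's /api/generate.
# Images go into the "images" array as Base64 strings.
function build_payload(png_bytes::Vector{UInt8};
                       model::AbstractString="gemma3:12b",
                       prompt::AbstractString="Transcribe this page as Markdown.")
    return Dict(
        "model"  => model,
        "prompt" => prompt,                      # assumed prompt, not the package's
        "images" => [base64encode(png_bytes)],
        "stream" => false,                       # ask for one JSON response
    )
end

# Sending the payload would look roughly like this (needs HTTP.jl and JSON3.jl):
#   resp = HTTP.post("http://localhost:11434/api/generate",
#                    ["Content-Type" => "application/json"],
#                    JSON3.write(build_payload(read(png_path))))
#   markdown = JSON3.read(resp.body).response
```

Keeping command and payload construction separate from the code that runs them makes each step easy to inspect without Ollama or Poppler being available.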
using Pkg
Pkg.add(url="https://github.com/your-username/PDF2Markdown.jl")
Or clone the repo manually and use Pkg.develop.
using PDF2Markdown
text = extract_text_from_pdf("path/to/your.pdf")
println(text)
You must have ollama installed and running locally, with a model like gemma3:12b pulled.
- Julia ≥ 1.9
- Ollama running locally
- A pulled model like gemma3:12b or gemma3:4b
- Poppler must be functional via Poppler_jll
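
Before running a conversion, it can help to confirm that Ollama is actually listening. The sketch below is one possible check, not part of the package: `ollama_reachable` is a hypothetical helper, and it assumes Ollama's default address of 127.0.0.1:11434. It only tests that the port accepts connections; it does not verify that a model such as gemma3:12b has been pulled.

```julia
using Sockets

# Hypothetical helper: return true if something accepts TCP connections
# on host:port. 11434 is Ollama's default port.
function ollama_reachable(host::AbstractString="127.0.0.1", port::Integer=11434)
    try
        sock = connect(host, port)
        close(sock)
        return true
    catch
        # Connection refused, host unreachable, and similar errors land here.
        return false
    end
end

if !ollama_reachable()
    @warn "Nothing is listening on 127.0.0.1:11434; start Ollama with `ollama serve`."
end
```

If the check fails, starting the server with `ollama serve` and pulling a model with `ollama pull gemma3:12b` are the usual next steps.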
text = extract_text_from_pdf("document.pdf")
write("output.md", text)
- Internally uses:
  - Poppler_jll to convert PDF pages to images
  - Base64 to encode images for Ollama
  - HTTP.jl and JSON3.jl to communicate with the LLM
- If you're converting large PDFs, consider batching pages.
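
One way to batch pages by hand is sketched below. It assumes the page count is already known (for example from Poppler's `pdfinfo`) and uses `pdftoppm`'s `-f` and `-l` flags to render only a range of pages. The helpers `page_batches` and `batch_args` are hypothetical; nothing here implies that `extract_text_from_pdf` itself accepts a page range.

```julia
# Hypothetical helper: split pages 1..n_pages into consecutive ranges
# of at most batch_size pages each.
page_batches(n_pages::Integer, batch_size::Integer) =
    collect(Iterators.partition(1:n_pages, batch_size))

# Hypothetical helper: pdftoppm arguments that render only `pages`
# (-f = first page, -l = last page, -r = resolution in DPI).
function batch_args(pdf::AbstractString, prefix::AbstractString,
                    pages::AbstractUnitRange; dpi::Integer=300)
    return ["-png", "-r", string(dpi),
            "-f", string(first(pages)), "-l", string(last(pages)),
            pdf, prefix]
end

# Show the arguments that would be used for a 10-page PDF in batches of 4.
for pages in page_batches(10, 4)
    println(pages, " => ", join(batch_args("document.pdf", "out/page", pages), " "))
end
```

Processing one batch at a time keeps the number of PNG files on disk, and the number of images waiting to be sent to the model, bounded regardless of document length.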
🔗 Python version: pdf2md-ollama
Inspired by this article on Medium
MIT