PDF2Markdown.jl

Convert scanned and regular PDFs into structured Markdown using local LLMs (via Ollama): 100% Julia, no cloud, no remote APIs, no nonsense.

OCR, layout parsing, and Markdown generation in one clean Julia pipeline.

Features

- Converts each page of a PDF to a high-resolution PNG image using Poppler_jll
- Sends the images to a local Ollama instance (e.g., gemma3:12b)
- 100% local: no internet required, your data stays on your machine
- Outputs structured, clean Markdown
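
The first step in the list above, rendering pages to PNG, could look roughly like the sketch below. It is not the package's actual code: `pdf_to_pngs` and `page_number` are hypothetical names, the 300 dpi default is an assumption, and it assumes that Poppler_jll exposes `pdftoppm()` as a command object, as recent JLL packages do.

```julia
using Poppler_jll  # ships the pdftoppm executable

# Page number encoded in a pdftoppm output name such as "page-07.png".
page_number(path) = parse(Int, match(r"-(\d+)\.png$", path).captures[1])

# Hypothetical helper (not the package's API): render every page of
# `pdf_path` to a PNG inside `out_dir` and return the paths in page order.
function pdf_to_pngs(pdf_path; dpi=300, out_dir=mktempdir())
    prefix = joinpath(out_dir, "page")
    # pdftoppm writes <prefix>-<n>.png, zero-padding <n> for longer documents.
    run(`$(pdftoppm()) -png -r $dpi $pdf_path $prefix`)
    pngs = filter(p -> occursin(r"^page-\d+\.png$", basename(p)),
                  readdir(out_dir; join=true))
    # Sort numerically so the order holds with or without zero padding.
    return sort(pngs; by=page_number)
end
```

Calling `pdf_to_pngs("document.pdf")` would then give one image path per page, ready to be sent to the model.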

Installation

using Pkg
Pkg.add(url="https://github.com/gwangjinkim/PDF2Markdown.jl")

Or clone the repository manually and add it with Pkg.develop(path="path/to/PDF2Markdown.jl").

Usage

using PDF2Markdown

text = extract_text_from_pdf("path/to/your.pdf")
println(text)

Ollama must be installed and running locally, with a model such as gemma3:12b already pulled (for example with ollama pull gemma3:12b).
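
Before converting a document, it can help to confirm that Ollama is reachable and that the model is available. The sketch below is an assumption-laden example, not part of PDF2Markdown.jl: `has_model` and `ollama_ready` are hypothetical helpers, and the address is Ollama's default `http://localhost:11434`. It queries Ollama's `/api/tags` endpoint, which lists the locally pulled models.

```julia
using HTTP, JSON3

# Ollama's default address; adjust if your instance listens elsewhere.
const OLLAMA_URL = "http://localhost:11434"

# Does the JSON body returned by /api/tags list `model`?
function has_model(tags_json::AbstractString, model::AbstractString)
    tags = JSON3.read(tags_json)
    return any(m -> m.name == model, tags.models)
end

# Hypothetical helper: true if Ollama answers and `model` has been pulled.
function ollama_ready(model::AbstractString; base_url::AbstractString=OLLAMA_URL)
    try
        resp = HTTP.get("$base_url/api/tags"; readtimeout=5)
        return has_model(String(resp.body), model)
    catch err
        @warn "Could not reach Ollama" base_url exception=err
        return false
    end
end
```

For example, `ollama_ready("gemma3:12b") || error("start Ollama and pull the model first")` fails early instead of partway through a long conversion.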

Requirements

- Julia ≥ 1.9
- Ollama running locally
- A pulled model such as gemma3:12b or gemma3:4b
- A working Poppler, provided by Poppler_jll

Example

text = extract_text_from_pdf("document.pdf")
write("output.md", text)

Notes

Internally uses:
- Poppler_jll to convert PDF pages to images
- Base64 to encode images for Ollama
- HTTP.jl and JSON3.jl to communicate with the LLM

If you're converting large PDFs, consider processing the pages in batches rather than all at once.
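
To make the notes above concrete, here is a minimal sketch of how a page image might be Base64-encoded, sent to Ollama, and processed in batches. It is an illustration under stated assumptions, not the package's implementation: `build_request`, `page_to_markdown`, and `pages_to_file` are hypothetical names, the prompt text and batch size are made up, and the request targets Ollama's `/api/generate` endpoint on its default port.

```julia
using Base64, HTTP, JSON3

# Request body for Ollama's /api/generate endpoint with one attached image.
# The default prompt is an assumption, not the one the package uses.
function build_request(png_bytes::Vector{UInt8}; model="gemma3:12b",
                       prompt="Transcribe this page as clean Markdown.")
    return Dict(
        "model"  => model,
        "prompt" => prompt,
        "images" => [base64encode(png_bytes)],  # Ollama expects Base64 strings
        "stream" => false,                      # ask for a single JSON reply
    )
end

# Hypothetical helper: send one page image and return the model's text.
function page_to_markdown(png_path; base_url="http://localhost:11434", kwargs...)
    body = JSON3.write(build_request(read(png_path); kwargs...))
    resp = HTTP.post("$base_url/api/generate",
                     ["Content-Type" => "application/json"], body)
    return String(JSON3.read(String(resp.body)).response)
end

# Hypothetical helper: process pages in batches and flush after each batch,
# so partial output is on disk if a long run is interrupted.
function pages_to_file(png_paths, out_path; batch_size=4, kwargs...)
    open(out_path, "w") do io
        for batch in Iterators.partition(png_paths, batch_size)
            for png in batch
                println(io, page_to_markdown(png; kwargs...))
                println(io)
            end
            flush(io)
        end
    end
    return out_path
end
```

Writing after every batch keeps memory use flat for large documents, at the cost of producing the Markdown as a file rather than a single returned string.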

Related Project

🔗 Python version: pdf2md-ollama

Credits

Inspired by this article on Medium

License

MIT

Repository: gwangjinkim/PDF2Markdown.jl