A local macOS AI-based OCR utility for converting PDFs and images into Markdown
with glm-ocr running through Ollama.
The workflow stays local: PDFs are rendered into page images with Poppler,
images are normalized with Pillow, each page/image is sent to the local Ollama
model, and the result is saved as .md.
To use Ollala locally with a clean, user-friendly web interface, check out Ollala's Web Interface.
If you want ultra-fast, non-AI-based PDF to text conversion, check out my other open-source project: pdf2txt.
Ollama does not natively support reading or parsing PDF files currently. It handles multimodal inputs by accepting images. Because of this, the PDF must first be rendered into page-by-page images (which we do using Poppler), and then each image is passed to the glm-ocr model on your local Ollama for text extraction. This script seamlessly automates that entire process for you.
- Converts one PDF or image into a Markdown file.
- Converts a folder of PDFs/images in batch.
- Optionally scans folders recursively.
- Preserves multi-page PDFs as one Markdown file.
- Separates PDF pages with
---in the final Markdown. - Asks
glm-ocrto extract text, tables, and formulas in clean Markdown. - Prints progress while processing files and pages.
- Processes PDF pages one at a time instead of loading the whole PDF into RAM.
- Writes PDF output incrementally after each page, so partial progress is kept if a later page fails.
- Checks for common setup problems before running OCR.
Supported image formats:
png, jpg, jpeg, webp, tif, tiff, bmp
This project is built for macOS, especially Apple Silicon Macs.
You need:
- Python 3
- Ollama running in the background
- The latest
glm-ocrmodel installed in Ollama - Poppler for PDF conversion
- Python packages from
requirements.txt
Poppler is required for PDFs. Install it once from Terminal:
brew install popplerMake sure Ollama is running before starting OCR. Also make sure glm-ocr is
installed:
ollama pull glm-ocr
ollama listollama list should show glm-ocr or glm-ocr:latest.
Clone the repository and install dependencies:
git clone https://github.com/realanthonyc/ollala-ai-ocr.git
cd ollala-ai-ocr
python3 -m venv .venv
.venv/bin/pip install -r requirements.txtIf Poppler or glm-ocr are not installed yet, run:
brew install poppler
ollama pull glm-ocrRun commands from the project directory:
cd ollala-ai-ocrConvert one PDF:
.venv/bin/python ocr_to_md.py /path/to/file.pdfConvert one image:
.venv/bin/python ocr_to_md.py /path/to/image.pngConvert all supported files in a folder:
.venv/bin/python ocr_to_md.py /path/to/input-folderConvert a folder and its subfolders:
.venv/bin/python ocr_to_md.py /path/to/input-folder --recursiveChoose where Markdown files are saved:
.venv/bin/python ocr_to_md.py /path/to/input-folder -o /path/to/output-folderUse a lighter setting for large or scanned PDFs:
.venv/bin/python ocr_to_md.py /path/to/file.pdf --dpi 100 --max-side 1400 --num-ctx 4096 --keep-alive 0 --request-timeout 120Start with the safe profile if the PDF is large, scanned, image-heavy, or if the Mac becomes sluggish during OCR:
.venv/bin/python ocr_to_md.py /path/to/file.pdf --dpi 100 --max-side 1400 --num-ctx 4096 --keep-alive 0 --request-timeout 120If that works, try a balanced profile:
.venv/bin/python ocr_to_md.py /path/to/file.pdf --dpi 140 --max-side 1800 --num-ctx 8192 --keep-alive 30sFor cleaner or smaller PDFs, the default profile may give better OCR detail:
.venv/bin/python ocr_to_md.py /path/to/file.pdfIf you need more OCR detail and the Mac remains stable, increase quality gradually:
.venv/bin/python ocr_to_md.py /path/to/file.pdf --dpi 180 --max-side 2200 --num-ctx 16384Avoid jumping straight to very high DPI on large scanned PDFs. Higher DPI and
larger --max-side values create larger images for Ollama and can sharply
increase memory pressure.
input
Required. A PDF, image, or directory containing supported files.
-o, --output-dir
Output folder for Markdown files.
Default: markdown_output inside the input folder.
--model
Ollama model name.
Default: glm-ocr
--dpi
PDF render resolution.
Higher values can improve OCR quality but increase memory use and latency.
Default: 160
--max-side
Maximum image side length before sending an image to Ollama.
Lower this if OCR is too slow or memory pressure is high.
Default: 2000
--image-format
Temporary image format sent to Ollama.
JPEG is smaller and usually safer for large PDFs. PNG is lossless but heavier.
Default: jpeg
--jpeg-quality
JPEG quality when --image-format jpeg is used.
Default: 85
--num-ctx
Ollama context window.
The default follows the high-resolution OCR setting, but lowering it can
reduce memory pressure.
Default: 16384
--keep-alive
How long Ollama keeps the model loaded after each request.
Use 0 to unload after each page if memory pressure is a problem.
Default: 30s
--page-retries
Retry failed PDF pages with lighter image/context settings.
Default: 2
--stop-on-page-error
Stop the PDF if a page fails. By default, failed pages get a Markdown error
marker and the script continues.
--request-timeout
Per-page Ollama request timeout in seconds.
Default: 180
--start-page
First PDF page to process. Useful for testing or splitting a large PDF.
Default: 1
--end-page
Last PDF page to process. Useful for testing or splitting a large PDF.
Default: end of PDF
--recursive
Scan input directories recursively.
For a single file:
file.pdf -> markdown_output/file.md
image.png -> markdown_output/image.md
For a folder, the script keeps the relative folder structure in the output directory.
For multi-page PDFs, all pages are combined into one Markdown file:
Page 1 content
---
Page 2 contentIf PDF conversion fails, check Poppler:
which pdftoppm
brew install popplerIf the script cannot connect to Ollama, open the Ollama app or start the Ollama service, then check:
ollama listIf glm-ocr is missing:
ollama pull glm-ocrIf large PDFs are slow, memory-heavy, or make Ollama unstable, use the safer profile:
.venv/bin/python ocr_to_md.py /path/to/file.pdf --dpi 100 --max-side 1400 --num-ctx 4096 --keep-alive 0 --request-timeout 120If only one page fails because Ollama returns a model/server error, the script
now retries that page with lighter settings. If it still fails, the default
behavior is to write a clear error marker for that page and continue with the
remaining pages. Add --stop-on-page-error if you prefer the old fail-fast
behavior.
To test a single difficult page before running the whole PDF:
.venv/bin/python ocr_to_md.py /path/to/file.pdf --start-page 6 --end-page 6 --dpi 90 --max-side 1100 --num-ctx 2048 --keep-alive 0 --request-timeout 90OCR quality depends on the source document, scan clarity, PDF render DPI, image
resolution, and the current glm-ocr model behavior. For clean digital PDFs,
lower DPI may be enough. For scanned or dense documents, higher DPI may improve
accuracy at the cost of speed.
If the Mac becomes sluggish during OCR, stop the run and restart with lower
--dpi, lower --max-side, lower --num-ctx, and --keep-alive 0.