docling_ocr

A powerful Python package for extracting text from images and documents using advanced LLM-based models.

Overview

docling_ocr leverages state-of-the-art language models specifically designed for document understanding tasks. Unlike traditional OCR engines that rely solely on character recognition, docling_ocr uses language models that understand document context, layouts, and can handle various document formats with high accuracy.

Built on top of models like SmolDocling, this package provides a simple, intuitive interface for document text extraction tasks.

Features

LLM-powered extraction: Uses advanced language models trained specifically for document understanding
Context-aware recognition: Understands document layouts and context for improved accuracy
Multi-format support: Works with scanned documents, forms, receipts, and other text-heavy images
Simple API: Easy-to-use interface with both file and image object inputs
Batch processing: Process entire directories of documents efficiently
Flexible output options: Return text or save directly to files
Extensible architecture: Abstract base class makes it easy to add new models

Installation

pip install docling_ocr

Requirements

Python 3.7+
PyTorch 1.10.0+
Transformers 4.15.0+
Pillow 8.0.0+

Quick Start

Basic Usage

from docling_ocr import SmolDoclingExtractor

# Initialize the extractor
extractor = SmolDoclingExtractor()

# Extract text from an image file
text = extractor.extract_text("path/to/document.jpg")
print(text)

# Or use the shorthand callable interface
text = extractor("path/to/document.jpg")

Using with PIL Images

from docling_ocr import SmolDoclingExtractor
from PIL import Image

# Initialize the extractor
extractor = SmolDoclingExtractor()

# Open image with PIL
image = Image.open("path/to/document.jpg")

# Extract text
text = extractor.extract_text_from_image(image)
print(text)

Batch Processing

from docling_ocr import SmolDoclingExtractor
from docling_ocr.utils import batch_process

# Initialize extractor
extractor = SmolDoclingExtractor()

# Process all images in a directory
results = batch_process(
    extractor, 
    image_dir="path/to/documents/", 
    output_dir="path/to/output/",
    extensions=['.jpg', '.png', '.pdf']  # Optional: specify file extensions
)

# Results contains a dictionary mapping filenames to extracted text
for filename, text in results.items():
    print(f"File: {filename}")
    print(f"Text: {text[:100]}...")  # Print first 100 chars
    print("-" * 50)

Advanced Usage

GPU Acceleration

By default, the extractor will use CUDA if available. You can explicitly specify the device:

# Use CPU explicitly
extractor = SmolDoclingExtractor(device="cpu")

# Use specific GPU
extractor = SmolDoclingExtractor(device="cuda:0")

Custom Model Configuration

You can specify a different model from the same family:

# Use a different model variant
extractor = SmolDoclingExtractor(model_name="ds4sd/SmolDocling-512M")

Adjusting Generated Text Length

For longer documents, you may want to increase the maximum generated text length:

# Extract with a longer maximum length for complex documents
text = extractor.extract_text("complex_document.pdf", max_length=1024)

Performance Considerations

Processing time depends on the image size, complexity, and hardware
GPU acceleration is recommended for batch processing
First initialization loads the model which may take some time
Subsequent calls are much faster as the model remains in memory

Comparison with Traditional OCR

docling_ocr differs from traditional OCR engines in several key ways:

Feature	Traditional OCR	docling_ocr
Text Recognition	Character/word based	Context-aware language understanding
Layout Understanding	Limited/separate process	Integrated understanding
Language Understanding	Limited	Leverages LLM language capabilities
Format Flexibility	Engine-specific	Adaptable to various formats
Context Retention	Limited	Maintains document context

Examples

Forms and Structured Documents

from docling_ocr import SmolDoclingExtractor

extractor = SmolDoclingExtractor()
form_text = extractor("tax_form.jpg")
print(form_text)

Tables and Spreadsheets

spreadsheet_text = extractor("financial_data.jpg")
print(spreadsheet_text)

Receipts and Invoices

receipt_text = extractor("receipt.jpg")
print(receipt_text)

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add some amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Future Roadmap

Support for PDF documents with multi-page handling
Additional LLM-based extraction models
Fine-tuning options for specific document types
Structured data extraction (JSON output)
Layout-preserving extraction options

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Built on the amazing work of the SmolDocling team for the SmolDocling-256M-preview model.
Inspired by the growing field of document AI
Thanks to the HuggingFace team for making transformers accessible

Citation

If you use this package in your research, please cite:

@software{docling_ocr,
  author = {Adhing'a Fredrick},
  title = {docling_ocr: LLM-based Document Text Extraction},
  year = {2025},
  url = {https://github.com/FREDERICO23/docling_ocr}
}

Contact

For questions and support, please open an issue on the GitHub repository or contact [email protected].

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
.github/workflows		.github/workflows
docling_ocr		docling_ocr
examples		examples
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

docling_ocr

Overview

Features

Installation

Requirements

Quick Start

Basic Usage

Using with PIL Images

Batch Processing

Advanced Usage

GPU Acceleration

Custom Model Configuration

Adjusting Generated Text Length

Performance Considerations

Comparison with Traditional OCR

Examples

Forms and Structured Documents

Tables and Spreadsheets

Receipts and Invoices

Contributing

Future Roadmap

License

Acknowledgments

Citation

Contact

About

Uh oh!

Releases

Packages

Uh oh!

Languages

License

FREDERICO23/docling_ocr

Folders and files

Latest commit

History

Repository files navigation

docling_ocr

Overview

Features

Installation

Requirements

Quick Start

Basic Usage

Using with PIL Images

Batch Processing

Advanced Usage

GPU Acceleration

Custom Model Configuration

Adjusting Generated Text Length

Performance Considerations

Comparison with Traditional OCR

Examples

Forms and Structured Documents

Tables and Spreadsheets

Receipts and Invoices

Contributing

Future Roadmap

License

Acknowledgments

Citation

Contact

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages