Add CloudPDFProcessor for PDF processing with Mistral API #167

Blecoeur · 2025-09-12T21:15:45Z

Implemented CloudPDFProcessor class to handle PDF file processing.
Added methods for encoding PDFs to base64, decoding images from base64, and extracting results from OCR responses.
Added a method that maximises request throughput without triggering "429 - Too Many Requests" errors, since the Mistral API allows only a low number of requests per second.
Added tests for CloudPDFProcessor to validate file acceptance and processing functionality.
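
For reviewers who want a picture of the base64 helpers mentioned above, here is a minimal sketch. The function names `encode_pdf_to_data_uri` and `decode_base64_image` are hypothetical and are not taken from this PR. The sketch also assumes the OCR endpoint accepts the PDF as a `data:application/pdf;base64,...` URI and may return images with a similar `data:` prefix.

```python
import base64


def encode_pdf_to_data_uri(pdf_bytes: bytes) -> str:
    """Return raw PDF bytes as a base64 data URI (assumed request format)."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return f"data:application/pdf;base64,{encoded}"


def decode_base64_image(payload: str) -> bytes:
    """Decode a base64 image, tolerating an optional 'data:...;base64,' prefix."""
    if payload.startswith("data:") and "," in payload:
        # Keep only the part after the first comma, i.e. the base64 body.
        payload = payload.split(",", 1)[1]
    return base64.b64decode(payload)
```

A caller would typically read the file with `pathlib.Path(path).read_bytes()` and pass the result to `encode_pdf_to_data_uri`.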

Linked to issue #15

- Implemented CloudPDFProcessor class to handle PDF file processing.
- Added methods for encoding PDFs to base64, decoding images from base64, and extracting results from OCR responses.
- Included synchronous and asynchronous processing methods, along with batch processing that respects the maximum number of requests per second.
- Added tests for CloudPDFProcessor to validate file acceptance and processing functionality.
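
One possible way to keep batch processing under a requests-per-second cap is to space out request starts evenly. The sketch below illustrates that idea only and is not necessarily how this PR does it. `RateLimiter`, `process_batch`, and the `worker` callback are hypothetical names, and the sketch neither calls the Mistral API nor retries after a 429 response.

```python
import asyncio
import time
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class RateLimiter:
    """Space out call starts by 1 / max_per_second seconds."""

    def __init__(self, max_per_second: float) -> None:
        if max_per_second <= 0:
            raise ValueError("max_per_second must be positive")
        self._interval = 1.0 / max_per_second
        self._next_slot = 0.0

    async def wait(self) -> None:
        # There is no await between reading and updating the slot, so tasks
        # on the same event loop cannot reserve the same slot.
        now = time.monotonic()
        slot = max(now, self._next_slot)
        self._next_slot = slot + self._interval
        await asyncio.sleep(slot - now)


async def process_batch(
    items: Iterable[T],
    worker: Callable[[T], Awaitable[R]],
    max_per_second: float,
) -> List[R]:
    """Run `worker` on every item, starting calls no faster than the limit."""
    limiter = RateLimiter(max_per_second)

    async def run_one(item: T) -> R:
        await limiter.wait()
        return await worker(item)

    # gather() returns results in the same order as the input items.
    return list(await asyncio.gather(*(run_one(item) for item in items)))
```

Here `worker` would be the coroutine that sends a single OCR request. Even spacing gives up some burst capacity in exchange for simplicity, which may be acceptable when the limit is low.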

Blecoeur and others added 2 commits September 12, 2025 23:08

Add 'mistralai' to dependencies in pyproject.toml

f0d936b
