Q: how to add dependencies for Python Function Call? #171

Open
felixgao opened this issue Feb 22, 2025 · 1 comment

Comments

@felixgao

I have some PDF documents that I want to parse locally before feeding them into the Single LLM calls. For example, I want to use Docling to extract the raw text and then use LLMs to do more structured parsing and classification.

How do I do that?

@srijanpatel
Collaborator

Hi Felix,

Thanks for your interest in using Docling with our tool. We did experiment with integrating Docling into PySpur for a time, but we ultimately had to roll it back. The main issue was that bundling Docling increased the size of the pyspur-backend Docker image significantly (over 10GB) because of the OCR model weights, which wasn’t ideal for many users who don’t need that functionality.

That said, you can still use Docling in your workflow. I recommend installing Docling in the same environment where you have PySpur installed (note that this approach isn’t supported in container mode). Once set up, you can add a Python code node to parse your PDFs with Docling and then feed the output into your Single LLM calls for further structured parsing and classification.
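As a rough illustration of the parsing step, here is a minimal sketch of what the Python code node could call once Docling is installed in the PySpur environment (for example with `pip install docling`). The helper names `pdf_to_markdown` and `build_llm_input`, the `max_chars` cutoff, and the shape of the returned dict are hypothetical, and how a code node receives its inputs and returns its outputs in PySpur is not shown here, so adapt the wiring to your workflow. The only Docling calls used are `DocumentConverter().convert(...)` and `export_to_markdown()`.

```python
from typing import Any, Optional


def pdf_to_markdown(pdf_path: str, converter: Optional[Any] = None) -> str:
    """Convert one PDF to Markdown text using Docling.

    `converter` is a hypothetical hook: pass an existing DocumentConverter
    to reuse it across files, or leave it as None to create one here.
    """
    if converter is None:
        # Imported lazily so the module still loads where Docling is absent.
        from docling.document_converter import DocumentConverter

        converter = DocumentConverter()
    result = converter.convert(pdf_path)
    return result.document.export_to_markdown()


def build_llm_input(
    pdf_path: str,
    converter: Optional[Any] = None,
    max_chars: int = 20000,  # assumed cutoff, tune to your model's context size
) -> dict:
    """Package the extracted text in a dict that a downstream LLM node can read.

    The keys below are illustrative, not a PySpur schema.
    """
    text = pdf_to_markdown(pdf_path, converter)
    return {
        "source": pdf_path,
        "text": text[:max_chars],
        "truncated": len(text) > max_chars,
    }
```

Calling `build_llm_input("/path/to/file.pdf")` inside the code node would then yield a dict whose `text` field can be referenced in the prompt of the Single LLM call. Creating the converter once and passing it in avoids reloading the Docling models for every document.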

I hope this helps, and please let us know if you have any further questions or run into any issues!
