I have some PDF documents that I want to parse locally before feeding them into Single LLM calls. For example, I want to use Docling to extract the raw text, then use LLMs to do more structured parsing and classification.
How do I do that?
Thanks for your interest in using Docling with our tool. We did experiment with integrating Docling into PySpur for a time, but we ultimately had to roll it back. The main issue was that including Docling increased the size of the pyspur-backend Docker image significantly (over 10GB) because of the OCR model weights, and that wasn’t ideal for many users who don’t need that functionality.
That said, you can still use Docling in your workflow. I recommend installing Docling in the same environment where you have PySpur installed (note that this approach isn’t supported in container mode). Once set up, you can add a Python code node to parse your PDFs with Docling and then feed the output into your Single LLM calls for further structured parsing and classification.
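As a rough illustration, the code node could look something like the sketch below. This assumes Docling is installed in the same environment as PySpur (e.g. via `pip install docling`); the `run` function name, the `pdf_path` input key, and the `raw_text` output key are placeholders for illustration, so adapt them to however your code node passes data in and out.

```python
# Minimal sketch of a Python code node that parses a PDF with Docling.
# Assumptions: Docling is installed alongside PySpur; the input/output
# dictionary keys ("pdf_path", "raw_text") are illustrative, not fixed names.
from docling.document_converter import DocumentConverter

def run(node_input: dict) -> dict:
    # Convert the PDF locally with Docling (no external API calls).
    converter = DocumentConverter()
    result = converter.convert(node_input["pdf_path"])  # e.g. "/data/report.pdf"

    # Export the parsed document as Markdown so the downstream Single LLM
    # call receives clean raw text for structured parsing and classification.
    raw_text = result.document.export_to_markdown()
    return {"raw_text": raw_text}
```

The node's output (here `raw_text`) can then be wired directly into the prompt of your Single LLM call node.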
I hope this helps, and please let us know if you have any further questions or run into any issues!