
About points that can be optimized in parse PDF to Markdown #137

Open
pengjunfeng11 opened this issue Feb 3, 2025 · 1 comment
@pengjunfeng11

In the `_prepare_messages` method in modellitellm.py, I found that the project uses the Markdown produced by the previous LLM call as a few-shot example for the next image conversion.

This could cause problems: if one conversion in a 200-page PDF introduces a formatting deviation, that deviation can propagate through the few-shot example and compound across the remaining pages, similar to 0.9^200 ≈ 0.000000000705508.
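As a quick illustration of the compounding intuition (this is not a model of the project's actual error behavior, just the geometric-decay arithmetic the 0.9^200 figure refers to):

```python
# If each page conversion stays format-faithful with probability 0.9,
# independently, the chance that ALL pages in a long PDF stay clean
# decays geometrically with the page count.
per_page_fidelity = 0.9
pages = 200

prob_all_pages_clean = per_page_fidelity ** pages
print(f"{prob_all_pages_clean:.15f}")  # ≈ 0.000000000705508
```

Even a modest per-page deviation rate makes an end-to-end-clean 200-page document vanishingly unlikely if deviations are allowed to propagate.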

Would it be possible to add an intermediate step, perhaps using a multimodal LLM, to perform self-consistency checks on each page's output before it is reused?
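One cheap variant of such a check, sketched below, doesn't even need a second LLM call: validate each page's Markdown structurally, and reset (rather than propagate) the few-shot example when a page fails. All names here (`check_markdown`, `convert_with_check`, `convert_page`) are hypothetical, not the project's actual API:

```python
FENCE = "`" * 3  # a Markdown code-fence delimiter

def check_markdown(md: str) -> bool:
    """Cheap structural sanity checks on one page of Markdown."""
    if not md.strip():
        return False
    # Code fences must come in open/close pairs.
    if md.count(FENCE) % 2 != 0:
        return False
    # Table rows should have a consistent column count (crude check).
    rows = [ln for ln in md.splitlines() if ln.strip().startswith("|")]
    if rows:
        widths = {ln.count("|") for ln in rows}
        if len(widths) > 1:
            return False
    return True

def convert_with_check(pages, convert_page, max_retries=2):
    """Convert pages one by one, but only reuse a page's Markdown as
    the next few-shot example if it passes the structural check, so a
    single bad page does not contaminate every page after it."""
    prior_example = None
    results = []
    for page in pages:
        for _attempt in range(max_retries + 1):
            md = convert_page(page, prior_example)
            if check_markdown(md):
                prior_example = md  # safe to reuse as the few-shot example
                break
            prior_example = None  # reset: do not propagate a bad example
        results.append(md)  # keep the last attempt even if it failed
    return results
```

A multimodal self-consistency check would slot into the same place as `check_markdown`, comparing the generated Markdown back against the page image instead of (or in addition to) these structural heuristics.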

@pengjunfeng11 (Author)

I found that this feature is optional but enabled by default. Nonetheless, this improvement still seems worthwhile; I would like to hear the author's thoughts on it.
