
Automating Infographics/Graphs Analysis and extracting to structured JSON #3

Open
raziurtotha opened this issue Apr 8, 2024 · 1 comment



raziurtotha commented Apr 8, 2024

Firstly, I'd like to express my appreciation for the provided sample codes. They've been quite helpful.

I'm currently working with a large number of PDF files, each containing an assortment of infographics, graphs, charts, and text. These elements appear in no systematic order within the documents. My objective is to use the Azure OpenAI GPT-4 Vision API to comprehend the context and details within these visual elements, then extract and summarize this information into structured JSON data with specific key:value pairs. Some of these pairs, such as document_title, author, publication_date, etc., are predefined in the prompt alongside a few-shot examples. At the moment, my process involves handling each PDF file individually with ChatGPT (GPT-4).
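For context, the per-page request described above might be assembled roughly like this. This is a minimal sketch of building the chat-completions message payload (page rendering and the actual API call are out of scope); the helper name, prompt wording, and schema keys are assumptions, not part of the sample:

```python
import base64

def build_vision_messages(page_png: bytes, schema_keys: list[str]) -> list[dict]:
    """Assemble a GPT-4 Vision chat-completions payload for one PDF page.

    `page_png` would come from rendering the page to an image elsewhere
    (e.g. with pypdfium2 or pdf2image); it is embedded as a data URL.
    """
    data_url = "data:image/png;base64," + base64.b64encode(page_png).decode()
    system = (
        "Extract the infographic and chart content on this page into JSON "
        "with exactly these keys: " + ", ".join(schema_keys)
    )
    return [
        {"role": "system", "content": system},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize this page as structured JSON."},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        },
    ]

messages = build_vision_messages(
    b"\x89PNG", ["document_title", "author", "publication_date"]
)
```

The same predefined keys (and any few-shot examples) go into the system prompt so every page is extracted against the same schema.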

Could anyone offer guidance or insights on how to perform this analysis and data extraction efficiently with the GPT-4 Vision API across a large number of very unstructured PDF files?

An example PDF file is attached below:
Gen Z (Global) report - GWI.pdf

Any suggestions or advice on streamlining this task would be immensely appreciated.

@raziurtotha raziurtotha changed the title Automating Infographics and Graphs Analysis and extracting to structured JSON Automating Infographics/Graphs Analysis and extracting to structured JSON Apr 8, 2024
@jamesmcroft (Member)

Hi @raziurtotha, thanks for taking the time to explore the sample.

To clarify, are you aiming to process multiple documents and consolidate the extracted data? Or do you want to scale out document extraction across many differing documents?

I would recommend exploring Durable Functions for either scenario if these will be long-running processes whose results you return later. The benefit of using Durable Functions is that the orchestration state is persisted, so if something goes wrong in your environment, it can recover and continue execution.

Durable Functions will give you the flexibility to create workflows, chaining activities together to perform the steps shown in the sample here: split documents, convert them, and prompt the GPT-4 Vision model. For batch processing, you can also adopt the fan-out pattern for Functions.
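The chained workflow above has this shape. The sketch below uses plain Python stubs rather than the Durable Functions SDK, just to show the order of activities; all function names and the page count are assumptions:

```python
# Stub activities standing in for the real steps (names are hypothetical).
def split_document(pdf_path: str) -> list[str]:
    # Split the PDF into per-page references; stubbed as 3 pages here.
    return [f"{pdf_path}#page={i}" for i in range(1, 4)]

def convert_page(page_ref: str) -> dict:
    # Render the page to an image payload; stubbed here.
    return {"page": page_ref, "image": b""}

def prompt_vision_model(page_payload: dict) -> dict:
    # Call GPT-4 Vision and parse its JSON reply; stubbed here.
    return {"page": page_payload["page"], "fields": {}}

def run_document_workflow(pdf_path: str) -> list[dict]:
    """Chain the activities in order, as a Durable Functions orchestrator would."""
    return [prompt_vision_model(convert_page(p)) for p in split_document(pdf_path)]

results = run_document_workflow("report.pdf")
```

In an actual orchestrator, each stub would become an activity function invoked with `context.call_activity`, with the orchestration state checkpointed between steps.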

If the request concerns the first clarifying question, the fan out/fan in pattern described in the link above will be ideal: essentially, a batch request fans out to perform extraction on each document, and the results are consolidated afterwards.
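The fan out/fan in shape can be illustrated with the standard library instead of the Durable Functions SDK; the extraction stub and its output fields are assumptions:

```python
from concurrent.futures import ThreadPoolExecutor

def extract_document(doc: str) -> dict:
    # Placeholder for the per-document extraction workflow
    # (split, convert, prompt GPT-4 Vision).
    return {"doc": doc, "fields": {"document_title": doc.upper()}}

def fan_out_fan_in(docs: list[str]) -> list[dict]:
    # Fan out: one extraction task per document, run concurrently.
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(extract_document, docs))
    # Fan in: all per-document results gathered for consolidation.
    return results

consolidated = fan_out_fan_in(["a.pdf", "b.pdf"])
```

With Durable Functions, the fan-out step would instead schedule one activity per document and `task_all` would gather the results, with the added benefit that progress survives restarts.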
