
An application framework that uses AI techniques to extract the values of specific pre-defined keys from a given PDF document. It also generates a document summary from the extracted keys and values.


403errors/AI-DocParser


AI DocParser

AI DocParser is an AI-powered document parsing tool designed to extract, process, and analyze data from various document formats. It leverages state-of-the-art machine learning models to automate the processing of structured and unstructured data.

Kaggle

Example

Input:

input image

Output:

output image

Features

  • Document Parsing: Extract data from PDFs, images, and other document types.
  • AI-Powered Analysis: Use machine learning models to understand and process text.
  • Customizable Workflows: Easily adapt to different use cases by modifying parameters or integrating additional models.
  • Model Retraining: Fine-tune the parsing model with custom datasets for improved accuracy.

Tech Stack

  • spaCy for Named Entity Recognition and fitz (PyMuPDF) for text extraction, with a reported accuracy of 99.22%.
  • RegEx for extracting special field types, such as dates, from legal documents.
  • Reinforcement learning to optimize data extraction, achieving high performance on dynamic PDFs.
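
The RegEx-based date extraction mentioned above could look roughly like the sketch below. It is only an illustration, not the project's actual code: the function name `extract_dates`, the two patterns, and the supported date formats (ISO `YYYY-MM-DD` and long forms such as "5th day of March, 2021") are all assumptions, since the README does not say which formats are handled.

```python
import re
from datetime import date

# Hypothetical lookup table: lower-case English month name -> month number.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["january", "february", "march", "april", "may", "june",
         "july", "august", "september", "october", "november", "december"],
        start=1,
    )
}

# Assumed formats only; the formats the project really targets are not stated.
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
LONG_DATE = re.compile(
    r"\b(\d{1,2})(?:st|nd|rd|th)?\s+(?:day\s+of\s+)?([A-Za-z]+),?\s+(\d{4})\b"
)


def extract_dates(text):
    """Return every valid calendar date found in text, in order of appearance."""
    found = []  # tuples of (position in text, year, month, day)
    for match in ISO_DATE.finditer(text):
        year, month, day = (int(part) for part in match.groups())
        found.append((match.start(), year, month, day))
    for match in LONG_DATE.finditer(text):
        month = MONTHS.get(match.group(2).lower())
        if month is not None:  # ignore words that are not month names
            found.append(
                (match.start(), int(match.group(3)), month, int(match.group(1)))
            )
    dates = []
    for _, year, month, day in sorted(found):
        try:
            dates.append(date(year, month, day))
        except ValueError:
            pass  # skip impossible dates such as 2023-02-30
    return dates
```

In a pipeline like the one described, the input string would be the text extracted from a PDF page. Validating each match with `datetime.date` keeps look-alike strings that are not real dates out of the results.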
