Description
Hi Community,
I'm working with a number of PDF documents from which I need to pull specific data. So far, I've successfully extracted both the date and status from these documents. However, I've been unable to pull out full names.
There are several challenges when it comes to name extraction:
The location of the name isn't consistent.
The name may be in different formats (either first name + last name, or first name + middle name + last name).
There might be varying labels either before or after the name (e.g., "Location:", "112street -", commas, etc.).
Because of these factors, using Microsoft Syntex to extract full names from the PDFs has proved quite difficult.
I've tried the following approaches to solve this:
A regular expression: [A-Za-z]+[ ]+[A-Za-z] (for first name + space + last name).
Before-label and after-label phrases. The after label works only partially.
The Invoice model, which works and can pull the full name from the document (in most cases), but is not able to pull the status.
Unfortunately, none of these has fully worked so far. I'd welcome any suggestions you may have.
I want to extract the full name using Syntex.
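For reference, here is a small Python sketch of the regular-expression attempt, run outside Syntex. It shows that the pattern I tried stops after the first letter of the last name, because the final character class has no "+". It also shows a hypothetical variant (FULL_NAME, my own naming) that accepts two or three capitalized words, so an optional middle name is covered. I have not verified whether Syntex accepts this exact syntax (non-capturing groups, \b), so treat it as an assumption. The sample text is made up.

```python
import re

# The pattern as tried above: the last character class has no "+",
# so it matches only the first letter of the last name.
ORIGINAL = re.compile(r"[A-Za-z]+[ ]+[A-Za-z]")

# Hypothetical variant: two or three capitalized words separated by
# single spaces (first + last, or first + middle + last).
FULL_NAME = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+){1,2}\b")


def find_names(text):
    """Return every run of two or three capitalized words in text."""
    return FULL_NAME.findall(text)


# Made-up sample line with a label before and after the name.
sample = "112street - John Michael Smith, Location: Springfield"

print(ORIGINAL.search("John Smith").group())
print(find_names(sample))
```

This variant still matches any two capitalized words in a row (a place name such as "New York", for example), so it would probably need to be combined with a before or after label to be reliable.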