Email has become an essential form of communication on the web today. Companies, schools, and businesses of all kinds use it to disseminate information to a wide audience. Identifying the main parts of an email provides a useful source of information, not only for building a user’s profile but also for uncovering the associated writing trends and patterns. In this project, we propose an automatic approach for detecting the general structure of emails by extracting the greeting, body, and signature zones. Specifically, a recurrent neural network enhanced with a set of customized rule-based constraints is employed to detect the different email parts. The proposed method is applied to well-known email corpora (Enron, Apache mailing list, etc.), where it outperforms baseline results from traditional algorithms and hand-crafted rules. The results show that analyzing word embedding sequences and using specific word-position rules help to accurately predict the email zone of each text line.
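To make the idea of word-position rules concrete, here is a minimal sketch of a purely rule-based line zoner. It is not the project's recurrent network or its actual rule set, neither of which is described in detail here; the cue patterns, the `zone_lines` function, and the "second half of the message" threshold are all illustrative assumptions.

```python
import re

# Hypothetical cue patterns, chosen for illustration only.
GREETING = re.compile(
    r"^(hi|hello|hey|dear|good (morning|afternoon|evening))\b",
    re.IGNORECASE,
)
# A sign-off must fill the whole line (optionally followed by punctuation),
# so "Thanks for the update" is not mistaken for a closing.
SIGNOFF = re.compile(
    r"^(regards|best regards|best|thanks|thank you|cheers|sincerely)\b[\s,.!]*$",
    re.IGNORECASE,
)


def zone_lines(lines):
    """Label each line as 'greeting', 'body', or 'signature'."""
    labels = ["body"] * len(lines)
    non_empty = [i for i, line in enumerate(lines) if line.strip()]
    if not non_empty:
        return labels

    # Position rule (assumed): a greeting can only be the first non-empty line.
    first = non_empty[0]
    if GREETING.match(lines[first].strip()):
        labels[first] = "greeting"

    # Position rule (assumed): a sign-off only counts in the second half of
    # the message; everything from that line onward is the signature.
    for i in non_empty:
        in_second_half = i >= len(lines) // 2
        if i > first and in_second_half and SIGNOFF.match(lines[i].strip()):
            for j in range(i, len(lines)):
                labels[j] = "signature"
            break
    return labels


email = [
    "Hi John,",
    "",
    "The report is attached.",
    "Let me know if you have questions.",
    "",
    "Best regards,",
    "Alice",
]
for label, line in zip(zone_lines(email), email):
    print(f"{label:10s} | {line}")
```

In a hybrid setup like the one described above, rules of this kind would typically constrain or correct a learned model's per-line predictions rather than replace them.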
kartiikthakur/Email-Zoning--ML-Project