
nikitagrover19/ProInsight-ML


🧠 ProInsight - Email Insight Classifier

This project applies Machine Learning and Natural Language Processing (NLP) techniques to analyze and classify corporate emails from the Enron Email Dataset.
It extracts semantic patterns, relationships, and insights from large-scale email communication data to help visualize and interpret professional correspondence.


🌐 Live Links


📂 Dataset

Source: Enron Email Dataset (Kaggle)

Cleaning Process:
Raw email data was parsed using Python’s email module to extract:

  • Message-ID
  • Date
  • From
  • To
  • Subject
  • Body

The cleaned dataset was saved as emails_clean.csv for downstream NLP and ML analysis.
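The parsing step could look roughly like the sketch below. It uses only the standard library `email` and `csv` modules. Several parts are assumptions, not the project's actual script: each raw message is taken to be one RFC 822 string, multipart messages are taken to keep only their text/plain parts, and the helper names (`parse_raw_email`, `write_clean_csv`) and the sample message are made up for illustration.

```python
import csv
from email import message_from_string
from email.message import Message

# Column order follows the fields listed in the cleaning step above.
FIELDS = ["Message-ID", "Date", "From", "To", "Subject", "Body"]


def extract_body(msg: Message) -> str:
    """Return the plain-text body; multipart messages keep only text/plain parts."""
    if msg.is_multipart():
        parts = [p.get_payload() for p in msg.walk()
                 if p.get_content_type() == "text/plain"]
        return "\n".join(parts).strip()
    return msg.get_payload().strip()


def parse_raw_email(raw: str) -> dict:
    """Parse one raw message (hypothetical helper) into a row keyed by FIELDS."""
    msg = message_from_string(raw)
    row = {field: msg.get(field, "") for field in FIELDS if field != "Body"}
    row["Body"] = extract_body(msg)
    return row


def write_clean_csv(raw_messages, path="emails_clean.csv") -> int:
    """Write parsed rows to a CSV file and return how many rows were written."""
    count = 0
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        for raw in raw_messages:
            writer.writerow(parse_raw_email(raw))
            count += 1
    return count


# Made-up sample message for illustration (not taken from the Enron data).
SAMPLE = (
    "Message-ID: <1@example.com>\n"
    "Date: Mon, 14 May 2001 16:39:00 -0700\n"
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Forecast\n"
    "\n"
    "Here is our forecast.\n"
)
print(parse_raw_email(SAMPLE)["Subject"])  # prints: Forecast
```

Calling `write_clean_csv(list_of_raw_strings)` would then produce a file with the same six columns as emails_clean.csv.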


⚙️ Preprocessing & Feature Engineering

  • Data Cleaning: Removal of stopwords, punctuation, and non-ASCII characters.
  • Tokenization & Lemmatization: Performed using spaCy.
  • Feature Extraction: TF-IDF vectorization and word frequency analysis.
  • Network Analysis: Constructed sender–receiver communication graphs using NetworkX.
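The cleaning and TF-IDF steps can be illustrated with the dependency-free sketch below. It is a stand-in, not the project's code. It tokenizes with a regex and does no lemmatization, where the project uses spaCy. The stopword list is a tiny hypothetical one. The TF-IDF here is the unsmoothed textbook form, tf = count / doc length and idf = ln(N / df). scikit-learn's `TfidfVectorizer` uses a smoothed idf and L2 normalization by default, so its numbers will differ.

```python
import math
import re
from collections import Counter

# Tiny hypothetical stopword list; spaCy ships a much larger one.
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is",
             "for", "on", "we", "our", "please"}


def clean_text(text: str) -> list[str]:
    """Drop non-ASCII characters, lowercase, strip punctuation, remove stopwords."""
    ascii_text = text.encode("ascii", errors="ignore").decode("ascii").lower()
    tokens = re.findall(r"[a-z0-9]+", ascii_text)  # punctuation never matches
    return [t for t in tokens if t not in STOPWORDS]


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Unsmoothed TF-IDF: (count / doc length) * ln(N / document frequency)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # guard against empty documents
        vectors.append({term: (c / length) * math.log(n_docs / df[term])
                        for term, c in counts.items()})
    return vectors


docs = [clean_text("Please review the gas forecast."),
        clean_text("Gas trade closed.")]
print(docs)  # [['review', 'gas', 'forecast'], ['gas', 'trade', 'closed']]
print(tfidf(docs)[0]["gas"])  # 0.0, because "gas" appears in every document
```

Stripping non-ASCII characters is lossy for accented words. For example, "Café" becomes "caf", so normalizing with `unicodedata` first may be preferable.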

🧩 Machine Learning Pipeline

  1. Data Parsing & Cleaning — Extracts and structures raw email data.
  2. Exploratory Data Analysis (EDA) — Analyzes communication frequency, sentiment, and relationships.
  3. Feature Engineering — Uses TF-IDF and embeddings for semantic representation.
  4. Classification / Clustering — Identifies thematic or behavioral patterns in email content.
  5. Visualization — Builds network graphs using NetworkX and Matplotlib.
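As a sketch of how the sender–receiver graph in the visualization step might be assembled, the code below counts weighted directed edges from the From and To columns. It assumes that To can hold several comma-separated addresses and that addresses are compared case-insensitively. The function names are hypothetical, and this is not necessarily how the project builds its graph.

```python
from collections import Counter


def build_comm_graph(rows) -> Counter:
    """Count sender -> recipient edges; 'To' may hold comma-separated addresses."""
    edges = Counter()
    for row in rows:
        sender = row["From"].strip().lower()
        for recipient in row["To"].split(","):
            recipient = recipient.strip().lower()
            if sender and recipient:
                edges[(sender, recipient)] += 1
    return edges


def top_senders(edges: Counter, k: int = 3):
    """Rank senders by weighted out-degree (total messages sent)."""
    out_degree = Counter()
    for (sender, _recipient), weight in edges.items():
        out_degree[sender] += weight
    return out_degree.most_common(k)


rows = [
    {"From": "alice@example.com", "To": "bob@example.com, carol@example.com"},
    {"From": "alice@example.com", "To": "bob@example.com"},
    {"From": "bob@example.com", "To": "alice@example.com"},
]
edges = build_comm_graph(rows)
print(top_senders(edges, 1))  # [('alice@example.com', 3)]
```

To draw the graph, the counts could be loaded into NetworkX with `G = nx.DiGraph()` and `G.add_weighted_edges_from((s, r, w) for (s, r), w in edges.items())`. The result can then be rendered with `nx.draw(G, with_labels=True)` and Matplotlib.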

🤖 Gemini API Integration

The project integrates Google’s Gemini API for:

  • Text summarization
  • Semantic similarity comparison
  • Context-aware keyword extraction
  • Insight generation on communication trends
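The README does not show how requests are sent to Gemini. The sketch below is one possible shape for the summarization and keyword-extraction call. The prompt template, the truncation limit, and the helper names are all hypothetical. The model call is passed in as a plain `generate` function, so the prompt logic can be tested without network access or an API key.

```python
from typing import Callable

# Hypothetical prompt template; the project's real prompts are not shown.
SUMMARY_PROMPT = (
    "Summarize the following corporate email in {max_sentences} sentences "
    "and list up to {max_keywords} keywords.\n\n"
    "Subject: {subject}\n\n{body}"
)


def build_summary_prompt(subject: str, body: str, max_sentences: int = 2,
                         max_keywords: int = 5, max_chars: int = 4000) -> str:
    """Fill the template, truncating long bodies to keep the request small."""
    return SUMMARY_PROMPT.format(
        max_sentences=max_sentences,
        max_keywords=max_keywords,
        subject=subject.strip(),
        body=body.strip()[:max_chars],
    )


def summarize_email(subject: str, body: str,
                    generate: Callable[[str], str]) -> str:
    """`generate` sends a prompt to a model and returns the reply text."""
    return generate(build_summary_prompt(subject, body)).strip()


# Offline usage with a stub in place of a real model call.
print(summarize_email("Forecast", "Here is our forecast.", lambda p: " stub reply "))
```

With the `google-generativeai` package, `generate` could be `lambda p: model.generate_content(p).text`, where `model = genai.GenerativeModel("gemini-1.5-flash")` after `genai.configure(api_key=...)`. The model name there is an assumption, so check which models the current SDK exposes.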

🧰 Tech Stack

  • Languages & Libraries: Python, Pandas, NumPy
  • NLP Tools: spaCy, TextBlob
  • ML Framework: scikit-learn
  • Visualization: Matplotlib, NetworkX
  • API: Gemini API
  • Frontend: React (Vite + Tailwind + shadcn/ui)
  • Backend: FastAPI (deployed on Render)

🚀 Running the Project

1️⃣ Clone the repository

git clone https://github.com/nikitagrover19/ProInsight-ML.git
cd ProInsight-ML
cd scripts
