ReSolveAI - AI Complaint Classification Engine 🧭🤖

Automated Complaint Understanding for Financial Services
ReSolveAI transforms unstructured customer complaints into actionable, categorized insights using NLP, topic modeling, and supervised machine learning.
This project blends practical engineering with research-grade experimentation.

📄 Full Report & Slides included (see Resources section).

🔍 Overview

ReSolveAI is an end-to-end AI engine that:

Cleans and preprocesses raw complaint text
Extracts meaningful patterns using TF-IDF
Discovers hidden topics with Non-Negative Matrix Factorization (NMF)
Uses these topics to train classification models
Produces automated complaint category predictions with explainability
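
The first step above, cleaning raw complaint text, can be sketched with the standard library alone. This is a minimal illustration rather than the notebook's implementation: the `clean_text` helper and the tiny stopword set are assumptions made for the example, and the lemmatization and POS tagging listed under Key Features are left out.

```python
import re

# Hypothetical, deliberately small stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "i", "my", "to", "of", "on", "in", "was", "is", "for", "it"}

def clean_text(text: str) -> str:
    """Lowercase, keep alphabetic tokens only, and drop stopwords and very short tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    kept = [t for t in tokens if t not in STOPWORDS and len(t) > 2]
    return " ".join(kept)

print(clean_text("I disputed a charge on my credit card on 05/14/2021!"))
# -> disputed charge credit card
```

Dates and other digits are dropped because only alphabetic runs are kept; whether that is appropriate depends on the downstream features.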

Why ReSolveAI matters:

⚡ Faster resolution
🎯 Higher routing accuracy
💸 Reduced operational overhead
😊 Improved customer satisfaction

The project analyzes 78,313 complaint records with 22 metadata fields each; the topic clusters found by NMF serve as the labels for the classifiers.

📁 Repository Structure

├─ README.md
├─ src/
│  └─ notebook.ipynb
├─ dataset/
│  └─ complaints-2021-05-14_08_16.json
├─ docs/
│  ├─ results/
│  │  ├─ decision-tree.png
│  │  ├─ gaussian-naive-bayes.png
│  │  ├─ logistic-regression.png
│  │  └─ random-forest-classifier.png
│  ├─ architecture.png
│  ├─ workflow.png
│  ├─ wordcloud.png
│  └─ n-gram.png
├─ report.pdf
└─ presentation.pdf

✨ Key Features

🔤 NLP Pipeline: tokenization, lemmatization, POS tagging, stopword filtering
🔍 Topic Modeling with NMF – reveals semantic clusters
🧮 Multiple ML Classifiers
- Logistic Regression (≈95% accuracy)
- Decision Tree
- Random Forest
- Gaussian Naive Bayes
  (LR was the top performer.)
🎨 Rich EDA: histograms, distributions, n-grams, word clouds
⚙️ Complete Inference Pipeline: ready for deployment
🧪 Interpretability: TF-IDF feature importance, topic keywords
📚 Full research documentation in PDF

🏗 System Architecture

Architecture: Ingestion → Processing → Topic Modeling → ML Classifier → Prediction

🔄 Workflow

Model Workflow: Data → Clean → TF-IDF → NMF → Classifier → Results

📊 Exploratory Data Analysis

Word Cloud of Most Frequent Complaint Tokens

From the dataset EDA:

Strong presence of tokens like payment, credit, account, dispute, reporting
Complaints map naturally into financial service categories
Topic-token alignment validated using NMF factors
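
The n-gram part of this EDA can be approximated with a frequency count over adjacent tokens. The following is a small sketch using only the standard library; the `top_ngrams` helper and the sample sentences are hypothetical and stand in for the cleaned complaint text.

```python
from collections import Counter

def top_ngrams(texts, n=2, k=3):
    """Count word n-grams across all texts and return the k most common with their counts."""
    counts = Counter()
    for text in texts:
        tokens = text.split()
        # zip over n shifted copies of the token list yields consecutive n-grams
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return [(" ".join(gram), c) for gram, c in counts.most_common(k)]

sample = [
    "credit report dispute",
    "credit report error",
    "late payment fee",
]
print(top_ngrams(sample, n=2, k=1))
# -> [('credit report', 2)]
```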

🚀 Getting Started

1️⃣ Clone the repository

git clone https://github.com/kalpthakkar/ReSolveAI-AI-Complaint-Classification-Engine.git
cd ReSolveAI-AI-Complaint-Classification-Engine

2️⃣ Create Python Environment

python -m venv .venv_resolveai

# macOS/Linux
source .venv_resolveai/bin/activate

# Windows (Command Prompt)
.venv_resolveai\Scripts\activate

# Windows (PowerShell)
.venv_resolveai\Scripts\Activate.ps1

3️⃣ Run Notebook

jupyter lab src/notebook.ipynb

🧠 Example: End-to-End Model Pipeline

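from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Assumes `df` is a pandas DataFrame that already holds the preprocessed
# complaint text in a `clean_complaint` column (see src/notebook.ipynb).

# Vectorize the text as TF-IDF over unigrams and bigrams
tfidf = TfidfVectorizer(max_df=0.95, min_df=2, ngram_range=(1, 2))
X = tfidf.fit_transform(df['clean_complaint'])

# Factorize into 5 topics; W holds the per-document topic weights
nmf = NMF(n_components=5, random_state=42)
W = nmf.fit_transform(X)

# Use each document's dominant topic as its class label
y = W.argmax(axis=1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train the classifier on TF-IDF features and report held-out metrics
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

print(classification_report(y_test, clf.predict(X_test)))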
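
Once a classifier of this kind is trained, scoring a new complaint means reusing the already fitted vectorizer with `transform` (not `fit_transform`) so the feature columns match those seen in training. The sketch below shows that step on a toy corpus; the training sentences and the `predict_topic` helper are assumptions for illustration and are not taken from the notebook.

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training corpus standing in for cleaned complaints (illustrative only).
train_texts = [
    "credit report dispute error",
    "credit report inquiry dispute",
    "credit report error bureau",
    "mortgage loan payment late",
    "mortgage loan escrow payment",
    "mortgage loan payment escrow late",
]

tfidf = TfidfVectorizer()
X = tfidf.fit_transform(train_texts)

# Dominant NMF topic per document becomes the training label
nmf = NMF(n_components=2, random_state=42, max_iter=500)
y = nmf.fit_transform(X).argmax(axis=1)

clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)

def predict_topic(text: str) -> int:
    """Vectorize one complaint with the fitted TF-IDF and return the predicted topic id."""
    return int(clf.predict(tfidf.transform([text]))[0])

new_complaint = "dispute about an error on my credit report"
topic_id = predict_topic(new_complaint)
confidence = float(clf.predict_proba(tfidf.transform([new_complaint])).max())
print(f"topic={topic_id} confidence={confidence:.2f}")
```

The returned value is a topic id; mapping ids to category names is a separate, manual step based on each topic's keywords.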

📈 Results Summary

Overall Model Performance

| Model | Accuracy | Notes |
| --- | --- | --- |
| Logistic Regression | ⭐ ≈95% | Best performer |
| Decision Tree | ≈77% | Interpretable |
| Random Forest | ≈72–74% | Stable but overfit-prone |
| Gaussian NB | ≈36% | Weak baseline |

✔ LR chosen as final classifier
✔ Well-separated NMF topics improved class clarity
✔ Topic-word alignment validated
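
A comparison like the one summarized above can be set up by fitting each classifier on the same split and collecting accuracy. The sketch below uses synthetic data from `make_classification` purely as a stand-in, so its scores will not match the reported numbers; the hyperparameters are also assumptions. One practical detail when adapting it to TF-IDF features: `GaussianNB` does not accept sparse matrices, so the features need `.toarray()` first.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic dense features standing in for the real TF-IDF matrix (illustrative only).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, n_classes=3, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Gaussian NB": GaussianNB(),
}

# Fit every model on the same training split and score on the same test split
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<20} {acc:.3f}")
```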

🧩 Future Enhancements

Fine-tuned transformer models (GPT, BERT, DistilBERT)
Real-time ingestion with Kafka
FastAPI inference service
Active learning loop from agent feedback
Multi-modal complaint classification (voice → transcripts → NLP)

📚 Resources

📄 Full Research Report (PDF): report.pdf

📑 Project Presentation (Slides): presentation.pdf

💬 Frequently Asked Questions (FAQ)

Q: What dataset did you use?
A dataset of 78,313 customer complaints, with 22 columns, provided in JSON format.

Q: Why NMF for topic modeling?
NMF provides sparse, interpretable topics that align well with complaint categories.

Q: Why did Logistic Regression perform best?
High-dimensional TF-IDF vectors naturally favor linear decision boundaries.

Q: Can this model run in production?
Yes, TF-IDF + LR is fast, light, and easily containerizable.

Q: How is bias or PII handled?
Remove PII (names, IDs, emails) and evaluate demographic fairness before launch.
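
As a rough illustration of the PII step, emails and long numeric identifiers can be masked with regular expressions before any modeling. This is a hedged sketch, not the project's method: the patterns, the `[EMAIL]` and `[ID]` placeholders, and the `mask_pii` helper are assumptions, and personal names are not handled here (they typically need a named-entity recognizer).

```python
import re

# Simple illustrative patterns; real PII detection needs broader coverage.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
LONG_NUMBER_RE = re.compile(r"\b\d{6,}\b")  # runs of 6+ digits, e.g. account numbers

def mask_pii(text: str) -> str:
    """Replace email addresses and long digit runs with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return LONG_NUMBER_RE.sub("[ID]", text)

print(mask_pii("Contact jane.doe@example.com about account 123456789."))
# -> Contact [EMAIL] about account [ID].
```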

🧾 Citation / Acknowledgements

This machine learning project was developed by Kalp Thakkar. References and full experiments are in report.pdf and presentation.pdf.

Bird, Klein & Loper - Natural Language Processing with Python (NLTK).
Pedregosa et al. - Scikit-learn.
Lee & Seung - NMF (Nature, 1999).

See the full bibliography in report.pdf.

🤝 Contribution

Fork repository

https://github.com/kalpthakkar/ReSolveAI-AI-Complaint-Classification-Engine/fork

Create a branch:
```
git checkout -b feature-xyz
```

Commit your changes

# Stage changes
git add .

# Commit
git commit -m "Your message"

# Push the new branch
git push -u origin feature-xyz

Submit a Pull Request
```
gh pr create --fill
```

❤️ Final Note

ReSolveAI shows how classical NLP, topic modeling, and supervised ML can be combined to automate complaint classification in customer-handling pipelines. The repository pairs a research write-up with a working implementation, so the results can be inspected and reproduced from the notebook.

📞 Contact

For any inquiries or support, please contact:

Kalp Thakkar - kalpthakkar2001@gmail.com
GitHub: kalpthakkar
LinkedIn: kalpthakkar
