kalpthakkar/ReSolveAI-AI-Complaint-Classification-Engine

ReSolveAI - AI Complaint Classification Engine 🧭🤖


Automated Complaint Understanding for Financial Services
ReSolveAI transforms unstructured customer complaints into actionable, categorized insights using NLP, topic modeling, and supervised machine learning.
This project blends practical engineering with research-grade experimentation.

📄 Full report and slides are included (see the Resources section).

🔍 Overview

ReSolveAI is an end-to-end AI engine that:

  • Cleans and preprocesses raw complaint text
  • Extracts meaningful patterns using TF-IDF
  • Discovers hidden topics with Non-Negative Matrix Factorization (NMF)
  • Uses these topics to train classification models
  • Produces automated complaint category predictions with explainability
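
The cleaning step listed above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the function name `clean_complaint`, the tiny stopword set, and the assumption that masked fields appear as runs of `X` are all hypothetical. Lemmatization and POS tagging are left out to keep the sketch dependency-free.

```python
import re

# Tiny illustrative stopword set (assumption); a real pipeline would use a full list.
STOPWORDS = {"a", "an", "the", "i", "my", "to", "of", "and",
             "was", "is", "on", "for", "in", "it"}

def clean_complaint(text: str) -> str:
    """Lowercase, drop masked tokens and non-letters, remove stopwords."""
    text = text.lower()
    text = re.sub(r"x{2,}", " ", text)     # masked fields such as "XXXX" (assumed convention)
    text = re.sub(r"[^a-z\s]", " ", text)  # keep letters and whitespace only
    tokens = [t for t in text.split() if t not in STOPWORDS and len(t) > 2]
    return " ".join(tokens)

print(clean_complaint("I was charged $35.00 on XXXX for my credit card!"))
# prints: charged credit card
```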

Why ReSolveAI matters:

  • ⚡ Faster resolution
  • 🎯 Higher routing accuracy
  • 💸 Reduced operational overhead
  • 😊 Improved customer satisfaction

The project analyzes 78,313 complaint records, each with 22 metadata fields, and uses the dominant NMF topic of each complaint as its class label.


📁 Repository Structure

├─ README.md
├─ src/
│ ├─ notebook.ipynb
├─ dataset/
│ ├─ complaints-2021-05-14_08_16.json
├─ docs/
│ ├─ results/
│ │ ├─ decision-tree.png
│ │ ├─ gaussian-naive-bayes.png
│ │ ├─ logistic-regression.png
│ │ ├─ random-forest-classifier.png
│ ├─ architecture.png
│ ├─ workflow.png
│ ├─ wordcloud.png
│ └─ n-gram.png
├─ report.pdf
└─ presentation.pdf

✨ Key Features

  • 🔤 NLP Pipeline: tokenization, lemmatization, POS tagging, stopword filtering
  • 🔍 Topic Modeling with NMF: reveals semantic clusters
  • 🧮 Multiple ML Classifiers
    • Logistic Regression (≈95% accuracy)
    • Decision Tree
    • Random Forest
    • Gaussian Naive Bayes
      (Logistic Regression was the top performer.)
  • 🎨 Rich EDA: histograms, distributions, n-grams, word clouds
  • ⚙️ Complete Inference Pipeline: ready for deployment
  • 🧪 Interpretability: TF-IDF feature importance, topic keywords
  • 📚 Full research documentation in PDF

๐Ÿ— System Architecture

Architecture Diagram

Architecture: Ingestion โ†’ Processing โ†’ Topic Modeling โ†’ ML Classifier โ†’ Prediction


🔄 Workflow

Workflow diagram: Data → Clean → TF-IDF → NMF → Classifier → Results


📊 Exploratory Data Analysis

Word cloud of the most frequent complaint tokens

From the dataset EDA:

  • Strong presence of tokens like payment, credit, account, dispute, reporting
  • Complaints map naturally into financial service categories
  • Topic-token alignment validated using NMF factors
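
The n-gram part of the EDA can be approximated with a simple frequency count. This is a minimal sketch: the helper `top_ngrams` and the sample texts are hypothetical, and the notebook may compute n-grams differently.

```python
from collections import Counter

def top_ngrams(texts, n=2, k=3):
    """Count word n-grams across cleaned texts and return the k most frequent."""
    counts = Counter()
    for text in texts:
        tokens = text.split()
        # zip over n shifted copies of the token list yields consecutive n-grams.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return [(" ".join(gram), c) for gram, c in counts.most_common(k)]

sample = ["credit card payment", "credit card dispute", "late payment fee"]
print(top_ngrams(sample, n=2, k=1))
# prints: [('credit card', 2)]
```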

🚀 Getting Started

1️⃣ Clone the repository

git clone https://github.com/kalpthakkar/ReSolveAI-AI-Complaint-Classification-Engine.git
cd ReSolveAI-AI-Complaint-Classification-Engine

2️⃣ Create a Python environment

python -m venv .venv_resolveai

# macOS/Linux
source .venv_resolveai/bin/activate

# Windows (Command Prompt)
.venv_resolveai\Scripts\activate

# Windows (PowerShell)
.venv_resolveai\Scripts\Activate.ps1

3️⃣ Run the notebook

jupyter lab src/notebook.ipynb

🧠 Example: End-to-End Model Pipeline

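from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Assumes `df` is a pandas DataFrame that already holds the preprocessed
# complaint text in a `clean_complaint` column.

# Vectorize unigrams and bigrams, ignoring very common and very rare terms.
tfidf = TfidfVectorizer(max_df=0.95, min_df=2, ngram_range=(1, 2))
X = tfidf.fit_transform(df['clean_complaint'])

# Factorize the TF-IDF matrix into 5 topics; W holds per-document topic weights.
nmf = NMF(n_components=5, random_state=42)
W = nmf.fit_transform(X)

# Label each complaint with its dominant topic.
y = W.argmax(axis=1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train the classifier on TF-IDF features and evaluate on the held-out split.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

print(classification_report(y_test, clf.predict(X_test)))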
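
For inference on new text, the vectorizer and classifier can be bundled so that raw strings go in and a topic index comes out. The sketch below assumes a scikit-learn pipeline built with `make_pipeline`; the toy training texts, their labels, and the helper `classify` are hypothetical stand-ins for the real data and the NMF-derived labels.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data (hypothetical); in the project, labels come from NMF topics.
train_texts = [
    "credit card charge dispute",
    "unauthorized credit card fee",
    "mortgage loan payment late",
    "mortgage escrow loan problem",
]
train_labels = [0, 0, 1, 1]

# Bundling both steps guarantees new text is vectorized exactly as in training.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train_texts, train_labels)

def classify(text: str) -> int:
    """Predict the topic index for a single cleaned complaint."""
    return int(model.predict([text])[0])

print(classify("credit card fee"))
```

Keeping the two steps in one object also means a single artifact to serialize if the model is later served from an API.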

📈 Results Summary

Overall Model Performance

| Model | Accuracy | Notes |
| --- | --- | --- |
| Logistic Regression ⭐ | ≈95% | Best performer |
| Decision Tree | ~77% | Interpretable |
| Random Forest | ~72–74% | Stable but overfit-prone |
| Gaussian NB | ~36% | Weak baseline |

  • ✔ LR chosen as final classifier
  • ✔ Well-separated NMF topics improved class clarity
  • ✔ Topic-word alignment validated
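
A comparison like the one summarized above could be set up along these lines. This is only a sketch on hypothetical toy data: it reports training accuracy, not the held-out figures from the report, and the hyperparameters are assumptions. One practical detail it shows is that `GaussianNB` needs a dense array, while the other three models accept the sparse TF-IDF matrix directly.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Toy corpus and labels (hypothetical), standing in for complaints and NMF topics.
texts = [
    "credit card charge dispute", "unauthorized credit card fee",
    "credit card interest rate", "card payment declined",
    "mortgage loan payment late", "mortgage escrow loan problem",
    "loan modification denied", "mortgage servicer error",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

X = TfidfVectorizer().fit_transform(texts)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=42),
    "Gaussian NB": GaussianNB(),
}

scores = {}
for name, model in models.items():
    # GaussianNB does not accept sparse input, so densify for that model only.
    features = X.toarray() if name == "Gaussian NB" else X
    model.fit(features, labels)
    scores[name] = model.score(features, labels)  # training accuracy on toy data

for name, acc in scores.items():
    print(f"{name}: {acc:.2f}")
```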

🧩 Future Enhancements

  • Fine-tuned transformer models (GPT, BERT, DistilBERT)
  • Real-time ingestion with Kafka
  • FastAPI inference service
  • Active learning loop from agent feedback
  • Multi-modal complaint classification (voice → transcripts → NLP)

📚 Resources

📄 Full Research Report (PDF)

📑 Project Presentation (Slides)

💬 Frequently Asked Questions (FAQ)

Q: What dataset did you use?
A dataset of 78,313 customer complaints, with 22 columns, provided in JSON format.

Q: Why NMF for topic modeling?
NMF provides sparse, interpretable topics that align well with complaint categories.

Q: Why did Logistic Regression perform best?
High-dimensional TF-IDF vectors naturally favor linear decision boundaries.

Q: Can this model run in production?
Yes, TF-IDF + LR is fast, light, and easily containerizable.

Q: How is bias or PII handled?
Remove PII (names, IDs, emails) and evaluate demographic fairness before launch.
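
A first pass at PII removal could be rule-based, as in the sketch below. The patterns, placeholder tags, and the helper `redact_pii` are assumptions for illustration; they cover only emails and long digit runs, and personal names would need a separate approach such as named-entity recognition.

```python
import re

# Illustrative patterns (assumptions): simple emails and digit runs of 6 or more.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
LONG_NUMBER = re.compile(r"\b\d{6,}\b")

def redact_pii(text: str) -> str:
    """Replace emails and long numeric identifiers with placeholder tags."""
    text = EMAIL.sub("[EMAIL]", text)
    return LONG_NUMBER.sub("[ID]", text)

print(redact_pii("Contact jane.doe@example.com about account 123456789."))
# prints: Contact [EMAIL] about account [ID].
```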

🧾 Citation / Acknowledgements

This machine learning work was developed by Kalp Thakkar. References and full experiments are in report.pdf and presentation.pdf.

Bird, Klein & Loper - Natural Language Processing with Python (NLTK).
Pedregosa et al. - Scikit-learn.
Lee & Seung - NMF (Nature, 1999).

See the full bibliography in report.pdf.

🤝 Contribution

  • Fork repository

    https://github.com/kalpthakkar/ReSolveAI-AI-Complaint-Classification-Engine/fork
  • Create a branch:

    git checkout -b feature-xyz
  • Commit your changes

    # Stage changes
    git add .
    
    # Commit
    git commit -m "Your message"
    
    # Push the new branch
    git push -u origin feature-xyz
  • Submit a Pull Request

    gh pr create --fill

โค๏ธ Final Note

ReSolveAI showcases how classical NLP + topic modeling + supervised ML can deliver real, measurable impact in customer complaint handling pipelines. This repository demonstrates research quality, engineering quality, and practical industry relevance - all in one project.


📞 Contact

For any inquiries or support, please contact:
