Automated Complaint Understanding for Financial Services
ReSolveAI transforms unstructured customer complaints into actionable, categorized insights using NLP, topic modeling, and supervised machine learning.
This project blends practical engineering with research-grade experimentation. The full report and slides are included (see the Resources section).
ReSolveAI is an end-to-end AI engine that:
- Cleans and preprocesses raw complaint text
- Extracts meaningful patterns using TF-IDF
- Discovers hidden topics with Non-Negative Matrix Factorization (NMF)
- Uses these topics to train classification models
- Produces automated complaint category predictions with explainability
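The TF-IDF, NMF, and labeling steps above can be summarized as follows. This is a sketch that assumes scikit-learn's defaults (smoothed IDF, raw term counts, L2-normalized rows, Frobenius-norm NMF objective without regularization); the project's exact settings may differ. Here $n$ is the number of complaints, $m$ the vocabulary size, and $k$ the number of topics.

$$
\mathrm{idf}(t) = \ln\frac{1 + n}{1 + \mathrm{df}(t)} + 1, \qquad
X_{d,t} = \frac{\mathrm{tf}(t,d)\,\mathrm{idf}(t)}{\sqrt{\sum_{t'} \left(\mathrm{tf}(t',d)\,\mathrm{idf}(t')\right)^2}}
$$

$$
\min_{W \ge 0,\; H \ge 0} \lVert X - W H \rVert_F^2, \qquad
X \in \mathbb{R}^{n \times m},\; W \in \mathbb{R}^{n \times k},\; H \in \mathbb{R}^{k \times m}
$$

$$
y_d = \arg\max_{j \in \{1, \dots, k\}} W_{d,j}
$$

Each row of $H$ is a topic (a non-negative weighting over terms), and each complaint's pseudo-label $y_d$ is its dominant topic, which the classifiers are then trained to predict.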
Why ReSolveAI matters:
- Faster resolution
- Higher routing accuracy
- Reduced operational overhead
- Improved customer satisfaction
This project analyzed 78,313 complaint records with 22 metadata fields, using NMF topic clusters aligned with complaint categories as training labels.
- NLP Pipeline: tokenization, lemmatization, POS tagging, stopword filtering
- Topic Modeling with NMF: reveals semantic clusters
- Multiple ML Classifiers (Logistic Regression was the top performer):
  - Logistic Regression (≈95% accuracy)
  - Decision Tree
  - Random Forest
  - Gaussian Naive Bayes
- Rich EDA: histograms, distributions, n-grams, word clouds
- Complete Inference Pipeline: ready for deployment
- Interpretability: TF-IDF feature importance, topic keywords
- Full research documentation in PDF
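A minimal, standard-library sketch of the cleaning step listed above. It is a stand-in, not the project's NLTK pipeline: the function name `clean_complaint` and its small stopword list are hypothetical, and lemmatization and POS tagging are omitted entirely (the real pipeline delegates those to NLTK).

```python
import re

# Hypothetical, deliberately small stopword list; a real pipeline would use a full list (e.g. NLTK's)
STOPWORDS = {"a", "an", "the", "and", "or", "i", "my", "me", "to", "of", "on",
             "in", "for", "was", "is", "it", "this", "that", "with", "they", "have"}

def clean_complaint(text: str) -> str:
    """Lowercase, keep alphabetic tokens only, and drop stopwords and tokens of 1-2 letters.

    Lemmatization and POS tagging are intentionally not performed in this sketch.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(t for t in tokens if t not in STOPWORDS and len(t) > 2)

print(clean_complaint("I disputed the late PAYMENT on my credit card 3 times!"))
# → disputed late payment credit card times
```

The output is a space-separated token string, the kind of text a `clean_complaint` column would hold before TF-IDF vectorization.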
From the dataset EDA:
- Strong presence of tokens like payment, credit, account, dispute, reporting
- Complaints map naturally into financial service categories
- Topic-token alignment validated using NMF factors
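Token and n-gram frequencies like those reported above can be computed with a few lines of standard-library Python. This is a sketch assuming complaints are already cleaned into space-separated tokens; `top_ngrams` is a hypothetical helper and the sample documents are made up for illustration.

```python
from collections import Counter
from typing import Iterable

def top_ngrams(docs: Iterable[str], n: int = 1, k: int = 5) -> list[tuple[str, int]]:
    """Return the k most frequent n-grams across whitespace-tokenized documents."""
    counts = Counter()
    for doc in docs:
        tokens = doc.split()
        # Slide a window of width n over the token list
        counts.update(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts.most_common(k)

docs = [
    "credit report dispute account",
    "late payment credit report",
    "account closed payment dispute",
]
print(top_ngrams(docs, n=1, k=3))  # → [('credit', 2), ('report', 2), ('dispute', 2)]
print(top_ngrams(docs, n=2, k=1))  # → [('credit report', 2)]
```

`Counter.most_common` breaks ties by first occurrence, so equally frequent terms appear in the order they were first seen.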
git clone https://github.com/kalpthakkar/ReSolveAI-AI-Complaint-Classification-Engine.git
cd ReSolveAI-AI-Complaint-Classification-Engine
python -m venv .venv_resolveai
# macOS/Linux
source .venv_resolveai/bin/activate
# Windows (Command Prompt)
.venv_resolveai\Scripts\activate
# PowerShell
.venv_resolveai\Scripts\Activate.ps1
# Launch the notebook
jupyter lab src/notebook.ipynb

Core training pipeline (Python):
from sklearn.feature_extraction.text import TfidfVectorizer
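from sklearn.decomposition import NMF
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Assumes `df` is a pandas DataFrame whose 'clean_complaint' column holds preprocessed complaint text
# 1. Vectorize: unigrams + bigrams, ignoring terms in >95% of documents or in fewer than 2
tfidf = TfidfVectorizer(max_df=0.95, min_df=2, ngram_range=(1, 2))
X = tfidf.fit_transform(df['clean_complaint'])

# 2. Topic modeling: factor X into 5 NMF topics; W holds document-topic weights
nmf = NMF(n_components=5, random_state=42)
W = nmf.fit_transform(X)

# 3. Label each complaint with its dominant topic
y = W.argmax(axis=1)

# 4. Train and evaluate Logistic Regression on an 80/20 train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))

Overall Model Performance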
| Model | Accuracy | Notes |
|---|---|---|
| Logistic Regression | ≈95% | Best performer |
| Decision Tree | ~77% | Interpretable |
| Random Forest | ~72–74% | Stable but prone to overfitting |
| Gaussian NB | ~36% | Weak baseline |
- Logistic Regression chosen as the final classifier
- Well-separated NMF topics made the classes easier to distinguish
- Topic-word alignment validated
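Topic-word alignment is typically checked by listing each topic's highest-weighted terms from the NMF topic-term matrix (scikit-learn exposes it as `nmf.components_`, with term names from `tfidf.get_feature_names_out()`). The sketch below uses a hand-made toy matrix and vocabulary so it runs without the project's data; `top_terms_per_topic` is a hypothetical helper and the weights are illustrative only.

```python
def top_terms_per_topic(H: list[list[float]], vocab: list[str], k: int = 3) -> list[list[str]]:
    """For each topic row of H (topics x terms), return the k highest-weighted terms."""
    result = []
    for row in H:
        # Sort term indices by descending weight and keep the first k
        best = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        result.append([vocab[j] for j in best])
    return result

# Toy vocabulary and topic-term weights (illustrative values, not project output)
vocab = ["payment", "credit", "report", "dispute", "account", "card"]
H = [
    [0.9, 0.1, 0.0, 0.2, 0.0, 0.7],  # topic 0
    [0.0, 0.8, 0.9, 0.6, 0.1, 0.0],  # topic 1
]
for i, terms in enumerate(top_terms_per_topic(H, vocab)):
    print(f"Topic {i}: {', '.join(terms)}")
# → Topic 0: payment, card, dispute
# → Topic 1: report, credit, dispute
```

With the fitted model, the same helper could be called as `top_terms_per_topic(nmf.components_.tolist(), list(tfidf.get_feature_names_out()), k=10)`.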
- Fine-tuned transformer models (GPT, BERT, DistilBERT)
- Real-time ingestion with Kafka
- FastAPI inference service
- Active learning loop from agent feedback
- Multi-modal complaint classification (voice → transcripts → NLP)
Q: What dataset did you use?
A dataset of 78,313 customer complaints, with 22 columns, provided in JSON format.
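Loading a JSON export like this can be sketched with the standard library alone. This assumes the file is a JSON array of flat records; `load_complaints` and the field name `complaint_text` are hypothetical, since the actual schema's field names are not listed here.

```python
import json

def load_complaints(raw: str, text_field: str = "complaint_text") -> tuple[list[dict], set[str]]:
    """Parse a JSON array of records; return records with non-blank text and all field names seen.

    `text_field` is a hypothetical name; substitute the dataset's real complaint-narrative field.
    """
    records = json.loads(raw)
    fields = set()
    for rec in records:
        fields.update(rec.keys())
    # Keep only records whose complaint text is present and non-blank
    complaints = [r for r in records if str(r.get(text_field) or "").strip()]
    return complaints, fields

sample = (
    '[{"complaint_text": "Late fee charged twice", "product": "Credit card"}, '
    '{"complaint_text": "", "product": "Mortgage"}]'
)
complaints, fields = load_complaints(sample)
print(len(complaints), sorted(fields))  # → 1 ['complaint_text', 'product']
```

For a flat export, `len(fields)` offers a quick check against the 22 columns reported above; nested exports would need flattening first.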
Q: Why NMF for topic modeling?
NMF provides sparse, interpretable topics that align well with complaint categories.
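To show where NMF's non-negative, parts-based topics come from, here is a small pure-Python sketch of the Lee & Seung multiplicative update rules for the Frobenius objective. It is illustrative only: scikit-learn's `NMF` defaults to a coordinate-descent solver (multiplicative updates are available as `solver="mu"`), and the toy matrix below is made up.

```python
import random

Matrix = list[list[float]]

def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A: Matrix) -> Matrix:
    return [list(r) for r in zip(*A)]

def frob_error(X: Matrix, W: Matrix, H: Matrix) -> float:
    """Squared Frobenius norm of X - WH."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry))

def nmf_mu(X: Matrix, k: int, iters: int = 200, seed: int = 0, eps: float = 1e-9) -> tuple[Matrix, Matrix]:
    """Factor non-negative X (n x m) into W (n x k) and H (k x m) via multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() for _ in range(k)] for _ in range(n)]
    H = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[h * nu / (de + eps) for h, nu, de in zip(hr, nr, dr)] for hr, nr, dr in zip(H, num, den)]
        # W <- W * (X H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[w * nu / (de + eps) for w, nu, de in zip(wr, nr, dr)] for wr, nr, dr in zip(W, num, den)]
    return W, H

# Toy "document x term" matrix with two blocks of co-occurring terms
X = [[5, 4, 0, 0], [4, 5, 0, 1], [0, 0, 5, 4], [1, 0, 4, 5]]
W0, H0 = nmf_mu(X, k=2, iters=0)   # random initialization only
W, H = nmf_mu(X, k=2, iters=200)   # same seed, so same starting point
print(round(frob_error(X, W0, H0), 2), "->", round(frob_error(X, W, H), 2))
# Dominant topic per row, mirroring y = W.argmax(axis=1)
print([max(range(2), key=lambda j: row[j]) for row in W])
```

Because every update multiplies non-negative entries by non-negative ratios, W and H never turn negative, which is what keeps the topics additive and readable.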
Q: Why did Logistic Regression perform best?
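High-dimensional, sparse TF-IDF vectors tend to be close to linearly separable, which favors a linear model such as Logistic Regression over tree ensembles or Gaussian Naive Bayes (whose Gaussian feature assumption fits sparse TF-IDF poorly).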
Q: Can this model run in production?
Yes. TF-IDF + LR is fast, lightweight, and easy to containerize.
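Part of why TF-IDF + LR is cheap to serve: once trained, prediction is one sparse dot product per class plus a softmax and an argmax. The sketch below uses a toy vocabulary, IDF values, class names, and weights that are all hypothetical, not the project's trained parameters; in practice the fitted scikit-learn objects would be serialized and loaded instead.

```python
import math
from collections import Counter

# Hypothetical fitted parameters: IDF per term, and per-class (weights, bias)
IDF = {"payment": 1.2, "late": 1.5, "credit": 1.1, "report": 1.4, "fraud": 2.0}
CLASSES = {
    "Payments": ({"payment": 2.0, "late": 1.5}, -0.2),
    "Credit reporting": ({"credit": 1.8, "report": 2.2}, -0.1),
    "Fraud": ({"fraud": 3.0}, -0.5),
}

def tfidf_vector(text: str) -> dict[str, float]:
    """Raw term counts times IDF, L2-normalized; out-of-vocabulary tokens are ignored."""
    counts = Counter(t for t in text.lower().split() if t in IDF)
    vec = {t: c * IDF[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def predict(text: str) -> tuple[str, dict[str, float]]:
    """Score each class linearly, turn scores into softmax probabilities, return the top class."""
    x = tfidf_vector(text)
    scores = {c: sum(w.get(t, 0.0) * v for t, v in x.items()) + b for c, (w, b) in CLASSES.items()}
    z = max(scores.values())  # subtract the max score for numerical stability
    exp = {c: math.exp(s - z) for c, s in scores.items()}
    total = sum(exp.values())
    probs = {c: e / total for c, e in exp.items()}
    return max(probs, key=probs.get), probs

label, probs = predict("Late payment fee charged after payment was sent")
print(label, round(probs[label], 3))
```

The per-request cost grows with the number of tokens in the complaint and the number of classes, not with the size of the training set.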
Q: How is bias or PII handled?
PII (names, IDs, emails) should be removed, and demographic fairness evaluated, before launch.
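A minimal regex-based sketch of the PII step for emails and ID-like digit runs. The patterns, the 6-digit threshold, and the `[EMAIL]`/`[ID]` placeholders are assumptions; names cannot be caught reliably with regexes and would need a named-entity recognizer, which this sketch does not attempt.

```python
import re

# Hypothetical patterns: email addresses, and runs of 6+ digits (optionally separated by spaces/dashes)
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
ID_RE = re.compile(r"\b\d(?:[ -]?\d){5,}\b")

def redact_pii(text: str) -> str:
    """Replace email addresses with [EMAIL] and long digit runs with [ID]."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return ID_RE.sub("[ID]", text)

print(redact_pii("Contact jane.doe@example.com about account 1234-5678-9012."))
# → Contact [EMAIL] about account [ID].
```

Short numbers such as amounts or dates are left alone by the 6-digit threshold, at the cost of missing shorter identifiers.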
This machine learning work was developed by Kalp Thakkar. References and full experiments are in report.pdf and the slides.
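Bird, Klein & Loper - Natural Language Processing with Python (NLTK; O'Reilly, 2009).
Pedregosa et al. - Scikit-learn: Machine Learning in Python (JMLR, 2011).
Lee & Seung - Learning the parts of objects by non-negative matrix factorization (Nature, 1999).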
See the full bibliography in the report.
- Fork the repository: https://github.com/kalpthakkar/ReSolveAI-AI-Complaint-Classification-Engine/fork
- Create a branch: git checkout -b feature-xyz
- Commit your changes and push the branch:
  # Stage changes
  git add .
  # Commit
  git commit -m "Your message"
  # Push the new branch
  git push -u origin feature-xyz
- Submit a pull request: gh pr create --fill
ReSolveAI shows how classical NLP, topic modeling, and supervised ML can deliver real, measurable impact in customer complaint handling pipelines. This repository combines research quality, engineering quality, and practical industry relevance in a single project.
For any inquiries or support, please contact:
- Kalp Thakkar - kalpthakkar2001@gmail.com
- GitHub: kalpthakkar
- LinkedIn: kalpthakkar


