A production-grade Django web app that predicts whether a user is Diabetic, Pre-Diabetic, or Non-Diabetic from healthcare and lifestyle inputs. The backend is powered by an optimized ensemble machine learning model that combines CatBoost, XGBoost, and LightGBM, trained on a large real-world medical dataset.
- Source: CDC Diabetes Health Indicators Dataset (UCI)
- Records: 253,680 entries
- Features: 35 total (22 selected for prediction)
- Target Variable: `Diabetes_012`
  - 0 = No Diabetes
  - 1 = Pre-Diabetes
  - 2 = Diabetes
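
As a rough illustration of the target encoding above, the sketch below counts how many rows fall into each class. It assumes the data is available as a CSV with a `Diabetes_012` column; the function name `class_distribution` and the sample rows are hypothetical and are not taken from this project's code.

```python
import csv
import io
from collections import Counter

# Label mapping taken from the target variable description above.
DIABETES_LABELS = {0: "No Diabetes", 1: "Pre-Diabetes", 2: "Diabetes"}


def class_distribution(csv_file, target_column="Diabetes_012"):
    """Count rows per target label in an open CSV file object."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # int(float(...)) accepts both "2" and "2.0" style encodings.
        code = int(float(row[target_column]))
        counts[DIABETES_LABELS[code]] += 1
    return dict(counts)


# Tiny in-memory sample (made-up rows, not real data) for demonstration.
sample = io.StringIO("Diabetes_012,BMI\n0.0,24\n2.0,31\n0.0,27\n1.0,29\n")
print(class_distribution(sample))
```

Checking the class balance this way is a common first step before deciding whether resampling is needed.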
To provide an accessible and intelligent diabetes screening tool for the general public that can:
- Collect user health and lifestyle indicators through a simple web interface
- Predict the likelihood of having diabetes using an advanced ML ensemble
- Support decision-making with high accuracy and transparency
- ML Ensemble Model using:
  - `CatBoostClassifier`
  - `XGBClassifier`
  - `LGBMClassifier`
- Ensemble Weighting
  - CatBoost: 0.780
  - XGBoost: 0.102
  - LightGBM: 0.118
- Threshold Calibration
  - Pre-Diabetes Threshold: 0.001
  - Diabetes Threshold: 0.050
- Performance
  - Accuracy: 89.6%
  - ROC-AUC: 0.98
  - F1-score (Class 2 - Diabetes): 0.95
- User Interface
  - Step-by-step health questionnaire
  - Secure user login/signup
  - Final diagnosis view: No Diabetes, Pre-Diabetes, or Diabetes
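
The ensemble weights and thresholds listed above could be combined along the following lines. This is a minimal sketch, not the project's actual code: it assumes each model returns class probabilities in the order (No Diabetes, Pre-Diabetes, Diabetes), that the weights are used for a weighted average (soft voting), and that the Diabetes threshold is checked before the Pre-Diabetes threshold. The names `blend` and `classify` and the example probabilities are hypothetical.

```python
LABELS = ("No Diabetes", "Pre-Diabetes", "Diabetes")

# Ensemble weights and thresholds as listed in the feature summary.
WEIGHTS = {"catboost": 0.780, "xgboost": 0.102, "lightgbm": 0.118}
PRE_DIABETES_THRESHOLD = 0.001
DIABETES_THRESHOLD = 0.050


def blend(probas):
    """Weighted average of per-model class probabilities (soft voting)."""
    total = sum(WEIGHTS.values())
    return [
        sum(WEIGHTS[name] * probas[name][k] for name in WEIGHTS) / total
        for k in range(len(LABELS))
    ]


def classify(probas):
    """Turn blended probabilities into a label using the thresholds.

    ASSUMPTION: the order of the checks (Diabetes first, then
    Pre-Diabetes) is a guess; the source does not specify it.
    """
    p = blend(probas)
    if p[2] >= DIABETES_THRESHOLD:
        return LABELS[2]
    if p[1] >= PRE_DIABETES_THRESHOLD:
        return LABELS[1]
    return LABELS[0]


# Made-up per-model probabilities for a single user.
example = {
    "catboost": [0.97, 0.00, 0.03],
    "xgboost": [0.90, 0.00, 0.10],
    "lightgbm": [0.95, 0.00, 0.05],
}
print(classify(example))
```

Low thresholds like these favour recall for the minority classes: a user is flagged even when the blended probability is small, which suits a screening tool but produces more false positives.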
| Area | Tools / Frameworks |
|---|---|
| Web Framework | Django 5.1.3 |
| ML Models | CatBoost, XGBoost, LightGBM |
| Preprocessing | scikit-learn (StandardScaler), imbalanced-learn (SMOTE) |
| Hyperparameter Tuning | Optuna |
| Dataset | CDC Diabetes Health Indicators Dataset (UCI) |
| Language | Python 3.11 |
For feedback, suggestions, or collaboration:
- Syed Saad Ali
- Email: syedsaadi427@gmail.com
- GitHub: @syedsaadali11







