Thyroid disease is a prevalent medical condition characterized by abnormalities in thyroid hormone levels, which regulate the body's metabolism. This project aims to predict whether a patient is at risk for thyroid disease using a Random Forest Classifier, with the goal of improving early diagnosis and treatment.
- Classifies each patient into one of three classes: Normal, Hyperthyroidism, or Hypothyroidism.
- Built using a Kaggle dataset.
- Reaches 98.44% accuracy, a 98.39% weighted F1 score, and a 99.84% ROC-AUC score.
- Deployed as a web application using Streamlit.
- Source: Kaggle Thyroid Disease Dataset.
- Features:
  - Various medical attributes related to thyroid function and patient history.
- Target: Classification into one of three classes: Normal (0), Hyperthyroidism (1), Hypothyroidism (2).
The project uses the Random Forest Classifier to predict thyroid conditions. Key steps include:
- Data Preprocessing:
  - Handling missing values.
  - Encoding categorical variables.
  - Scaling numerical features.
- Feature Selection:
  - Selected features based on their importance to the model.
- Model Training:
  - Trained a Random Forest Classifier with hyperparameter tuning.
- Evaluation:
  - Evaluated the model using accuracy, weighted precision, weighted recall, weighted F1 score, and ROC-AUC.
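The steps above could be wired together roughly as follows. This is a minimal sketch using scikit-learn, not the project's actual notebook code: the data is synthetic, the column names (`age`, `TSH`, `sex`), the labelling thresholds, and the hyperparameter grid are all assumptions chosen only to make the example runnable.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; the columns are hypothetical, not the Kaggle schema.
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "age": rng.integers(18, 90, n).astype(float),
    "TSH": rng.lognormal(0.5, 1.0, n),
    "sex": rng.choice(["F", "M"], n),
})
# Toy labels: 0 = Normal, 1 = Hyperthyroidism, 2 = Hypothyroidism (thresholds are made up).
y = np.where(X["TSH"] > 4.0, 2, np.where(X["TSH"] < 0.4, 1, 0))
# Blank out some values so the imputation step has something to do.
X.loc[rng.choice(n, 20, replace=False), "TSH"] = np.nan

# Preprocessing: impute and scale numeric columns, one-hot encode categorical ones.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), ["age", "TSH"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["sex"]),
])

model = Pipeline([
    ("prep", preprocess),
    ("rf", RandomForestClassifier(random_state=0)),
])

# Hyperparameter tuning over a small, illustrative grid.
search = GridSearchCV(
    model,
    param_grid={"rf__n_estimators": [50, 100], "rf__max_depth": [None, 5]},
    cv=3,
    scoring="f1_weighted",
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
search.fit(X_train, y_train)

# Held-out weighted F1 and per-feature importances from the tuned forest.
test_f1 = search.score(X_test, y_test)
importances = search.best_estimator_.named_steps["rf"].feature_importances_
print(f"weighted F1 on held-out data: {test_f1:.3f}")
print("feature importances:", importances.round(3))
```

Keeping preprocessing inside the `Pipeline` means the imputer and scaler are fitted only on the training folds during cross-validation, which avoids leaking test-set statistics into the model. The importances printed at the end are one possible basis for the feature selection step.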
The model achieved strong overall results; per-class scores follow the summary metrics:
- Accuracy: 98.44%
- Precision (weighted): 98.43%
- Recall (weighted): 98.44%
- F1 Score (weighted): 98.39%
- ROC-AUC Score: 99.84%
Class | Precision | Recall | F1-Score | Support |
---|---|---|---|---|
Normal (0) | 99% | 99% | 99% | 1350 |
Hyperthyroidism (1) | 96% | 78% | 86% | 63 |
Hypothyroidism (2) | 95% | 98% | 96% | 124 |
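The "weighted" metrics are conventionally support-weighted means of the per-class scores. The sketch below recomputes them from the table, assuming that convention applies here; because the table values are rounded to whole percentages, the results only approximate the reported figures. The `report` dictionary and `weighted_average` helper are illustrative names.

```python
# Per-class scores and support copied from the table above (rounded values).
report = {
    "Normal":          {"precision": 0.99, "recall": 0.99, "f1": 0.99, "support": 1350},
    "Hyperthyroidism": {"precision": 0.96, "recall": 0.78, "f1": 0.86, "support": 63},
    "Hypothyroidism":  {"precision": 0.95, "recall": 0.98, "f1": 0.96, "support": 124},
}


def weighted_average(report, metric):
    """Average a per-class metric, weighting each class by its support."""
    total = sum(row["support"] for row in report.values())
    return sum(row[metric] * row["support"] for row in report.values()) / total


for metric in ("precision", "recall", "f1"):
    print(f"weighted {metric}: {weighted_average(report, metric):.2%}")
```

Because the Normal class accounts for 1350 of the 1537 samples, it dominates the weighted averages, which is why the 78% recall for Hyperthyroidism barely moves the headline numbers.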
The model is deployed as a Streamlit web application. Users can upload patient data in CSV format and get predictions directly on the web.
- Visit the deployed app.
- Upload a CSV file containing patient data.
- View predictions and insights instantly.
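The upload-and-predict flow could look roughly like the sketch below. It is not the repository's actual `app.py`: the helper name `predict_from_csv`, the `prediction` column, and the assumption that the pickled model can be loaded with `joblib` and accepts the uploaded columns as-is are all hypothetical.

```python
import pandas as pd

# Class codes as described in the dataset section.
LABELS = {0: "Normal", 1: "Hyperthyroidism", 2: "Hypothyroidism"}


def predict_from_csv(csv_file, model):
    """Read patient rows from a CSV file-like object and append predicted labels."""
    df = pd.read_csv(csv_file)
    codes = model.predict(df)
    result = df.copy()
    result["prediction"] = [LABELS[int(code)] for code in codes]
    return result


def main():
    # Imported here so the helper above can be used without Streamlit installed.
    import joblib
    import streamlit as st

    st.title("Thyroid Disease Detection")
    # Assumption: the model file was saved in a joblib-compatible format.
    model = joblib.load("model/thyroid_disease_model.pkl")
    uploaded = st.file_uploader("Upload patient data (CSV)", type="csv")
    if uploaded is not None:
        st.dataframe(predict_from_csv(uploaded, model))
```

In a real `app.py`, `main()` would be called at module level so that `streamlit run app.py` executes it. Separating the prediction helper from the UI code keeps it easy to test without a browser.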
- Enhancing Recall for Hyperthyroidism:
  - Employ techniques like SMOTE or ADASYN to balance the dataset.
  - Fine-tune hyperparameters for better recall.
- Model Explainability:
  - Integrate SHAP or LIME for better interpretability of feature importance.
- Additional Features:
  - Incorporate more patient demographics and medical history.
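To illustrate the oversampling idea behind SMOTE, here is a simplified NumPy sketch of its core step: each synthetic row is an interpolation between a minority-class sample and one of its nearest minority-class neighbours. It is not the full algorithm, and the function name and parameters are illustrative; in practice a maintained implementation such as the one in the `imbalanced-learn` package would be the more likely choice.

```python
import numpy as np


def smote_like_oversample(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by interpolating between neighbours.

    X_min must contain at least two rows of numeric features.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    k = min(k, len(X_min) - 1)

    # Pairwise Euclidean distances within the minority class.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    # Pick a random base point and one of its k nearest neighbours.
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]

    # Place each synthetic point somewhere on the segment between the pair.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[picked] - X_min[base])


# Toy usage: 63 minority rows with 3 numeric features (random stand-in data).
X_minority = np.random.default_rng(1).normal(size=(63, 3))
X_synthetic = smote_like_oversample(X_minority, n_new=100)
print(X_synthetic.shape)
```

Oversampling of this kind should be applied only to the training split, so that the evaluation data still reflects the real class balance.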
- Python 3.7+
- Required libraries (install using `requirements.txt`): `pip install -r requirements.txt`
- Clone the repository: `git clone https://github.com/vivekd16/Thyroid-Disease-Detection.git`
- Navigate to the project directory: `cd Thyroid-Disease-Detection`
- Start the Streamlit app: `streamlit run app.py`
└── Thyroid-Disease-Detection/
    ├── README.md
    ├── LICENSE
    ├── app.py
    ├── requirements.txt
    ├── dataset/
    │   ├── processed/
    │   │   └── ProcessedthyroidDF.csv
    │   └── raw/
    │       └── thyroidDF.csv
    ├── model/
    │   └── thyroid_disease_model.pkl
    └── sources/
        └── thyroid_disease_detection.ipynb