rohith-97/ML-Mid_Term

Heart Disease Prediction

Lightweight end-to-end example that trains and serves a Random Forest model to predict heart disease from tabular patient data.

This repository includes:

  • Training script (train.py) that runs K-Fold cross-validation and saves the model together with its encoder.
  • A pre-trained model file (rf_model_40_trees_depth_10_min_samples_leaf_1.bin).
  • A Flask-based prediction endpoint (predict.py) and a small test client (predict_test.py).

Main parts (summary)

  • Dataset: Data/heart.csv (patient features + HeartDisease target).
  • Model: RandomForestClassifier plus a one-hot encoder (DictVectorizer-style); the encoder and classifier are stored together in a single .bin file.
  • API: POST /predict accepts a single patient JSON and returns heart_disease_probability and heart_disease (bool).
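The "stored together" idea in the Model bullet can be sketched as below. This is a minimal illustration and not the code in train.py: it assumes the .bin file is a pickled (encoder, model) tuple, which the README does not confirm, and it uses placeholder dictionaries in place of the real scikit-learn objects so that it runs with the standard library alone. The names save_bundle, load_bundle and demo_model.bin are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def save_bundle(path, encoder, model):
    """Write the encoder and model to one file as a single pickled tuple."""
    with open(path, "wb") as f_out:
        pickle.dump((encoder, model), f_out)


def load_bundle(path):
    """Read the (encoder, model) tuple back from disk."""
    with open(path, "rb") as f_in:
        encoder, model = pickle.load(f_in)
    return encoder, model


# Placeholder objects (assumption): stand-ins for the fitted encoder and
# classifier so the sketch does not need scikit-learn installed.
demo_encoder = {"kind": "encoder-placeholder"}
demo_model = {"kind": "model-placeholder", "n_estimators": 40}

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo_model.bin"  # hypothetical file name
    save_bundle(demo_path, demo_encoder, demo_model)
    loaded_encoder, loaded_model = load_bundle(demo_path)
```

Keeping both objects in one file means the server cannot accidentally load a classifier with an encoder from a different training run.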

Quickstart (minimal)

  1. Use Docker (recommended, ensures correct Python and deps):
docker build -t heart-predict:latest .
docker run -p 9696:9696 heart-predict:latest
  2. Test the running server (from another shell):
python predict_test.py

Alternatively, send the same request with curl, passing the patient record as a JSON body.
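For readers who prefer Python, a request to the endpoint might look like the sketch below, which uses only the standard library. Several things here are assumptions: the host localhost (the port 9696 comes from the docker run command), the helper names build_request and predict, and the patient field names, which are purely illustrative and must be replaced with the column names actually used in Data/heart.csv.

```python
import json
import urllib.request

# Assumed URL: port 9696 is from the docker run command, localhost is a guess.
URL = "http://localhost:9696/predict"

# Hypothetical patient record. The field names and values are illustrative
# only and must match the columns and categories used during training.
sample_patient = {
    "Age": 54,
    "Sex": "M",
    "ChestPainType": "ASY",
    "Cholesterol": 230,
    "MaxHR": 140,
}


def build_request(patient, url=URL):
    """Build a POST request carrying one patient as a JSON body."""
    body = json.dumps(patient).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(patient, url=URL, timeout=10):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_request(patient, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, predict(sample_patient) should return a dictionary containing the heart_disease_probability and heart_disease keys described above.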

Installation (local, non-Docker)

The project declares its dependencies in a Pipfile (Python 3.12). To install them with pipenv:

pip install pipenv
pipenv install --deploy --system

Alternatively, create a virtualenv and install the dependencies listed in the Pipfile.

Start the app locally:

gunicorn --bind=0.0.0.0:9696 predict:app

Then run python predict_test.py to send a sample request.

Training

Run full training and save a new model file:

python train.py

What happens:

  • Reads Data/heart.csv and auto-detects categorical vs numerical columns.
  • Runs K-Fold cross-validation and prints per-fold accuracy.
  • Retrains on the full training set and writes the encoder + model to a .bin file.
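The first two steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the logic in train.py (which may rely on pandas dtypes and scikit-learn's KFold): a column is treated as numerical when every value parses as a float, and folds are contiguous and unshuffled. The function names and the demo rows are hypothetical.

```python
def split_column_types(rows, target):
    """Label a column numerical if every value parses as a float, else categorical."""
    categorical, numerical = [], []
    for name in rows[0]:
        if name == target:
            continue  # the target column is not a feature
        try:
            for row in rows:
                float(row[name])
        except (ValueError, TypeError):
            categorical.append(name)
        else:
            numerical.append(name)
    return categorical, numerical


def k_fold_indices(n_rows, n_splits):
    """Yield (train_idx, val_idx) pairs over contiguous, near-equal folds."""
    base, extra = divmod(n_rows, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)  # spread the remainder over the first folds
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_rows))
        yield train_idx, val_idx
        start += size


# Made-up rows shaped like csv.DictReader output (all values are strings).
rows = [
    {"Age": "40", "Sex": "M", "HeartDisease": "0"},
    {"Age": "49", "Sex": "F", "HeartDisease": "1"},
    {"Age": "37", "Sex": "M", "HeartDisease": "0"},
    {"Age": "48", "Sex": "F", "HeartDisease": "1"},
    {"Age": "54", "Sex": "M", "HeartDisease": "0"},
]

categorical, numerical = split_column_types(rows, target="HeartDisease")
folds = list(k_fold_indices(len(rows), n_splits=2))
```

Each (train_idx, val_idx) pair would be used to fit on the training rows and score accuracy on the validation rows, one fold at a time.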

Files and structure

  • Data/heart.csv — dataset.
  • train.py — training + CV script.
  • predict.py — Flask app (loads .bin file and serves /predict).
  • predict_test.py — example client that POSTs a sample patient.
  • rf_model_40_trees_depth_10_min_samples_leaf_1.bin — included saved model.
  • Dockerfile, Pipfile, Pipfile.lock — runtime and packaging.

Notes & guidance

  • Ensure the JSON keys and categorical values you send to /predict match those used during training exactly (same field names, same category spellings). Otherwise the encoder may produce different feature vectors or raise an error.
  • train.py contains a small LabelEncoder snippet that is unused in the training flow; if you modify preprocessing, make it explicit and keep it consistent between training and inference.
  • There are no automated tests; consider adding a small unit test that loads the .bin and checks prediction output types/shape.
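One way to guard against the key and category mismatch mentioned in the first note is to check each request before it reaches the encoder. The sketch below is a hypothetical helper, not part of predict.py; the expected field names and categories shown are placeholders that would need to be filled in from the training data. The check matters because scikit-learn's DictVectorizer, if that is the encoder in use, silently ignores feature names it did not see during fitting rather than raising an error.

```python
def check_patient(patient, expected_categories, numeric_fields):
    """Return a list of problems in a patient dict; an empty list means it looks consistent."""
    problems = []
    expected_keys = set(expected_categories) | set(numeric_fields)

    # Field names must match the training columns exactly (case-sensitive).
    for key in sorted(expected_keys - set(patient)):
        problems.append(f"missing field: {key}")
    for key in sorted(set(patient) - expected_keys):
        problems.append(f"unexpected field: {key}")

    # Categorical values must be spelled as they were during training.
    for key, allowed in expected_categories.items():
        if key in patient and patient[key] not in allowed:
            problems.append(f"unknown category for {key}: {patient[key]!r}")

    # Numerical fields must be real numbers; bool is rejected explicitly
    # because it is a subclass of int in Python.
    for key in numeric_fields:
        value = patient.get(key)
        if key in patient and (isinstance(value, bool) or not isinstance(value, (int, float))):
            problems.append(f"non-numeric value for {key}: {value!r}")
    return problems


# Placeholder schema (assumption): replace with the real training columns.
EXPECTED_CATEGORIES = {"Sex": {"M", "F"}}
NUMERIC_FIELDS = ["Age"]

good_problems = check_patient({"Age": 54, "Sex": "M"}, EXPECTED_CATEGORIES, NUMERIC_FIELDS)
bad_problems = check_patient({"age": 54, "Sex": "male"}, EXPECTED_CATEGORIES, NUMERIC_FIELDS)
```

A server could return an HTTP 400 with this list when it is non-empty instead of scoring a silently distorted feature vector.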
