Heart Disease Prediction

A lightweight end-to-end example that trains and serves a Random Forest model to predict heart disease from tabular patient data.

This repository includes:

A training script (train.py) that runs K-Fold cross-validation and saves the model together with its encoder.
A pre-trained model file (rf_model_40_trees_depth_10_min_samples_leaf_1.bin).
A Flask-based prediction endpoint (predict.py) and a small test client (predict_test.py).

Main parts (summary)

Dataset: Data/heart.csv (patient features + HeartDisease target).
Model: a RandomForestClassifier plus a one-hot encoder (DictVectorizer-style), saved together in a single .bin file.
API: POST /predict accepts a single patient JSON and returns heart_disease_probability and heart_disease (bool).
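A minimal sketch of what the /predict handler plausibly does with the .bin file. It assumes the file is a pickled (encoder, model) tuple, that the encoder is scikit-learn's DictVectorizer, and that a 0.5 probability threshold decides heart_disease; the helper name predict_patient and the toy feature names are hypothetical, so check predict.py for the real logic.

```python
import pickle

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer


def predict_patient(dv, model, patient, threshold=0.5):
    """Hypothetical helper: encode one patient dict and build a /predict-style response."""
    X = dv.transform([patient])                 # one-hot encode strings, pass numbers through
    probability = model.predict_proba(X)[0, 1]  # column 1 = class 1 when classes_ == [0, 1]
    return {
        "heart_disease_probability": float(probability),
        "heart_disease": bool(probability >= threshold),  # assumed 0.5 cut-off
    }


# Toy stand-in for the shipped .bin (hypothetical data, not Data/heart.csv).
records = [
    {"Age": 63, "Sex": "M", "ChestPainType": "ASY"},
    {"Age": 40, "Sex": "F", "ChestPainType": "ATA"},
    {"Age": 58, "Sex": "M", "ChestPainType": "ASY"},
    {"Age": 35, "Sex": "F", "ChestPainType": "NAP"},
]
labels = [1, 0, 1, 0]

dv = DictVectorizer(sparse=False)
model = RandomForestClassifier(n_estimators=40, max_depth=10, min_samples_leaf=1, random_state=1)
model.fit(dv.fit_transform(records), labels)

# Assumed .bin layout: a pickled (encoder, model) tuple.
dv_loaded, model_loaded = pickle.loads(pickle.dumps((dv, model)))
result = predict_patient(dv_loaded, model_loaded, {"Age": 60, "Sex": "M", "ChestPainType": "ASY"})
print(result)
```

Converting with float() and bool() matters in a Flask handler: NumPy scalars are not JSON-serializable by the standard json module.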

Quickstart (minimal)

Use Docker (recommended; it provides the correct Python version and dependencies):

docker build -t heart-predict:latest .
docker run -p 9696:9696 heart-predict:latest

Test the running server (from another shell):

python predict_test.py

Or POST a single patient as a JSON body to http://localhost:9696/predict with curl or any other HTTP client.
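For scripting without extra dependencies, a request can be built with the standard library alone. This is a sketch, not the contents of predict_test.py: the helper names are hypothetical, and the patient keys and category values shown are placeholders that must be replaced with the exact columns and values from Data/heart.csv.

```python
import json
import urllib.request

# Port 9696 as used in the docker run command; the URL itself is an assumption.
PREDICT_URL = "http://localhost:9696/predict"


def build_predict_request(patient, url=PREDICT_URL):
    """Build (but do not send) a POST request carrying one patient as JSON."""
    body = json.dumps(patient).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_predict_request(patient, url=PREDICT_URL, timeout=10):
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(build_predict_request(patient, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Hypothetical patient record: use the real column names/categories from Data/heart.csv.
patient = {
    "Age": 54, "Sex": "M", "ChestPainType": "ASY", "RestingBP": 140,
    "Cholesterol": 239, "FastingBS": 0, "RestingECG": "Normal", "MaxHR": 160,
    "ExerciseAngina": "N", "Oldpeak": 1.2, "ST_Slope": "Flat",
}
request = build_predict_request(patient)
```

With the container running, `print(send_predict_request(patient))` should return a dict containing heart_disease_probability and heart_disease.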

Installation (local, non-Docker)

The project uses a Pipfile (Python 3.12). To install with pipenv:

pip install pipenv
pipenv install --deploy --system

Alternatively, create a virtualenv and install the dependencies listed in the Pipfile.

Start the app locally:

gunicorn --bind=0.0.0.0:9696 predict:app

Then run python predict_test.py to send a sample request.
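If you prefer a config file over command-line flags, gunicorn reads its settings from a Python file passed with -c. A minimal sketch, assuming a hypothetical gunicorn.conf.py placed next to predict.py; the worker count and timeout are illustrative values, not settings taken from this project's Dockerfile.

```python
# gunicorn.conf.py (hypothetical): equivalent to the --bind flag in the command above.
# Start with: gunicorn -c gunicorn.conf.py predict:app

bind = "0.0.0.0:9696"  # same address and port as the command-line example
workers = 2            # without preload_app, each worker imports predict.py and holds its own model copy
timeout = 30           # seconds a silent worker may hang before it is killed and restarted
```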

Training

Run the full training pipeline and save a new model file:

python train.py

What happens:

Reads Data/heart.csv and auto-detects categorical vs numerical columns.
Runs K-Fold cross-validation and prints per-fold accuracy.
Retrains on the full training set and writes the encoder + model to a .bin file.
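The three steps above can be sketched as follows. This is an illustration under stated assumptions, not train.py itself: it uses a plain list of row dicts instead of a pandas DataFrame, detects categorical columns by checking whether the first row's value is a string, uses 5 folds, and trains on synthetic rows. The hyperparameters come from the shipped file name (40 trees, depth 10, min_samples_leaf 1); the function names are hypothetical. The real script may also hold out a test set before the final retrain.

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold


def split_columns(rows, target):
    """Auto-detect categorical (string-valued) vs numerical columns from the first row."""
    features = [c for c in rows[0] if c != target]
    categorical = [c for c in features if isinstance(rows[0][c], str)]
    numerical = [c for c in features if c not in categorical]
    return categorical, numerical


def make_model():
    # Hyperparameters read off the shipped file name.
    return RandomForestClassifier(n_estimators=40, max_depth=10, min_samples_leaf=1, random_state=1)


def train(rows, target="HeartDisease", n_splits=5, out_path=None):
    categorical, numerical = split_columns(rows, target)
    X_dicts = [{c: r[c] for c in categorical + numerical} for r in rows]
    y = np.array([r[target] for r in rows])

    scores = []
    kfold = KFold(n_splits=n_splits, shuffle=True, random_state=1)
    for fold, (train_idx, val_idx) in enumerate(kfold.split(y)):  # only the row count matters
        dv = DictVectorizer(sparse=False)  # fit the encoder on the training fold only
        X_train = dv.fit_transform([X_dicts[i] for i in train_idx])
        X_val = dv.transform([X_dicts[i] for i in val_idx])
        model = make_model().fit(X_train, y[train_idx])
        acc = accuracy_score(y[val_idx], model.predict(X_val))
        scores.append(acc)
        print(f"fold {fold}: accuracy={acc:.3f}")

    # Retrain on every row, then persist encoder + model together as one pickle.
    dv = DictVectorizer(sparse=False)
    model = make_model().fit(dv.fit_transform(X_dicts), y)
    if out_path is not None:
        with open(out_path, "wb") as f_out:
            pickle.dump((dv, model), f_out)
    return dv, model, scores


# Synthetic demo rows (hypothetical columns, not Data/heart.csv).
rng = np.random.default_rng(0)
rows = []
for _ in range(60):
    age = int(rng.integers(30, 80))
    angina = str(rng.choice(["Y", "N"]))
    rows.append({"Age": age, "Sex": str(rng.choice(["M", "F"])), "ExerciseAngina": angina,
                 "HeartDisease": int(age > 55 or angina == "Y")})
dv, model, scores = train(rows)
```

Fitting a fresh encoder inside each fold keeps validation categories from leaking into the training vocabulary, which mirrors how an unseen category behaves at serving time.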

Files and structure

Data/heart.csv — dataset.
train.py — training + CV script.
predict.py — Flask app (loads .bin file and serves /predict).
predict_test.py — example client that POSTs a sample patient.
rf_model_40_trees_depth_10_min_samples_leaf_1.bin — included saved model.
Dockerfile, Pipfile, Pipfile.lock — runtime and packaging.

Notes & guidance

Make sure the JSON keys and categorical values you send to /predict match those used during training exactly (same feature names and same category spellings); otherwise the encoder may produce a different feature vector than intended or raise an error.
train.py contains a small LabelEncoder snippet that is unused in the training flow; if you modify preprocessing, make it explicit and keep it consistent between training and inference.
There are no automated tests; consider adding a small unit test that loads the .bin and checks prediction output types/shape.
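Following the suggestion above, a smoke test could load the .bin, flag unknown keys or categories, and check output types and shapes. A sketch assuming the file is a pickled (encoder, model) tuple with a scikit-learn DictVectorizer, whose one-hot columns are named "key=value" and whose transform silently ignores features unseen during fit; the helper names are hypothetical, and the demo writes a toy model to a temporary file so the block runs on its own.

```python
import os
import pickle
import tempfile

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer


def unknown_features(dv, patient):
    """List patient features that a fitted DictVectorizer has never seen.

    One-hot columns are named "<key>=<value>" (default separator "="), numeric
    columns just "<key>". Unseen ones are dropped silently by transform().
    """
    missing = []
    for key, value in patient.items():
        name = f"{key}={value}" if isinstance(value, str) else key
        if name not in dv.vocabulary_:
            missing.append(name)
    return missing


def check_model_file(path, patient):
    """Load the .bin, score one patient, and check output types and shapes."""
    with open(path, "rb") as f_in:
        dv, model = pickle.load(f_in)  # assumed layout: (encoder, model) tuple
    unknown = unknown_features(dv, patient)
    assert unknown == [], f"unknown features: {unknown}"
    X = dv.transform([patient])
    assert X.shape == (1, len(dv.feature_names_))
    proba = model.predict_proba(X)
    assert proba.shape == (1, 2)  # binary target: P(no disease), P(disease)
    probability = float(proba[0, 1])
    assert 0.0 <= probability <= 1.0
    return probability


# Demo on a throwaway toy model (hypothetical data) written to a temp file.
toy_rows = [{"Age": 60, "Sex": "M"}, {"Age": 40, "Sex": "F"},
            {"Age": 65, "Sex": "M"}, {"Age": 35, "Sex": "F"}]
toy_dv = DictVectorizer(sparse=False)
toy_model = RandomForestClassifier(n_estimators=5, random_state=1)
toy_model.fit(toy_dv.fit_transform(toy_rows), [1, 0, 1, 0])
with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "toy_model.bin")
    with open(toy_path, "wb") as f_out:
        pickle.dump((toy_dv, toy_model), f_out)
    toy_probability = check_model_file(toy_path, {"Age": 58, "Sex": "M"})
```

In a pytest file, the same check_model_file call could point at rf_model_40_trees_depth_10_min_samples_leaf_1.bin with a real patient record from Data/heart.csv.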
