samarthchandrawat/CVD_Prediction

Cardiovascular Diseases Risk Prediction

Heart disease is a major global health concern. This project aims to leverage data science and machine learning techniques to predict the risk of cardiovascular diseases based on various attributes such as exercise habits, diet, medical history, and lifestyle factors. By improving early detection and providing actionable insights, this project strives to enhance healthcare outcomes.

Table of Contents

  1. Introduction
  2. Dataset
  3. Project Workflow
  4. Models and Techniques Used
  5. Results
  6. Getting Started
  7. How to Contribute
  8. License

Introduction

This project predicts cardiovascular disease risk from a dataset of health-related attributes. By analyzing risk factors such as BMI, exercise habits, smoking history, and dietary patterns, we aim to build predictive models and interpret them so that the results translate into actionable insights.


Dataset

The dataset is sourced from Kaggle. It contains 308,774 rows and 19 attributes, including:

  • General health
  • Checkup frequency
  • Exercise habits
  • Medical history (diabetes, skin cancer, etc.)
  • BMI, height, and weight
  • Dietary habits

Data Preprocessing

  • Removed duplicates (80 rows).
  • Encoded categorical data (e.g., yes/no replaced with 1/0).
  • Applied outlier detection and removal using the IQR method.
  • Balanced the dataset using SMOTE for minority oversampling.
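
The first three preprocessing steps can be sketched with pandas as follows. This is a minimal illustration, not the project's actual code: the `preprocess` helper, the column names `Exercise` and `BMI`, and the conventional 1.5 × IQR fences are assumptions made for the example.

```python
import pandas as pd


def preprocess(df, numeric_cols):
    """Deduplicate, encode Yes/No columns as 1/0, and drop IQR outliers."""
    # 1. Remove exact duplicate rows.
    out = df.drop_duplicates().copy()

    # 2. Encode every column whose non-null values are only "Yes"/"No".
    for col in out.columns:
        values = set(out[col].dropna().unique())
        if values and values <= {"Yes", "No"}:
            out[col] = out[col].map({"Yes": 1, "No": 0})

    # 3. Keep rows inside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] for each numeric column.
    mask = pd.Series(True, index=out.index)
    for col in numeric_cols:
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        mask &= out[col].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return out[mask].reset_index(drop=True)


# Toy data (hypothetical): one duplicated row and one extreme BMI value.
demo = pd.DataFrame({
    "Exercise": ["Yes", "No", "Yes", "Yes", "No", "Yes"],
    "BMI": [22.0, 27.5, 22.0, 24.0, 26.0, 95.0],
})
clean = preprocess(demo, ["BMI"])
print(clean)
```

The fences are computed once on the deduplicated data, so the order in which numeric columns are listed does not change which rows are kept.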

Project Workflow

The project follows this structured workflow:

  1. Dataset Exploration: Univariate, bivariate, and correlation analysis.
  2. Data Preprocessing: Cleaning, feature encoding, outlier handling, and oversampling.
  3. Feature Engineering: Identified important features for heart disease prediction.
  4. Model Development:
    • Logistic Regression
    • Decision Tree Classifier
    • Random Forest Classifier
    • Artificial Neural Network
  5. Evaluation: Performance metrics (accuracy, AUC-ROC) and model interpretability.
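
To illustrate the oversampling step in the workflow above, here is a simplified NumPy sketch of the interpolation idea behind SMOTE. It is not the imbalanced-learn implementation (in practice, `imblearn.over_sampling.SMOTE` with `fit_resample` would likely be used); the function name `smote_like_oversample` and its parameters are hypothetical, and the pairwise distance matrix makes it suitable only for small arrays.

```python
import numpy as np


def smote_like_oversample(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between
    a random minority point and one of its k nearest minority neighbors."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # requires at least two minority samples

    # Pairwise Euclidean distances within the minority class.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)                     # random minority points
    picked = neighbors[base, rng.integers(0, k, size=n_new)]  # one neighbor each
    gap = rng.random((n_new, 1))                              # weight in [0, 1)
    return X_min[base] + gap * (X_min[picked] - X_min[base])


# Toy minority class: the four corners of the unit square.
X_min = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
synth = smote_like_oversample(X_min, n_new=6, k=2)
print(synth.shape)
```

Because every synthetic point lies on a segment between two real minority samples, the new rows stay inside the region the minority class already occupies. Oversampling is usually applied to the training split only, so that synthetic rows do not leak into the evaluation data.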

Models and Techniques Used

  1. Baseline Model: Logistic Regression

    • Accuracy: 82%
    • Key Insights: Transparent and interpretable coefficients.
  2. Decision Tree Classifier

    • Accuracy: 88%
    • Advantages: Captures non-linear relationships and exposes feature importances.
  3. Random Forest Classifier

    • Accuracy: 93%
    • Advantages: Averages many trees, which reduces overfitting relative to a single decision tree and gives more robust performance.
  4. Artificial Neural Network

    • Accuracy: 83%
    • Advantages: Handles complex relationships and latent patterns.
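
A comparison like the one above could be set up roughly as follows with scikit-learn. This is a sketch under assumptions: the data is a synthetic stand-in generated by `make_classification`, the hyperparameters are illustrative, and the neural network is left out to keep the example free of the TensorFlow dependency. The scores it prints will not match the figures reported in this README.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real dataset (assumption: binary target).
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Illustrative hyperparameters, not the project's settings.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]  # probability of the positive class
    scores[name] = {
        "accuracy": accuracy_score(y_test, model.predict(X_test)),
        "auc": roc_auc_score(y_test, proba),
    }
    print(f"{name}: accuracy={scores[name]['accuracy']:.3f}, "
          f"AUC={scores[name]['auc']:.3f}")
```

Reporting AUC-ROC next to accuracy is useful here because accuracy alone can look strong on imbalanced data even when the minority class is poorly predicted.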

Key Visualizations

  • Correlation heatmaps
  • Top 10 features by importance
  • AUC-ROC curves
  • Distribution plots for key variables like exercise, BMI, and smoking history
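
The numbers behind the first three visualizations can be computed as in the sketch below; the plotting calls themselves (for example with seaborn or matplotlib) are omitted. The feature names and the randomly generated data are hypothetical and only stand in for the Kaggle dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1500
# Hypothetical features with random values, not the real data.
df = pd.DataFrame({
    "age": rng.integers(18, 90, n),
    "bmi": rng.normal(27, 5, n),
    "smoker": rng.integers(0, 2, n),
    "noise": rng.normal(0, 1, n),
})
# Synthetic target driven mostly by age (an assumption for the demo).
logit = (0.08 * (df["age"] - 55) + 0.5 * df["smoker"]).to_numpy()
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.25, stratify=y, random_state=0
)
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)

# Data for the correlation heatmap.
corr = df.corr()

# Data for the "top 10 features" bar chart (only 4 features exist here).
importances = pd.Series(forest.feature_importances_, index=df.columns)
top_features = importances.sort_values(ascending=False).head(10)

# Data for the ROC curve.
proba = forest.predict_proba(X_test)[:, 1]
fpr, tpr, _ = roc_curve(y_test, proba)
auc = roc_auc_score(y_test, proba)
print(top_features)
print(f"AUC = {auc:.3f}")
```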

Results

  • Best Performing Model: Random Forest Classifier with an accuracy of 93%.
  • Interpretability: Decision Tree and Logistic Regression provided clear insights into key risk factors.
  • Key Risk Factors Identified:
    • Age
    • Diabetes
    • Smoking history
    • BMI
    • Exercise habits
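
One way to read risk factors off a logistic regression is to standardize the inputs and convert the fitted coefficients to odds ratios. The sketch below shows that idea on synthetic data; the feature names and the simulated relationship (risk rising with age and falling with exercise) are assumptions for the demo, not findings from this project.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 2000
# Hypothetical features with random values, not the real data.
X = pd.DataFrame({
    "age": rng.integers(18, 90, n),
    "bmi": rng.normal(27, 5, n),
    "exercise": rng.integers(0, 2, n),
})
# Simulated outcome: risk rises with age and falls with exercise.
logit = (0.06 * (X["age"] - 55) - 0.8 * X["exercise"]).to_numpy()
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Standardizing first puts all coefficients on a comparable scale.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X, y)

coefs = pipe.named_steps["logisticregression"].coef_[0]
# exp(coefficient) = multiplicative change in odds per one standard deviation.
odds_ratios = pd.Series(np.exp(coefs), index=X.columns)
print(odds_ratios.sort_values(ascending=False))
```

An odds ratio above 1 means the feature raises the predicted odds of disease, and a value below 1 means it lowers them.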

Getting Started

Prerequisites

  • Python 3.7 or later
  • Required libraries: pandas, numpy, scikit-learn (imported as sklearn), seaborn, matplotlib, plotly, imbalanced-learn (imported as imblearn), tensorflow, sqlalchemy, pandasql

Installation

  1. Clone the repository:
    git clone https://github.com/samarthchandrawat/CVD_Prediction.git
    cd CVD_Prediction
  2. Install dependencies:
    pip install -r requirements.txt
  3. Run the project:
    python big_data_final_project.py

How to Contribute

Contributions are welcome! Please follow these steps:

  1. Fork the repository.
  2. Create a feature branch:
    git checkout -b feature-name
  3. Commit your changes:
    git commit -m "Add feature-name"
  4. Push to the branch:
    git push origin feature-name
  5. Create a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

