This project was the onboarding assessment for each member of the data analytics team. Its aim was to assess each member's strengths and weaknesses in data analysis.
This project is about predicting stroke and identifying the factors that influence it. The training dataset contains 43,400 entries and 12 columns, including 10 independent variables (age, gender, work_type, Residence_type, avg_glucose_level, hypertension, heart_disease, ever_married, bmi, smoking_status) and 1 target variable (stroke).
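Before a model can use these columns, the text-valued ones have to be turned into numbers and the target split off. The sketch below shows one common way to do that with pandas. It is an illustration under assumptions, not the notebook's actual code: `prepare_features` is a hypothetical helper, the heart-disease column is assumed to be named `heart_disease` (matching the underscore style of the other columns), the missing-value handling (median for numeric columns, `"Unknown"` for categories) is an assumed choice, and the small DataFrame stands in for the real training file, whose name is not given here.

```python
import numpy as np
import pandas as pd

# Column names follow the README; "heart_disease" is an assumed spelling.
CATEGORICAL = ["gender", "work_type", "Residence_type", "ever_married", "smoking_status"]
NUMERIC = ["age", "avg_glucose_level", "bmi", "hypertension", "heart_disease"]
TARGET = "stroke"


def prepare_features(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.Series]:
    """Hypothetical helper: split a stroke DataFrame into features X and target y.

    Assumption: missing numeric values get the column median and missing
    categories get the label "Unknown"; the original notebook may differ.
    """
    X = df[CATEGORICAL + NUMERIC].copy()
    X[NUMERIC] = X[NUMERIC].fillna(X[NUMERIC].median())
    X[CATEGORICAL] = X[CATEGORICAL].fillna("Unknown")
    # One-hot encode the text columns so a tree model can split on them.
    X = pd.get_dummies(X, columns=CATEGORICAL)
    y = df[TARGET]
    return X, y


# Small hypothetical frame standing in for the real training data.
toy = pd.DataFrame({
    "gender": ["Male", "Female", "Female"],
    "age": [67.0, 45.0, np.nan],
    "hypertension": [0, 1, 0],
    "heart_disease": [1, 0, 0],
    "ever_married": ["Yes", "No", "Yes"],
    "work_type": ["Private", "Self-employed", "Private"],
    "Residence_type": ["Urban", "Rural", "Urban"],
    "avg_glucose_level": [228.7, 105.9, 171.2],
    "bmi": [36.6, np.nan, 32.5],
    "smoking_status": ["formerly smoked", None, "smokes"],
    "stroke": [1, 0, 0],
})
X, y = prepare_features(toy)
print(X.shape)
```

One-hot encoding adds one column per category value, so the feature matrix ends up wider than the 10 listed variables.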
The project was built with:
- Python
- Jupyter Notebook
The project used the following packages:
- pandas (import pandas as pd)
- numpy (import numpy as np)
- seaborn (import seaborn as sns)
- matplotlib (import matplotlib.pyplot as plt)
- scikit-learn (sklearn)
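A quick first look at how a single factor relates to stroke is to compare the stroke rate across that factor's groups. The sketch below does this with pandas only; `stroke_rate_by` is a hypothetical helper and the tiny DataFrame is made-up illustration data, not results from the project.

```python
import pandas as pd


def stroke_rate_by(df: pd.DataFrame, column: str, target: str = "stroke") -> pd.Series:
    """Hypothetical helper: share of rows with target == 1 for each value of `column`."""
    # The mean of a 0/1 column equals the proportion of 1s in each group.
    return df.groupby(column)[target].mean().sort_values(ascending=False)


# Made-up example data for illustration only.
toy = pd.DataFrame({
    "hypertension": [0, 0, 0, 1, 1],
    "stroke":       [0, 0, 1, 1, 0],
})
rates = stroke_rate_by(toy, "hypertension")
print(rates)
```

The resulting Series could then be plotted, for example with `sns.barplot(x=rates.index, y=rates.values)`; this is one possible use of seaborn here, not necessarily how the notebook visualises the data.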
After training a Random Forest classifier on the training data, I ranked the features by importance to determine which ones matter most. I then used the trained model to predict the target variable (stroke) for the test data.
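The train, rank, predict workflow described above can be sketched with scikit-learn's `RandomForestClassifier`. This is a minimal sketch under assumptions, not the project's exact code: `rank_features_and_predict` is a hypothetical name, `n_estimators=200` and `random_state=0` are assumed settings, and the synthetic data below is generated so that only `age` drives the label, which makes the expected ranking easy to check.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def rank_features_and_predict(X_train, y_train, X_test, n_estimators=200, random_state=0):
    """Hypothetical helper: fit a random forest, rank features, predict X_test."""
    model = RandomForestClassifier(n_estimators=n_estimators, random_state=random_state)
    model.fit(X_train, y_train)
    # feature_importances_ is aligned with the training columns and sums to 1.
    importances = pd.Series(model.feature_importances_, index=X_train.columns)
    importances = importances.sort_values(ascending=False)
    # One-hot encoding can give the test set a different column set; align it
    # with the training columns and fill any missing dummy columns with 0.
    X_test = X_test.reindex(columns=X_train.columns, fill_value=0)
    predictions = model.predict(X_test)
    return importances, predictions


# Synthetic data (not the stroke dataset): the label depends only on age.
rng = np.random.default_rng(0)
n = 400
X_train = pd.DataFrame({
    "age": rng.uniform(20, 85, n),
    "avg_glucose_level": rng.uniform(60, 250, n),
    "noise": rng.normal(size=n),
})
y_train = (X_train["age"] > 65).astype(int)
X_test = pd.DataFrame({"age": [30.0, 80.0], "avg_glucose_level": [90.0, 200.0], "noise": [0.0, 0.0]})

importances, predictions = rank_features_and_predict(X_train, y_train, X_test)
print(importances)
print(predictions)
```

These are impurity-based importances, which can favour continuous or high-cardinality features; `sklearn.inspection.permutation_importance` is an alternative if that bias is a concern.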