Life Expectancy ML Prediction

A machine learning project that predicts global life expectancy using World Bank development indicators. Three regression models are trained, evaluated, and compared to identify the best predictor.

Overview

Using publicly available World Bank data, this project builds a complete ML pipeline, from raw data ingestion and reshaping through exploratory data analysis (EDA), feature engineering, model training, and evaluation. The target variable is life expectancy at birth, in years.

Dataset

Source: World Bank World Development Indicators

Features used:

Indicator	Unit
Mortality rate, adult, female	Per 1,000 female adults
Mortality rate, adult, male	Per 1,000 male adults
Mortality rate, infant	Per 1,000 live births
Mortality rate, under-5	Per 1,000 live births
Diabetes prevalence	% of population ages 20–79
Physicians	Per 1,000 people
Hospital beds	Per 1,000 people
Health expenditure	% of GDP
GDP per capita	Current US$
School enrollment, secondary	% gross enrollment ratio
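The wide-to-long reshaping can be sketched with pandas as below. This is a minimal example, assuming `data.csv` follows the usual World Bank WDI bulk-download layout (columns `Country Name`, `Country Code`, `Indicator Name`, `Indicator Code`, then one column per year). The toy rows and values are placeholders, not real statistics, and the notebook's own code may differ.

```python
import pandas as pd

# Toy rows in the usual WDI layout: one row per (country, indicator), one column per year.
# Countries and values are placeholders, NOT real World Bank statistics.
wide = pd.DataFrame({
    "Country Name": ["Country A", "Country A", "Country B", "Country B"],
    "Country Code": ["AAA", "AAA", "BBB", "BBB"],
    "Indicator Name": [
        "Life expectancy at birth, total (years)",
        "Mortality rate, infant (per 1,000 live births)",
        "Life expectancy at birth, total (years)",
        "Mortality rate, infant (per 1,000 live births)",
    ],
    "Indicator Code": ["SP.DYN.LE00.IN", "SP.DYN.IMRT.IN"] * 2,
    "2019": [70.0, 30.0, 80.0, 5.0],
    "2020": [70.5, 29.0, 80.2, None],  # a missing value, as is common in WDI data
})

id_cols = ["Country Name", "Country Code", "Indicator Name", "Indicator Code"]

# Step 1 (melt): one row per (country, indicator, year).
long = wide.melt(id_vars=id_cols, var_name="Year", value_name="Value")
long["Year"] = long["Year"].astype(int)

# Step 2 (pivot): one row per (country, year), one column per indicator.
panel = long.pivot(
    index=["Country Code", "Year"],
    columns="Indicator Name",
    values="Value",
).reset_index()
panel.columns.name = None

print(panel)
```

With the real file, `wide` would come from `pd.read_csv("data.csv")`; after pivoting, the life expectancy column serves as the target and the indicators in the table above become feature columns.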

Features

Reshape World Bank wide-format data (years as columns) into long format
Exploratory data analysis: histograms, box plots, correlation heatmaps, and pairplots
Data-cleaning pipeline: missing-value handling and feature scaling (StandardScaler)
Three models trained and compared:
- Linear Regression
- Random Forest Regressor
- Gradient Boosting Regressor
Model evaluation with MAE, RMSE, and R²
Feature importance analysis from Random Forest
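A minimal scikit-learn sketch of the preprocessing and three-model comparison listed above. It assumes median imputation (the README does not say which strategy the notebook uses), an 80/20 random train/test split, and mostly default hyperparameters. The data is synthetic with made-up column names, so the printed numbers say nothing about the real results.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature matrix (NOT World Bank data).
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "infant_mortality": rng.uniform(2, 90, n),
    "gdp_per_capita": rng.lognormal(8.5, 1.0, n),
    "health_expenditure": rng.uniform(2, 15, n),
})
y = 80 - 0.2 * X["infant_mortality"] + 0.5 * X["health_expenditure"] + rng.normal(0, 1.5, n)

# Blank out roughly 5% of feature values so the imputer has work to do.
X = X.mask(rng.random(X.shape) < 0.05)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=200, random_state=42),
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
}

rows = []
for name, model in models.items():
    # Imputer and scaler live inside the pipeline, so their statistics
    # are learned from the training split only.
    pipe = make_pipeline(SimpleImputer(strategy="median"), StandardScaler(), model)
    pipe.fit(X_train, y_train)
    pred = pipe.predict(X_test)
    rows.append({
        "Model": name,
        "MAE": mean_absolute_error(y_test, pred),
        "RMSE": np.sqrt(mean_squared_error(y_test, pred)),
        "R2": r2_score(y_test, pred),
    })

results = pd.DataFrame(rows).set_index("Model").round(3)
print(results)
```

For these three models, standard scaling does not materially change predictions (ordinary least squares and tree splits are unaffected by per-feature affine rescaling), but it makes linear-regression coefficients comparable across indicators, and using one pipeline shape for all models keeps the comparison uniform.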

Project Structure

life_expectancy_prediction.ipynb   # Main Jupyter notebook
data.csv                           # Raw World Bank dataset
metadata.csv                       # Dataset column metadata
life_expectancy_ml_report.pdf      # Full written report

Installation

Clone the repository:

git clone https://github.com/Sparkz691768/Life-Expectancy-ML-Prediction.git
cd Life-Expectancy-ML-Prediction

Install dependencies:

pip install pandas numpy matplotlib seaborn scikit-learn jupyter

Launch the notebook:

jupyter notebook life_expectancy_prediction.ipynb

Models & Results

Model	MAE (years)	RMSE (years)	R²
Linear Regression	—	—	—
Random Forest	—	—	—
Gradient Boosting	—	—	—

Run the notebook to populate actual metrics.
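The three metrics in the table are standard. Here y_i is the observed life expectancy for test row i, ŷ_i is a model's prediction, and ȳ is the mean of the n observed test values:

```latex
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert, \qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}, \qquad
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

MAE and RMSE are in years, like the target. RMSE weights large errors more heavily, so RMSE ≥ MAE always holds. R² is unitless: 1 is a perfect fit, 0 matches always predicting the mean, and it can go negative on test data.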

What I Learned

Working with real-world open data from the World Bank API
Reshaping wide tabular data (pivot/melt) for ML use
Comparing multiple regression models on the same dataset
Interpreting feature importance to understand which indicators drive predictions
Building a complete end-to-end ML pipeline
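Reading Random Forest importances can be sketched with scikit-learn's `feature_importances_` attribute on a fitted `RandomForestRegressor`. This is a minimal example on synthetic data. The column names are hypothetical stand-ins for indicators in the Dataset table, and by construction only the first column drives the target.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic placeholder data (NOT World Bank values); only the first column affects y.
rng = np.random.default_rng(1)
n = 400
X = pd.DataFrame({
    "adult_mortality_female": rng.uniform(50, 400, n),
    "physicians_per_1000": rng.uniform(0.1, 5.0, n),
    "hospital_beds_per_1000": rng.uniform(0.5, 10.0, n),
})
y = 85 - 0.06 * X["adult_mortality_female"] + rng.normal(0, 1.0, n)

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances: non-negative, normalized to sum to 1, aligned with X.columns.
importances = pd.Series(rf.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances)
```

If the forest is fitted inside a `make_pipeline(...)`, the estimator is reachable as `pipe.named_steps["randomforestregressor"]`. Impurity-based importances can favor high-cardinality features and split credit between correlated inputs (infant and under-5 mortality are likely to be strongly correlated here), so `sklearn.inspection.permutation_importance` on a held-out split is a common cross-check.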

Future Improvements

Hyperparameter tuning with GridSearchCV / RandomizedSearchCV
Add more recent data years via World Bank API
Build an interactive prediction dashboard with Streamlit
Experiment with XGBoost and neural network regressors
Add cross-validation for more robust evaluation
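Hyperparameter tuning and cross-validation could be combined in a single step. The following is a minimal sketch using `GridSearchCV`, with a small, arbitrary grid chosen for illustration and synthetic data from `make_regression`. If the reshaped data holds one row per country and year, plain K-fold lets the same country appear in both training and validation folds, so this sketch swaps in `GroupKFold` to keep each country's rows together. The `groups` array is a hypothetical stand-in for country codes.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, GroupKFold

# Synthetic stand-in data: 200 rows, 5 features (NOT World Bank data).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Hypothetical grouping: 40 "countries" x 5 "years" each.
groups = np.repeat(np.arange(40), 5)

# Small illustrative grid: 2 x 2 x 2 = 8 candidate configurations.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 10],
    "min_samples_leaf": [1, 3],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=GroupKFold(n_splits=5),  # no group appears in both train and validation folds
    scoring="neg_root_mean_squared_error",  # scikit-learn maximizes scores, so RMSE is negated
)
search.fit(X, y, groups=groups)

print("Best params:", search.best_params_)
print("Mean CV RMSE:", -search.best_score_)
```

`RandomizedSearchCV` accepts the same `cv`, `scoring`, and `groups` arguments and samples a fixed number of configurations instead of trying every combination, which scales better as the grid grows.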
