
christinedbaxter/sb_caseStudy_FamInsureCo


Medical Insurance Cost Analysis

Table of Contents

  1. Project Overview
  2. Technologies Used
  3. GitHub Versioning/Workflow
  4. Data Exploration
  5. Part I: Descriptive Statistics
  6. Part II: Inferential Statistics
  7. Conclusion
  8. Contact
  9. Acknowledgements

Project Overview

This project analyzes medical insurance costs in two parts. Part I covers descriptive statistics; Part II covers inferential statistics. The work includes data exploration, data visualization, outlier detection, and predictive modeling with linear regression.


Technologies Used

Python  |  Pandas  |  Matplotlib  |  Seaborn  |  NumPy  |  Statsmodels


GitHub Versioning/Workflow

This project uses GitHub for version control and workflow management, which supports collaboration and keeps a traceable history of code changes.


Data Exploration

The dataset contains 1,338 entries and includes the following variables:

  • Age
  • BMI (Body Mass Index)
  • Medical Insurance Charges
  • Smoker Status
  • Number of Children
  • Region
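A first pass over a table like this usually checks its shape, column types, missing values, and summary statistics. The sketch below is illustrative only: the column names (`age`, `bmi`, `children`, `smoker`, `region`, `charges`), the region labels, and the charge formula are assumptions, and the data is synthetic, generated only so the example runs without the project's file.

```python
import numpy as np
import pandas as pd


def make_synthetic_insurance(n=1338, seed=0):
    """Return a synthetic stand-in table; NOT the project's real data."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "age": rng.integers(18, 65, size=n),        # ages 18..64 (assumed range)
        "bmi": rng.normal(30, 6, size=n).round(1),
        "children": rng.integers(0, 6, size=n),     # 0..5 children (assumed range)
        "smoker": rng.choice(["yes", "no"], size=n, p=[0.2, 0.8]),
        "region": rng.choice(["north", "south", "east", "west"], size=n),
    })
    # Invented charge formula, used only to give the column plausible structure.
    df["charges"] = (
        250 * df["age"]
        + 300 * df["bmi"]
        + 20000 * (df["smoker"] == "yes")
        + rng.normal(0, 2000, size=n)
    ).round(2)
    return df


df = make_synthetic_insurance()
print(df.shape)                      # (1338, 6)
print(df.dtypes)                     # numeric vs. categorical (object) columns
print(df.isna().sum())               # missing values per column
print(df.describe())                 # summary statistics for numeric columns
print(df["smoker"].value_counts())   # class balance of a categorical column
```

With the real data, the synthetic frame would be replaced by a single `pd.read_csv(...)` call on the project's data file; the inspection steps stay the same.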


Part I: Descriptive Statistics

Key Insights

  1. BMI: BMI is positively correlated with medical insurance charges.
  2. Impact of Smoking: Smoker status is strongly associated with medical insurance charges.
  3. Age Groups: Medical insurance charges tend to be higher for older individuals.
  4. Outliers: Outliers in medical insurance charges were detected and handled.
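The insights above can be checked with a few pandas calls. The sketch below uses synthetic data and assumed column names. The project does not say which outlier method it used, so the 1.5 × IQR rule here is a common default standing in for it, not a description of the project's approach.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data with assumed column names; NOT the project's data.
rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "age": rng.integers(18, 65, size=n),
    "bmi": rng.normal(30, 6, size=n),
    "smoker": rng.choice(["yes", "no"], size=n, p=[0.2, 0.8]),
})
df["charges"] = (
    250 * df["age"]
    + 300 * df["bmi"]
    + 20000 * (df["smoker"] == "yes")
    + rng.normal(0, 2000, size=n)
)

# 1. Pearson correlation between BMI and charges.
bmi_corr = df["bmi"].corr(df["charges"])

# 2. Mean charges by smoker status.
mean_by_smoker = df.groupby("smoker")["charges"].mean()

# 3. Mean charges by age band (the band edges are an arbitrary choice).
age_band = pd.cut(df["age"], bins=[17, 29, 44, 64],
                  labels=["18-29", "30-44", "45-64"])
mean_by_age = df.groupby(age_band, observed=True)["charges"].mean()


# 4. Flag outliers with the 1.5 * IQR rule (assumed method).
def iqr_outlier_mask(s, k=1.5):
    """Return a boolean mask that is True where s lies outside the IQR fences."""
    q1, q3 = s.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)


outliers = iqr_outlier_mask(df["charges"])
df_no_outliers = df[~outliers]   # one way to "handle" them: drop the flagged rows

print(round(bmi_corr, 3))
print(mean_by_smoker)
print(mean_by_age)
print(int(outliers.sum()), "rows flagged as outliers")
```

Dropping flagged rows is only one option. Clipping values to the fences or modeling on a log scale keeps the rows while limiting their influence.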

Visualizations

  1. Scatter Plots
  2. Box Plots
  3. Histograms


Part II: Inferential Statistics

Topics Covered

  1. Dual Axes Plotting: How to plot multiple variables on different axes.
  2. Correlation and Heatmaps: Interpreting correlation coefficients and understanding heatmaps.
  3. Categorical Conversion: Converting categorical columns into numerical outputs for modeling.
  4. Linear Regression Modeling: Creating linear regression models and assessing their goodness of fit with the statsmodels.api library.
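Categorical conversion and the correlation matrix behind a heatmap can be sketched as follows. The six rows are made-up toy values and the column names are assumptions. The encoding choices (a 0/1 map for the binary column, one-hot encoding with the first level dropped for the multi-level column) are a common pattern, not necessarily the one this project used.

```python
import pandas as pd

# Toy rows invented for illustration; NOT taken from the project's dataset.
df = pd.DataFrame({
    "age": [19, 33, 45, 52, 28, 61],
    "bmi": [27.9, 22.7, 30.1, 35.2, 24.6, 29.0],
    "smoker": ["yes", "no", "no", "yes", "no", "yes"],
    "region": ["southwest", "northwest", "southeast",
               "northeast", "southeast", "northwest"],
    "charges": [16800.0, 4400.0, 7900.0, 41000.0, 3500.0, 29000.0],
})

encoded = df.copy()

# Binary category -> 0/1 integer column.
encoded["smoker"] = encoded["smoker"].map({"no": 0, "yes": 1})

# Multi-level category -> one-hot columns. drop_first=True drops one level
# (here "northeast") so the dummies are not perfectly collinear in a regression.
encoded = pd.get_dummies(encoded, columns=["region"], drop_first=True, dtype=int)

# Pairwise Pearson correlations; this is the matrix a heatmap would display.
corr = encoded.corr()
print(corr["charges"].sort_values(ascending=False))
```

The `corr` matrix can then be passed to `seaborn.heatmap(corr, annot=True)` to draw the heatmap. Each cell is a correlation coefficient between -1 and 1, and the diagonal is always 1.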

Predictive Modeling

  1. Multiple Regression Model: Fit an Ordinary Least Squares (OLS) regression with the statsmodels.api library.
  2. Coefficient Analysis: Analyzed the coefficients to interpret the model.
  3. Model Validation: Validated the predictions to assess model performance.


Conclusion

The analysis gives an overview of the factors affecting medical insurance charges. It could help insurance companies with risk assessment and help individuals understand what drives their medical insurance costs.


Contact

For more details, please contact Christine Baxter.


Acknowledgements

Data was sourced from Springboard's Data Analyst Career Track curriculum.
