- Project Overview
- Technologies Used
- GitHub Versioning/Workflow
- Data Exploration
- Part I: Descriptive Statistics
- Part II: Inferential Statistics
- Conclusion
- Contact
- Acknowledgements
This project analyzes medical insurance costs in two parts. Part I covers descriptive statistics; Part II covers inferential statistics. The analysis includes data exploration, data visualization, outlier detection, and predictive modeling.
Python | Pandas | Matplotlib | Seaborn | NumPy | Statsmodels
This project uses GitHub for version control and workflow management, which supports collaboration and helps maintain code integrity.
The dataset contains 1,338 entries and includes the following variables:
- Age
- BMI (Body Mass Index)
- Medical Insurance Charges
- Smoker Status
- Number of Children
- Region
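A first look at a table with these variables might be sketched as follows. The column names (`age`, `bmi`, `charges`, `smoker`, `children`, `region`) are assumptions, and the rows are made up for illustration; they are not taken from the project's dataset.

```python
import pandas as pd

# Made-up rows for illustration only; column names are assumed, not confirmed.
df = pd.DataFrame({
    "age": [19, 33, 45, 52, 28, 61],
    "bmi": [27.9, 22.7, 30.1, 35.2, 24.5, 29.0],
    "charges": [17000.0, 4500.0, 8200.0, 11000.0, 3900.0, 29000.0],
    "smoker": ["yes", "no", "no", "no", "no", "yes"],
    "children": [0, 1, 2, 3, 0, 1],
    "region": ["southwest", "southeast", "northwest",
               "northeast", "southeast", "northwest"],
})

print(df.shape)         # (rows, columns)
print(df.dtypes)        # numeric vs. text columns
print(df.isna().sum())  # missing values per column
print(df.describe())    # summary statistics for the numeric columns
```

With the real data, the frame would typically be loaded with `pd.read_csv` instead of being built inline.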
- Positive Correlation: BMI and medical insurance charges are positively correlated.
- Impact of Smoking: Smoking significantly affects medical insurance charges.
- Age Groups: Medical insurance charges tend to be higher for older individuals.
- Outliers: Outliers in medical insurance charges were detected and handled.
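The README does not say which outlier rule or correlation measure was used. As a minimal sketch, the example below assumes the common 1.5 × IQR rule and the Pearson coefficient, applied to made-up values with assumed column names `bmi` and `charges`.

```python
import pandas as pd

# Made-up values for illustration; not the project's data.
df = pd.DataFrame({
    "bmi":     [20.1, 22.5, 24.0, 26.3, 27.8, 30.2, 33.5, 38.0],
    "charges": [2000.0, 3500.0, 4200.0, 5100.0, 6000.0, 7500.0, 9000.0, 48000.0],
})

# Pearson correlation between BMI and charges.
r = df["bmi"].corr(df["charges"])

# 1.5 * IQR rule (an assumed choice): flag values far outside the middle 50%.
q1 = df["charges"].quantile(0.25)
q3 = df["charges"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
is_outlier = (df["charges"] < lower) | (df["charges"] > upper)

# One possible way to "handle" outliers: drop the flagged rows.
cleaned = df.loc[~is_outlier]

print(f"Pearson r = {r:.2f}")
print(df.loc[is_outlier])
```

Dropping rows is only one option; capping values at the bounds or keeping them and noting their influence are equally plausible readings of "handled".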
- Scatter Plots
- Box Plots
- Histograms
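The three plot types listed above can be sketched with Matplotlib as follows. The data is randomly generated for illustration, and the column names and the choice of which variable goes in which plot are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Randomly generated data for illustration; not the project's dataset.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "bmi": rng.normal(30, 6, n),
    "smoker": rng.choice(["yes", "no"], n, p=[0.2, 0.8]),
})
df["charges"] = (3000 + 250 * df["bmi"]
                 + np.where(df["smoker"] == "yes", 20000, 0)
                 + rng.normal(0, 2000, n))

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Scatter plot: relationship between two numeric variables.
axes[0].scatter(df["bmi"], df["charges"], alpha=0.5)
axes[0].set_xlabel("BMI")
axes[0].set_ylabel("Charges")

# Box plot: distribution of charges per category.
groups = [df.loc[df["smoker"] == s, "charges"].to_numpy() for s in ["no", "yes"]]
axes[1].boxplot(groups)
axes[1].set_xticks([1, 2])
axes[1].set_xticklabels(["non-smoker", "smoker"])
axes[1].set_ylabel("Charges")

# Histogram: shape of the charges distribution.
axes[2].hist(df["charges"], bins=30)
axes[2].set_xlabel("Charges")
axes[2].set_ylabel("Count")

fig.tight_layout()
```

In a notebook, dropping the `matplotlib.use("Agg")` line and calling `plt.show()` displays the figure; `fig.savefig(...)` writes it to a file.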
- Dual Axes Plotting: How to plot multiple variables on different axes.
- Correlation and Heatmaps: Interpreting correlation coefficients and understanding heatmaps.
- Categorical Conversion: Converting categorical columns into numerical outputs for modeling.
- Linear Regression Modeling: Building linear regression models and assessing their goodness of fit with the statsmodels.api library.
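Categorical conversion and a correlation matrix, two of the skills listed above, might look like the sketch below. The column names, the 0/1 mapping, and the use of one-hot encoding are assumptions; the rows are made up.

```python
import pandas as pd

# Made-up rows for illustration; column names are assumed.
df = pd.DataFrame({
    "age":     [19, 33, 45, 52, 28, 61, 37, 24],
    "bmi":     [27.9, 22.7, 30.1, 35.2, 24.5, 29.0, 31.4, 26.0],
    "charges": [17000.0, 4500.0, 8200.0, 11000.0, 3900.0, 29000.0, 6400.0, 15500.0],
    "smoker":  ["yes", "no", "no", "no", "no", "yes", "no", "yes"],
    "region":  ["southwest", "southeast", "northwest", "northeast",
                "southeast", "northwest", "northeast", "southwest"],
})

# Two-level column: map directly to 0/1.
df["smoker_flag"] = df["smoker"].map({"no": 0, "yes": 1})

# Multi-level column: one-hot encode, dropping the first level
# so the remaining indicator columns are not redundant.
encoded = pd.get_dummies(df, columns=["region"], drop_first=True, dtype=int)

# Correlation matrix over the numeric columns only.
corr = encoded.drop(columns=["smoker"]).corr()
print(corr.round(2))
```

The resulting matrix can be passed to `seaborn.heatmap(corr, annot=True)` to draw the heatmap; each cell is a correlation coefficient between -1 and 1.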
- Multiple Regression Model: Used the statsmodels.api library to fit an Ordinary Least Squares (OLS) regression.
- Coefficient Analysis: Analyzed the coefficients to interpret the model.
- Model Validation: Validated the predictions to assess model performance.
The analysis identifies the main factors that affect medical insurance charges. It could help insurance companies with risk assessment and help individuals understand what drives their own insurance costs.
For more details, please contact Christine Baxter.
Data was sourced from Springboard's Data Analyst Career Track curriculum.