Food Desert Effect on County-Level Health Outcomes

DATA-245 Machine Learning Group Project

Overview

This project investigates how unequal access to food correlates with county-level health outcomes (adult obesity and diabetes rates) across 2,275 US counties.

Project Structure

ML group project/
├── data/
│   ├── raw/                    # Original source data
│   ├── processed/              # Cleaned datasets
│   └── output/                 # Analysis results (CSV, PNG)
├── notebooks/                  # Jupyter notebooks for analysis
├── src/
│   ├── analysis/               # Core analysis scripts
│   └── utils/                  # Utility modules
├── dashboards/                 # Streamlit applications
├── docs/
│   ├── reports/                # PDF documentation
│   └── images/                 # Charts and diagrams
├── requirements.txt            # Python dependencies
└── README.md

Installation

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Usage

Run Analysis Scripts

# Regression analysis
python src/analysis/Regression_analysis.py

# PCA analysis
python src/analysis/pcaanalysis.py

Launch Main Dashboard

cd dashboards
streamlit run main_dashboard.py

The dashboard includes 8 pages:

Project Overview
Data Exploration
Regression Analysis (Linear, Ridge, Lasso)
Classification Models (Logistic, KNN, Naive Bayes, SVM, Decision Tree, Random Forest, Extra Trees)
Clustering Analysis (K-Means, Hierarchical)
PCA Analysis
Model Comparison
Key Insights & Conclusions
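
As a rough illustration of what the Classification Models page compares, here is a minimal sketch that fits the seven listed model families with scikit-learn. Everything in it is an assumption made for the example: the data is synthetic rather than the project's county table, the binary label is derived by thresholding a continuous rate at its median (the README does not say how classes are defined), and all hyperparameters are library defaults or arbitrary choices.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (NOT the project's data): 400 rows, 5 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
rate = X @ np.array([1.0, -1.0, 0.5, 0.0, 0.0]) + rng.normal(scale=0.5, size=400)

# Hypothetical labeling rule: 1 if the rate is above the median, else 0.
y = (rate > np.median(rate)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

classifiers = {
    "Logistic": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes": GaussianNB(),
    "SVM": SVC(kernel="rbf"),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Extra Trees": ExtraTreesClassifier(n_estimators=100, random_state=0),
}

# Scale inside a pipeline so the scaler is fit on the training split only.
accuracy = {}
for name, clf in classifiers.items():
    pipe = make_pipeline(StandardScaler(), clf)
    pipe.fit(X_train, y_train)
    accuracy[name] = pipe.score(X_test, y_test)

for name, acc in sorted(accuracy.items(), key=lambda kv: -kv[1]):
    print(f"{name}: test accuracy = {acc:.3f}")
```

Scaling matters mostly for KNN, SVM, and logistic regression. The tree-based models are insensitive to it, so wrapping every model in the same pipeline is a convenience rather than a requirement.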

Jupyter Notebooks

jupyter notebook notebooks/

Key notebooks:

Food_Desert_Data_Cleaning.ipynb - Data preprocessing pipeline
Google_Colab_K_Means.ipynb - Clustering analysis
Regression_Modeling_Diabetes_Obesity.ipynb - Regression modeling

ML Methods

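Regression: OLS, Ridge, Lasso
Classification: Logistic Regression, KNN, Naive Bayes, SVM, Decision Tree, Random Forest, Extra Trees
Clustering: K-Means (5 clusters), Hierarchical
Dimensionality Reduction: PCA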
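
The regression, clustering, and PCA steps listed above can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions, not the code in src/analysis/: the data is synthetic, the regularization strengths are illustrative guesses, and only the choice of 5 clusters comes from this README.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the county table (NOT the project's data):
# 500 rows, 6 numeric features, one continuous target.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
true_coef = np.array([1.5, -2.0, 0.0, 0.0, 0.7, 0.0])
y = X @ true_coef + rng.normal(scale=0.5, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit the scaler on the training split only, then reuse it on the test split.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Regression: OLS, Ridge, Lasso. The alpha values are illustrative guesses.
models = {
    "OLS": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.05),
}
r2_scores = {
    name: model.fit(X_train_s, y_train).score(X_test_s, y_test)
    for name, model in models.items()
}

# Clustering: K-Means with 5 clusters on standardized features.
X_all_s = StandardScaler().fit_transform(X)
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X_all_s)

# Dimensionality reduction: project onto the first two principal components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_all_s)
explained = pca.explained_variance_ratio_

for name, r2 in r2_scores.items():
    print(f"{name}: test R^2 = {r2:.3f}")
print("cluster sizes:", np.bincount(cluster_labels, minlength=5))
print("variance explained by 2 PCs:", explained.sum().round(3))
```

Standardizing first is a deliberate choice here: Ridge and Lasso penalties, K-Means distances, and PCA directions all depend on feature scale, so unscaled rates and counts would otherwise dominate the results.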

Data

Source: 2025 County Health Rankings
Size: 2,275 counties across 48 states
Target Variables: Adult obesity rate, diabetes rate
Features: Food environment, socioeconomic factors, education, rurality

Team

DATA-245 Group 3:

Savitha Vijayarangan - Project Coordination
Jane Heng - Regression Lead
Rishi Visweswar Boppana - PCA Lead
Kapil Reddy Sanikommu - Dashboard Lead
