DATA-245 Machine Learning Group Project
This project investigates how food access inequality correlates with community-level health outcomes (obesity, diabetes) across 2,275 US counties.
ML group project/
├── data/
│ ├── raw/ # Original source data
│ ├── processed/ # Cleaned datasets
│ └── output/ # Analysis results (CSV, PNG)
├── notebooks/ # Jupyter notebooks for analysis
├── src/
│ ├── analysis/ # Core analysis scripts
│ └── utils/ # Utility modules
├── dashboards/ # Streamlit applications
├── docs/
│ ├── reports/ # PDF documentation
│ └── images/ # Charts and diagrams
├── requirements.txt # Python dependencies
└── README.md
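The layout above can be checked with a short script before running any analysis. This is a minimal sketch, not part of the repository: the helper name `missing_dirs` and the `EXPECTED_DIRS` list are hypothetical, and the directory names are copied from the tree.

```python
from pathlib import Path

# Directories copied from the tree above, relative to the project root.
# EXPECTED_DIRS and missing_dirs are illustrative names, not project code.
EXPECTED_DIRS = [
    "data/raw",
    "data/processed",
    "data/output",
    "notebooks",
    "src/analysis",
    "src/utils",
    "dashboards",
    "docs/reports",
    "docs/images",
]


def missing_dirs(root="."):
    """Return the expected directories that do not exist under root."""
    base = Path(root)
    return [d for d in EXPECTED_DIRS if not (base / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Project layout looks complete.")
```

Running it from the project root lists any folders that still need to be created, for example `data/output` on a fresh clone if generated results are not committed.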
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Regression analysis
python src/analysis/Regression_analysis.py
# PCA analysis
python src/analysis/pcaanalysis.py
# Run the dashboard
cd dashboards
streamlit run main_dashboard.py
The dashboard includes 8 pages:
- Project Overview
- Data Exploration
- Regression Analysis (Linear, Ridge, Lasso)
- Classification Models (Logistic, KNN, Naive Bayes, SVM, Decision Tree, Random Forest, Extra Trees)
- Clustering Analysis (K-Means, Hierarchical)
- PCA Analysis
- Model Comparison
- Key Insights & Conclusions
# Open the notebooks
jupyter notebook notebooks/
Key notebooks:
- Food_Desert_Data_Cleaning.ipynb - Data preprocessing pipeline
- Google_Colab_K_Means.ipynb - Clustering analysis
- Regression_Modeling_Diabetes_Obesity.ipynb - Regression modeling
Methods:
- Regression: OLS, Ridge, Lasso
- Clustering: K-Means (5 clusters)
- Dimensionality Reduction: PCA
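The three method families listed above can be sketched with scikit-learn as follows. This is an illustrative example only, not the project's scripts. It assumes scikit-learn and NumPy are installed. It runs on synthetic data rather than the county dataset, and the penalty strengths (`alpha=1.0`, `alpha=0.1`) and the two PCA components are arbitrary choices made for the demonstration. Only the 5-cluster setting comes from the list above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the county table (assumption): 300 rows, 6 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
true_coef = np.array([1.5, -2.0, 0.0, 0.5, 0.0, 1.0])
y = X @ true_coef + rng.normal(scale=0.5, size=300)

# Standardize so that Ridge/Lasso penalties, K-Means distances, and PCA
# are not dominated by features measured on larger scales.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Regression: OLS, Ridge, Lasso (alpha values are illustrative).
models = {
    "OLS": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.1),
}
r2_scores = {}
for name, model in models.items():
    model.fit(X_train_s, y_train)
    r2_scores[name] = model.score(X_test_s, y_test)  # R^2 on held-out rows

# Clustering: K-Means with 5 clusters, as listed above.
X_all_s = scaler.transform(X)
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X_all_s)

# Dimensionality reduction: project onto the first two principal components.
pca = PCA(n_components=2)
components = pca.fit_transform(X_all_s)

if __name__ == "__main__":
    for name, score in r2_scores.items():
        print(f"{name}: R^2 = {score:.3f}")
    print("Cluster sizes:", np.bincount(labels))
    print("Explained variance ratio:", pca.explained_variance_ratio_)
```

Fitting the scaler on the training split only keeps information from the held-out rows out of the regression models. The clustering and PCA steps reuse the same scaler for simplicity.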
Dataset:
- Source: 2025 County Health Rankings
- Size: 2,275 counties across 48 states
- Target Variables: Adult obesity rate, diabetes rate
- Features: Food environment, socioeconomic factors, education, rurality
DATA-245 Group 3:
- Savitha Vijayarangan - Project Coordination
- Jane Heng - Regression Lead
- Rishi Visweswar Boppana - PCA Lead
- Kapil Reddy Sanikommu - Dashboard Lead