This project focuses on predicting flight delays using a robust machine learning pipeline implemented with PySpark. The goal is to provide accurate predictions by optimizing model performance and aligning evaluation metrics with business objectives.
- ML Pipeline: Built and deployed a flight delay prediction model using PySpark.
- Hyperparameter Tuning: Used Hyperopt for systematic tuning to achieve optimal model performance.
- Performance Metrics: Achieved an F-beta score of 0.9, balancing precision and recall effectively.
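
For readers unfamiliar with the metric, here is a minimal sketch of how an F-beta score is computed from confusion-matrix counts. The function name `fbeta_score`, the counts, and the `beta` values are illustrative assumptions. This README does not state which `beta` the project used, and the numbers below are not project results.

```python
def fbeta_score(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Compute F-beta from true positives, false positives, and false negatives.

    beta > 1 weights recall more heavily; beta < 1 weights precision more heavily.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    # With no positive predictions and no positive labels, define the score as 0.
    return 0.0 if denom == 0 else (1 + b2) * tp / denom


# Hypothetical counts for illustration only (not results from this project).
tp, fp, fn = 80, 20, 10

f1 = fbeta_score(tp, fp, fn, beta=1.0)      # precision and recall weighted equally
f2 = fbeta_score(tp, fp, fn, beta=2.0)      # recall emphasized
f_half = fbeta_score(tp, fp, fn, beta=0.5)  # precision emphasized

print(f"F1={f1:.4f}  F2={f2:.4f}  F0.5={f_half:.4f}")
```

In this hypothetical example recall (80/90) exceeds precision (80/100), so the score rises as `beta` grows. Choosing `beta` is how the metric can be aligned with a business objective, for example whether a missed delay or a false alarm is the costlier error.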
Flight-Prediction:
- `Phase 3 - (Final Report).ipynb`: The final report on the model and its performance.
- `Phase 3 - HyperOpt.ipynb`: The notebook demonstrating hyperparameter tuning with Hyperopt.
- `Phase 3 - Modeling - best.ipynb`: The notebook containing the best-performing model.
- `Team 6-2 Presentation (Final).pdf`: Final presentation slides summarizing project outcomes.
- Clone this repository.
- Install required dependencies via `pip install -r requirements.txt`.
- Run the notebooks to explore data processing, model training, and evaluation.
This project is licensed under the MIT License.