I started this project to fulfill the graduation requirements of Udacity's Machine Learning Nanodegree program.
I explain the reasoning behind my approach and decisions in report.pdf and in my blog posts, part 1 and part 2.
The project uses a single dataset file, vehicle.csv. It is included in the repository and can be pulled if you have Git LFS installed, or you can download it from Kaggle.
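If you clone the repository without Git LFS, vehicle.csv may arrive as a small text pointer rather than the real data. The sketch below is one way to check for that. It assumes vehicle.csv sits in the repository root, and `is_lfs_pointer` is a hypothetical helper name, not part of this project.

```python
from pathlib import Path

# Every Git LFS pointer file begins with this line (Git LFS pointer format).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if `path` holds a Git LFS pointer instead of the real file."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


if __name__ == "__main__":
    csv_path = Path("vehicle.csv")  # assumed location: repository root
    if not csv_path.exists():
        print("vehicle.csv not found; clone the repository or download it from Kaggle")
    elif is_lfs_pointer(csv_path):
        print("vehicle.csv is an LFS pointer; run `git lfs pull` to fetch the data")
    else:
        print("vehicle.csv looks like real data")
```

If the check reports a pointer file, running `git lfs pull` inside the repository should replace it with the actual CSV.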
The linearlearner_model.ipynb and xgboost_model.ipynb notebooks must be run on an Amazon SageMaker notebook instance, because they use SageMaker's built-in algorithms and tools.
All other notebooks should run fine as long as the following packages are installed:
- numpy
- pandas
- matplotlib
- seaborn
- scikit-learn
- imbalanced-learn
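
To confirm the packages above are available before opening the notebooks, a quick check along these lines may help. It is only a sketch: `missing_packages` is a hypothetical helper, not part of this project. Note that scikit-learn and imbalanced-learn are imported as `sklearn` and `imblearn`.

```python
from importlib.util import find_spec

# Maps the pip package name to the module name used in `import` statements.
REQUIRED = {
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
    "imbalanced-learn": "imblearn",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages; try: pip install " + " ".join(missing))
    else:
        print("All required packages are installed")
```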