Solving Kaggle's 'Bike Sharing Demand' competition as a final project for
Naya college's Data Science Professional course instructed by Amit Rappel 👍.
Project group members: Omer Vinik, Yechiel Hickry
- Distinguishing casual from registered customers with custom graphs and plots:
  - Showing registered vs. casual rentals across day of the week
  - Showing registered vs. casual rentals across hour of the day
  - Showing average user count by weekday and hour of the day, per user type
  - Showing registered vs. casual user count by hour of the day across weekdays
  - Showing registered vs. casual rentals by working day
  - Showing registered vs. casual rentals by season
  - Showing average temperatures by season
  - Showing registered vs. casual user count by hour of the day across seasons
  - Showing registered vs. casual rentals by weather
- Plotting histograms of the data and the different targets
- Creating correlation analysis of the weather features
- Removing outliers from the weather features
- Extracting features from the Datetime column: Year, Month, Quarter, Weekday, Hour
- Engineering new features from the Datetime column: isweekend, peak_hour, afternoon
- Engineering a new feature from the weather column: good_weather
- Scaling 3 weather columns using StandardScaler
- Applying log(x+1) to the target columns (registered and casual counts)
- Creating 2 different models for registered and casual customers
- Using 2 pipelines with custom transformers:
  - Transformer that adds the custom date- and weather-related columns
  - Standard Scaler Transformer
  - Column Copier Transformer
  - Drop Column Transformer
  - Binned Data Digitizer Transformer
- Trying out many different feature combinations as input to the 2 models
- Applying RandomForest and XGBoost regressors using GridSearchCV
- Defining an RMSLE function and scorer
- Performing cross-validation
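
The Datetime and weather feature steps listed above could look roughly like the sketch below. It assumes the Kaggle column names `datetime` and `weather`. The function name `add_datetime_features` is hypothetical, and the exact definitions of `peak_hour`, `afternoon` and `good_weather` are not stated in this README, so the thresholds used here are illustrative guesses only.

```python
import pandas as pd


def add_datetime_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with calendar and flag columns added."""
    out = df.copy()
    dt = pd.to_datetime(out["datetime"])

    # Features extracted directly from the timestamp.
    out["year"] = dt.dt.year
    out["month"] = dt.dt.month
    out["quarter"] = dt.dt.quarter
    out["weekday"] = dt.dt.weekday  # Monday=0 ... Sunday=6
    out["hour"] = dt.dt.hour

    # Engineered flags.
    out["isweekend"] = (out["weekday"] >= 5).astype(int)
    # ASSUMPTION: commuting peaks at 07-09 and 17-19.
    out["peak_hour"] = out["hour"].isin([7, 8, 9, 17, 18, 19]).astype(int)
    # ASSUMPTION: afternoon means 12:00-17:59.
    out["afternoon"] = out["hour"].between(12, 17).astype(int)
    # ASSUMPTION: weather code 1 (clear / few clouds) counts as good weather.
    out["good_weather"] = (out["weather"] == 1).astype(int)
    return out


sample = pd.DataFrame(
    {
        "datetime": ["2011-01-01 08:00:00", "2011-01-03 14:00:00"],
        "weather": [1, 3],
    }
)
features = add_datetime_features(sample)
```

Returning a copy keeps the raw frame untouched, which makes it easier to try different feature combinations against the same input.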
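
The custom pipeline transformers listed above are not shown in this README. As a minimal sketch of the general pattern, the classes below implement a column copier, a bin digitizer and a column dropper on top of scikit-learn's `BaseEstimator` and `TransformerMixin`. All class names, parameters and bin edges are hypothetical and may differ from the project's actual code.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class ColumnCopier(TransformerMixin, BaseEstimator):
    """Copy one DataFrame column to a new column name."""

    def __init__(self, source, target):
        self.source = source
        self.target = target

    def fit(self, X, y=None):
        return self  # stateless: nothing to learn

    def transform(self, X):
        X = X.copy()
        X[self.target] = X[self.source]
        return X


class BinDigitizer(TransformerMixin, BaseEstimator):
    """Replace a numeric column with the index of the bin it falls into."""

    def __init__(self, column, bins):
        self.column = column
        self.bins = bins

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        X = X.copy()
        X[self.column] = np.digitize(X[self.column], self.bins)
        return X


class ColumnDropper(TransformerMixin, BaseEstimator):
    """Drop the given columns."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X.drop(columns=self.columns)


# Hypothetical preprocessing chain: keep temp, add a binned copy, drop datetime.
preprocess = Pipeline(
    [
        ("copy_temp", ColumnCopier(source="temp", target="temp_bin")),
        ("bin_temp", BinDigitizer(column="temp_bin", bins=[10, 20])),
        ("drop_datetime", ColumnDropper(columns=["datetime"])),
    ]
)

raw = pd.DataFrame(
    {"datetime": ["2011-01-01 00:00:00"] * 3, "temp": [5.0, 15.0, 30.0]}
)
prepared = preprocess.fit_transform(raw)
```

Because every step accepts and returns a DataFrame, a regressor can be appended as the final pipeline step and the whole chain passed to `GridSearchCV`.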
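
The log(x+1) target transform, the two separate models and the RMSLE scorer listed above fit together roughly as in the sketch below. It runs on synthetic stand-in data rather than the Kaggle files, and it uses scikit-learn's `TransformedTargetRegressor` to apply the log transform, which is one possible way to do it and not necessarily how the project does. The helper names `rmsle` and `make_model` are hypothetical.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import make_scorer
from sklearn.model_selection import cross_val_score


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error on raw (untransformed) counts."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip negatives so log1p stays defined for any regressor output.
    y_pred = np.clip(np.asarray(y_pred, dtype=float), 0, None)
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))


# greater_is_better=False makes scikit-learn negate the error,
# so that a higher score still means a better model.
rmsle_scorer = make_scorer(rmsle, greater_is_better=False)


def make_model():
    # Trains on log(x+1) targets and maps predictions back with exp(x)-1.
    return TransformedTargetRegressor(
        regressor=RandomForestRegressor(n_estimators=20, random_state=0),
        func=np.log1p,
        inverse_func=np.expm1,
    )


# ASSUMPTION: synthetic features and counts standing in for the real data.
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))
registered = rng.poisson(50 * X[:, 0] + 5)
casual = rng.poisson(10 * X[:, 1] + 1)

# Cross-validate each customer type separately with the RMSLE scorer.
registered_scores = cross_val_score(
    make_model(), X, registered, cv=3, scoring=rmsle_scorer
)
casual_scores = cross_val_score(make_model(), X, casual, cv=3, scoring=rmsle_scorer)

# One model per customer type; the total count is the sum of both predictions.
registered_model = make_model().fit(X, registered)
casual_model = make_model().fit(X, casual)
total_pred = registered_model.predict(X) + casual_model.predict(X)
```

The same `rmsle_scorer` object can be passed as the `scoring` argument of `GridSearchCV`. Note that the reported scores are negated errors, so values closer to zero are better.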