A space to log my daily progress in various ZeroToMastery coding courses.

100 Days of Code

Projects

ML Milestone Project 1 - Heart Disease Predictions
ML Milestone Project 2 - Bluebook for Bulldozers (Bulldozer Price Predictions)

Daily Log

Day 1: Going over the basics of using CSS to personalize a webpage.
Day 2: Learned how to combine selectors in CSS to increase specificity or broaden range, reducing repetition.
Day 3: Played a CSS selector game to solidify knowledge from yesterday (https://css-diner.netlify.app/)
Day 4: Played with some machine learning models to see, at a surface level, how they arrive at predictions, and practiced CSS rules for changing properties of text and images.
Day 5: Practiced using the box model to manipulate the size of divs. Also learned about the different sizing units available in CSS (px, em, rem, vh, vw).
Day 6: Learned about the critical render path and online resources for minifying CSS code for faster transfers. Also implemented flexbox styles for images, and solidified understanding through an interactive project.
Day 7: Added transform and transition elements to the previously created flexbox testing webpage.
Day 8: Introduction to Bootstrap: learned how to link Bootstrap via its CDN and how it can streamline adding elements to websites.
Day 9: Covered the introduction to the machine learning and data science bootcamp; no coding exercises were completed today.
Day 10: Completed initial steps in the Startup Launch Page exercise, adding the basic elements and styles.
Day 11: Completed layout organization for the Startup Launch Page exercise. Also learned to use GitHub to upload a website publicly.
Day 12: Downloaded conda and started a practice environment and a Jupyter notebook.
Day 13: Completed a Jupyter Notebook Tutorial to learn about the system of cells and the various keyboard shortcuts available.
Day 14: Going through the Pandas tutorial, learning about commands that allow for basic visualizations of data.
Day 15: Completed Pandas tutorial exercises. Began numpy tutorial, learned about the mechanism behind dot product and other mathematical functions available for matrices, and their differences. Also practiced using comparisons and reshaping and transposing matrices for calculation purposes.
Day 16: Completed Numpy exercises.
Day 17: Began matplotlib tutorial, learned how to start simple plots and began customizing the type of plot and the labels that help make plots useful at a glance.
Day 18: Continued matplotlib tutorial, practicing using pandas dataframes to create plots.
Day 19: Completed the matplotlib tutorial, customizing figure colors and styles, and completed the matplotlib exercise set.
Day 20: Began the Scikit-Learn tutorial, going over a basic workflow and exploring a cheatsheet-style notebook.
Day 21: Continued Scikit-Learn tutorial, reviewed basic statistical methods like data splits needed for initiating a model.
Day 22: Continued scikit-learn tutorial.
Day 23: Followed an example of a full scikit-learn use case, creating, training, and analyzing a random forest model.
Day 24: Learned how to update and remove packages in conda to deal with version-based warnings, and started going more in depth on transforming data before using it in a model.
Day 25: Practiced using OneHotEncoder to transform categorical data into a binary matrix that is usable for models (sketch below).
Day 26: Learned to use Pandas .fillna() and Scikit-Learn imputers to fill missing data (sketch below).
Day 27: Looked through the Scikit-Learn model map, practiced using different regression models, and compared the results on the California housing data.
Day 28: Testing models for regression predictions.
Day 29: Testing Scikit-Learn models for classification predictions.
Day 30: Learning more about how to fit Scikit-Learn models with data.
Day 31: Learning more about using models to create predictions and interpreting results.
Day 32: Used predict_proba() to look at classification probabilities and compared them to predict() predictions on the same dataset (sketch below).
Day 33: Learning about the different ways to evaluate Scikit-Learn models (so far, score() and cross-validation).
Day 34: Learned about what calculations are used behind the scenes when using score() on regression versus classification models.
Day 35: Compared the results of accuracy (cross validation) and an ROC curve on classification data.
Day 36: Learned about the use of confusion matrices and classification reports as methods to analyze classification models.
Day 37: Practiced using the r2_score function to analyze regression models.
Day 38: Learned about using mean absolute error and mean squared error as other methods to evaluate the results of a regression model.
Day 39: Used cross_val_score and looked through the different options for the "scoring" parameter.
Day 40: Practiced using the direct metrics functions from sklearn to collect evaluation scores for classification and regression models (sketch below).
Day 41: Overview of hyperparameters.
Day 42: Used GridSearchCV and RandomizedSearchCV to test different hyperparameter combinations without manually adjusting the values (sketch below).
Day 43: Learned how to use the pickle and joblib modules to export and load machine learning models (sketch below). Also completed an overview of the learning up to this point; will begin working on the milestone projects soon.
Day 44: Set up a new environment for the first milestone project.
Day 45: Prepared the notebook environment for the project.
Day 46: Began data exploration for heart disease data.
Day 47: Completed exploratory data analysis with a correlation matrix across the entire dataframe.
Day 48: Completed initial model evaluation.
Day 49: Started hyperparameter tuning process with K nearest neighbors model.
Day 50: Started using RandomizedSearchCV(); created parameter grids for testing logistic regression and random forest models.
Day 51: Completed randomized search tuning; improvements to both models were minimal.
Day 52: Used GridSearchCV to attempt improvements on both models again. Produced a bar graph of cross-validated evaluation metrics.
Day 53: Analyzed the feature importance of the 11 features used in the logistic regression model.
Day 54: Completed the end-to-end heart disease classification project.
Day 55: Went over milestone project 2 overview: Bulldozer Sales Prediction model.
Day 56: Prepared the conda environment and jupyter notebook for milestone project 2.
Day 57: Wrote a header for the second milestone project explaining the features and the goal.
Day 58: Started initial data exploration, reviewing the different factors.
Day 59: Took advantage of the parse_dates parameter to allow pandas to recognize dates as datetimes.
Day 60: Reorganized data by datetime and created a copy of the data to safely experiment with in the future.
Day 61: Split the saledate feature into more specific date parts in the hope of enriching the data on when during the year each sale happened (sketch below).
Day 62: Started initializing a model to support EDA; need to fill the NA values and convert the categorical values before more progress can be made.
Day 63: Used the pandas "Category" dtype to transform string type columns into categorical data.
Day 64: Filled in missing numerical data with median values and added a column to mark where values were missing, in case the missingness itself is a useful factor.
Day 65: Finished transforming the categorical data into a numerical format using Pandas' Categorical type and filled all missing values (sketch below).
Day 66: Initialized a Random Forest Regressor model and timed how long it took to load all of the data without any trimming.
Day 67: Split data into train and validation sets using the sale year, as the goal of this model is to accurately predict future prices.
Day 68: Created two custom evaluation functions, one to calculate the RMSLE and another to combine the MAE, RMSLE, and R^2 scores for the training and validation sets (sketch below).
Day 69: Used the max_samples parameter of the RandomForestRegressor to limit the size of the training data and improve experimentation speed (sketch below).
Day 70: Used RandomizedSearchCV to help with hyperparameter tuning.
Day 71: Used pre-collected hyperparameters to train a new model and make predictions on the test data. Preprocessed the test data so the model could use it and formatted the predictions to match the format requested for a Kaggle submission.
Day 72: Created a function to visualize the top feature importance scores (sketch below).
Day 73: Followed an overview of the four main forms of data and started learning about what a data engineer does and how they work with data scientists.
Day 74: Continued learning about data engineers and common tasks they complete, like maintaining data sources and analysis tools.
Day 75: Went more in depth on the differences between relational and non-relational databases and why the ideal database model is dependent on the intended use.
Day 76: Learned about Hadoop, along with HDFS and MapReduce, and how these tools helped solve how to work with big data.
Day 77: Learned about stream processing and how Apache Spark and Flink both improved on the processing speed of MapReduce.
Day 78: Learned about Kafka and message brokers, and how Kafka links together with other tools like Apache Spark and Hadoop.
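Code Sketches

The snippets below are rough, illustrative sketches of some of the techniques mentioned in the log. They are not the projects' actual code: the datasets, column names, file names, and parameter values are assumptions made for demonstration.

For Day 25, a minimal sketch of using OneHotEncoder (wrapped in a ColumnTransformer) to turn categorical columns into a binary matrix; the toy car-sales DataFrame is invented.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Toy data standing in for a real dataset
df = pd.DataFrame({
    "Make": ["Honda", "Toyota", "Honda", "BMW"],
    "Colour": ["Red", "Blue", "Blue", "Black"],
    "Odometer": [35000, 192000, 84000, 11000],
})

categorical_features = ["Make", "Colour"]
transformer = ColumnTransformer(
    [("one_hot", OneHotEncoder(handle_unknown="ignore"), categorical_features)],
    remainder="passthrough",  # keep the numeric column untouched
)

# Each category becomes its own 0/1 column; the result is model-ready
X_encoded = transformer.fit_transform(df)
print(X_encoded)
```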
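For Day 26, a sketch of the two main ways to fill missing values: pandas .fillna() and a scikit-learn SimpleImputer. The tiny DataFrame is made up.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({
    "Make": ["Honda", np.nan, "Toyota"],
    "Odometer": [35000.0, np.nan, 84000.0],
})

# pandas approach: fill column by column
df_pandas = df.copy()
df_pandas["Make"] = df_pandas["Make"].fillna("missing")
df_pandas["Odometer"] = df_pandas["Odometer"].fillna(df_pandas["Odometer"].mean())

# scikit-learn approach: imputers that can later be reused inside a pipeline
cat_imputer = SimpleImputer(strategy="constant", fill_value="missing")
num_imputer = SimpleImputer(strategy="mean")

df_sklearn = df.copy()
df_sklearn[["Make"]] = cat_imputer.fit_transform(df_sklearn[["Make"]])
df_sklearn[["Odometer"]] = num_imputer.fit_transform(df_sklearn[["Odometer"]])
print(df_sklearn)
```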
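For Day 32, a sketch comparing predict() with predict_proba() on a classifier; it uses a synthetic dataset rather than the heart disease data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = RandomForestClassifier(random_state=42).fit(X_train, y_train)

print(clf.predict(X_test[:5]))        # hard class labels, e.g. [0 1 1 0 1]
print(clf.predict_proba(X_test[:5]))  # per-class probabilities; predict() takes the most likely class
```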
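For Day 40 (and the surrounding metric days), a sketch of calling sklearn's metric functions directly for classification and regression; the y_true/y_pred values are arbitrary.

```python
from sklearn.metrics import (
    accuracy_score, classification_report, confusion_matrix,
    mean_absolute_error, mean_squared_error, r2_score,
)

# Classification metrics
y_true_cls = [0, 1, 1, 0, 1]
y_pred_cls = [0, 1, 0, 0, 1]
print(accuracy_score(y_true_cls, y_pred_cls))
print(confusion_matrix(y_true_cls, y_pred_cls))
print(classification_report(y_true_cls, y_pred_cls))

# Regression metrics
y_true_reg = [3.0, 5.5, 2.1, 7.8]
y_pred_reg = [2.8, 5.0, 2.5, 8.1]
print(mean_absolute_error(y_true_reg, y_pred_reg))
print(mean_squared_error(y_true_reg, y_pred_reg))
print(r2_score(y_true_reg, y_pred_reg))
```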
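For Day 42, a sketch of hyperparameter search with RandomizedSearchCV; GridSearchCV takes the same kind of grid but tries every combination. The grid values are arbitrary examples.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Candidate hyperparameter values to sample from
param_distributions = {
    "n_estimators": [100, 200, 500],
    "max_depth": [None, 5, 10],
    "min_samples_split": [2, 4, 6],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=10,   # number of random combinations to try
    cv=5,
    verbose=1,
    random_state=42,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```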
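For Day 43, a sketch of exporting and reloading a trained model with pickle and joblib; the file names are arbitrary.

```python
import pickle

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=42)
model = RandomForestClassifier(random_state=42).fit(X, y)

# pickle: part of the standard library
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)
with open("model.pkl", "rb") as f:
    pickle_model = pickle.load(f)

# joblib: often preferred for models holding large numpy arrays
joblib.dump(model, "model.joblib")
joblib_model = joblib.load("model.joblib")

print(pickle_model.score(X, y), joblib_model.score(X, y))
```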
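For Days 59 to 61, a sketch of parsing dates on read and splitting the sale date into date-part features. The TrainAndValid.csv file name and saledate column follow the public Bluebook for Bulldozers layout and are assumptions about the project's data.

```python
import pandas as pd

# parse_dates makes pandas read the column as datetimes instead of strings
df = pd.read_csv("TrainAndValid.csv", parse_dates=["saledate"], low_memory=False)

# Break the sale date into features a model can use
df["saleYear"] = df["saledate"].dt.year
df["saleMonth"] = df["saledate"].dt.month
df["saleDay"] = df["saledate"].dt.day
df["saleDayOfWeek"] = df["saledate"].dt.dayofweek
df["saleDayOfYear"] = df["saledate"].dt.dayofyear

# The original datetime column is no longer needed
df = df.drop("saledate", axis=1)
```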
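For Days 63 to 65, a sketch of converting string columns to pandas' category dtype, filling missing numeric values with the median, and keeping _is_missing flag columns. The stand-in DataFrame and its column names are illustrative only.

```python
import numpy as np
import pandas as pd

# Invented stand-in for the bulldozer data
df = pd.DataFrame({
    "UsageBand": ["Low", "High", None, "Medium"],
    "MachineHoursCurrentMeter": [680.0, np.nan, 2300.0, np.nan],
    "SalePrice": [66000, 57000, 10000, 38500],
})

# 1. Strings -> pandas category dtype
for label in list(df.columns):
    if pd.api.types.is_object_dtype(df[label]):
        df[label] = df[label].astype("category").cat.as_ordered()

# 2. Numeric columns: flag missing values, then fill them with the median
for label in list(df.columns):
    if pd.api.types.is_numeric_dtype(df[label]) and df[label].isnull().sum():
        df[label + "_is_missing"] = df[label].isnull()
        df[label] = df[label].fillna(df[label].median())

# 3. Categorical columns: flag missing values and switch to category codes (+1 so missing becomes 0)
for label in list(df.columns):
    if isinstance(df[label].dtype, pd.CategoricalDtype):
        df[label + "_is_missing"] = df[label].isnull()
        df[label] = df[label].cat.codes + 1

print(df.dtypes)
```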
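For Day 68, a sketch of a custom RMSLE function and a helper that gathers MAE, RMSLE, and R^2 for the training and validation sets. The function and variable names are placeholders, not the project's exact code.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_log_error, r2_score


def rmsle(y_true, y_pred):
    """Root mean squared log error."""
    return np.sqrt(mean_squared_log_error(y_true, y_pred))


def show_scores(model, X_train, y_train, X_valid, y_valid):
    """Collect training and validation scores in one dictionary."""
    train_preds = model.predict(X_train)
    valid_preds = model.predict(X_valid)
    return {
        "Training MAE": mean_absolute_error(y_train, train_preds),
        "Valid MAE": mean_absolute_error(y_valid, valid_preds),
        "Training RMSLE": rmsle(y_train, train_preds),
        "Valid RMSLE": rmsle(y_valid, valid_preds),
        "Training R^2": r2_score(y_train, train_preds),
        "Valid R^2": r2_score(y_valid, valid_preds),
    }
```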
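For Day 69, a sketch of using the max_samples parameter of RandomForestRegressor to cap how many rows each tree trains on, which speeds up experimentation on large datasets. The sample counts here are arbitrary.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=50_000, n_features=20, random_state=42)

# Each tree sees at most 10,000 bootstrap samples instead of the full dataset
model = RandomForestRegressor(n_jobs=-1, random_state=42, max_samples=10_000)
model.fit(X, y)
print(model.score(X, y))
```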
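For Day 72, a sketch of a small function that plots the top feature importance scores of a fitted tree ensemble; the feature names and fitted model are placeholders.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=1_000, n_features=15, random_state=42)
columns = [f"feature_{i}" for i in range(X.shape[1])]
model = RandomForestRegressor(random_state=42).fit(X, y)


def plot_feature_importance(columns, importances, n=10):
    """Horizontal bar plot of the n most important features."""
    ranked = (pd.DataFrame({"features": columns, "importances": importances})
                .sort_values("importances", ascending=False)
                .head(n))
    fig, ax = plt.subplots()
    ax.barh(ranked["features"], ranked["importances"])
    ax.set_xlabel("Feature importance")
    ax.invert_yaxis()  # largest bar on top
    plt.show()


plot_feature_importance(columns, model.feature_importances_)
```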
