Final Project for MGSC 310: Statistical Models for Business Analytics (Introduction to Machine Learning)

itserinlee/MGSC310_Goodreads

Predictive Modeling on Goodreads Dataset

Statistical Models for Business Analytics (Introduction to Machine Learning) - MGSC 310, Fall 2020

Corinne Smith, Erin Lee, Jon Le, Adam Gonzalez, Debbie Lu

Datasets

Note: Both Kaggle data sets were originally scraped from the Goodreads API.

Programming Language

R

Data At-A-Glance

  • variables used in project:
    • outcome: "average_rating"
    • predictors (9 total):
      • "num_pages"
      • "book_ratings_count"
      • "text_reviews_count"
      • "title_sentiment_avg"
      • "authorworkcount"
      • "author_fans"
      • "author_ratings_count"
      • "author_review_count"
      • "gender"
Note: 11 variables from the original data set were removed and are not used in the project.
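
As a rough illustration of the variable selection above, the following sketch keeps only the outcome and the nine predictors. The `books` data frame, its synthetic values, the `gender` levels, and the dropped `isbn` column are all assumptions made for the example; the project's real data comes from the "datasets" folder.

```r
# Hypothetical stand-in for the merged Goodreads data; all values are synthetic.
set.seed(310)
n <- 50
books <- data.frame(
  average_rating       = round(runif(n, 1, 5), 2),
  num_pages            = sample(50:900, n, replace = TRUE),
  book_ratings_count   = rpois(n, 500),
  text_reviews_count   = rpois(n, 40),
  title_sentiment_avg  = runif(n, -1, 1),
  authorworkcount      = rpois(n, 12),
  author_fans          = rpois(n, 300),
  author_ratings_count = rpois(n, 5000),
  author_review_count  = rpois(n, 400),
  gender               = factor(sample(c("female", "male"), n, replace = TRUE)),
  isbn                 = as.character(seq_len(n))  # example of a column to drop
)

outcome    <- "average_rating"
predictors <- c("num_pages", "book_ratings_count", "text_reviews_count",
                "title_sentiment_avg", "authorworkcount", "author_fans",
                "author_ratings_count", "author_review_count", "gender")

# Keep only the modeling columns and drop rows with missing values.
model_data <- na.omit(books[, c(outcome, predictors)])
str(model_data)
```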

Instructions

• Recommended: run the project from the R Markdown file "MGSC310FinalProject.Rmd":
  • Download the "datasets" folder.
  • Download the "MGSC310FinalProject.Rmd" file.
  • With the "datasets" folder and the "MGSC310FinalProject.Rmd" file in the same directory, open the R Markdown file in RStudio and run it.
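
For anyone who prefers the R console to the RStudio Knit button, the steps above could look roughly like this. It is a sketch that assumes the working directory holds both the file and the folder, and that the `rmarkdown` package is installed; `project_dir` is a name introduced here for illustration.

```r
# Paths assumed from the instructions above; adjust project_dir if needed.
project_dir <- "."
rmd_file    <- file.path(project_dir, "MGSC310FinalProject.Rmd")
data_dir    <- file.path(project_dir, "datasets")

# Both the R Markdown file and the data folder must sit side by side.
ready <- file.exists(rmd_file) && dir.exists(data_dir)

if (ready && requireNamespace("rmarkdown", quietly = TRUE)) {
  # Knit the report; the output format comes from the file's YAML header.
  rmarkdown::render(rmd_file)
} else {
  message("Place 'datasets/' next to 'MGSC310FinalProject.Rmd' and install rmarkdown first.")
}
```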

Models

1. Linear Regression

Table of Coefficients

[Image: coef_table]

Plot of Coefficients

[Image: lr_coef]
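
A minimal sketch of how a coefficient table and coefficient plot of this kind can be produced in base R. The data is synthetic and only three of the predictor names are used; the real model, its inputs, and its plotting code live in the R Markdown file and may differ.

```r
set.seed(310)
n <- 200
# Synthetic stand-in data: three of the README's predictor names, made-up values.
books <- data.frame(
  num_pages           = rnorm(n, 350, 120),
  title_sentiment_avg = runif(n, -1, 1),
  author_fans         = rexp(n, 1 / 500)
)
books$average_rating <- 3.8 + 0.0005 * books$num_pages +
  0.1 * books$title_sentiment_avg + rnorm(n, sd = 0.3)

fit <- lm(average_rating ~ num_pages + title_sentiment_avg + author_fans,
          data = books)

# Coefficient table: estimate, standard error, t value, p value.
coef_table <- as.data.frame(summary(fit)$coefficients)
print(round(coef_table, 4))

# Coefficient plot: point estimates with 95% confidence intervals.
ci  <- confint(fit)[-1, , drop = FALSE]   # drop the intercept
est <- coef(fit)[-1]
plot(est, seq_along(est), xlim = range(ci), yaxt = "n",
     xlab = "Estimate", ylab = "", pch = 19)
segments(ci[, 1], seq_along(est), ci[, 2], seq_along(est))
axis(2, at = seq_along(est), labels = names(est), las = 1)
abline(v = 0, lty = 2)
```

Because the predictors sit on very different scales, standardizing them before fitting would make the plotted coefficients easier to compare.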

2. Elastic Net

Plot of the Error Versus the Penalty (Regression with Regularization)

[Image: plot_enet]

Plot of the Path of the Coefficients

[Image: coef_path]
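
The two elastic net plots can be reproduced in outline with the `glmnet` package. This is a sketch on synthetic data: the package choice, `alpha = 0.5`, and the fold count are assumptions, not settings confirmed by this README.

```r
library(glmnet)

set.seed(310)
n <- 200
# Synthetic numeric predictor matrix using four of the README's variable names.
x <- cbind(
  num_pages           = rnorm(n, 350, 120),
  title_sentiment_avg = runif(n, -1, 1),
  author_fans         = rexp(n, 1 / 500),
  text_reviews_count  = rpois(n, 40)
)
y <- 3.8 + 0.0005 * x[, "num_pages"] + 0.1 * x[, "title_sentiment_avg"] +
  rnorm(n, sd = 0.3)

# alpha = 0.5 mixes the ridge (alpha = 0) and lasso (alpha = 1) penalties.
cv_fit <- cv.glmnet(x, y, alpha = 0.5, nfolds = 10)

plot(cv_fit)                               # cross-validated error versus log(lambda)
plot(cv_fit$glmnet.fit, xvar = "lambda")   # coefficient paths
coef(cv_fit, s = "lambda.min")             # coefficients at the error-minimizing penalty
```

`glmnet` expects a numeric matrix, so a factor predictor such as "gender" would first need to be expanded, for example with `model.matrix()`.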

3. Bootstrap Aggregated (Bagged) Decision Tree

Examining an Individual Model from Bagging

[Image: bagged_tree]
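
To show what bagging does and how one tree can be pulled out of the ensemble, here is a hand-rolled sketch using `rpart`. It is not necessarily the approach used in the project, which may rely on a dedicated bagging package; the data, the number of trees, and the train/test split are all made up for the example.

```r
library(rpart)

set.seed(310)
n <- 300
# Synthetic data with a step effect that a tree can pick up.
books <- data.frame(
  num_pages   = rnorm(n, 350, 120),
  author_fans = rexp(n, 1 / 500)
)
books$average_rating <- 3.5 + 0.3 * (books$num_pages > 400) + rnorm(n, sd = 0.3)

train <- books[1:200, ]
test  <- books[201:300, ]

# Bagging: fit one tree per bootstrap resample of the training rows.
n_trees <- 50
trees <- lapply(seq_len(n_trees), function(b) {
  idx <- sample(nrow(train), replace = TRUE)
  rpart(average_rating ~ ., data = train[idx, ])
})

# The bagged prediction is the average of the individual tree predictions.
pred_matrix <- sapply(trees, predict, newdata = test)
bagged_pred <- rowMeans(pred_matrix)

# Inspect a single tree from the ensemble.
print(trees[[1]])
```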

4. Random Forest

Plot of Error Versus the Number of Trees

[Image: plot_rf_fit]

Variable Importance Plot

[Image: rf_var_imp]

Plotting the Minimum Depth Distribution

[Image: rf_explainer_pckg]
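
The three random forest plots can be sketched with the `randomForest` package, plus `randomForestExplainer` for the minimum depth distribution. Both package choices are assumptions inferred from the plot names, and the data, `ntree`, and `mtry` values are illustrative only.

```r
library(randomForest)

set.seed(310)
n <- 300
# Synthetic data using three of the README's predictor names.
books <- data.frame(
  num_pages           = rnorm(n, 350, 120),
  author_fans         = rexp(n, 1 / 500),
  title_sentiment_avg = runif(n, -1, 1)
)
books$average_rating <- 3.5 + 0.3 * (books$num_pages > 400) + rnorm(n, sd = 0.3)

rf_fit <- randomForest(average_rating ~ ., data = books,
                       ntree = 200, mtry = 2, importance = TRUE)

plot(rf_fit)          # out-of-bag MSE versus number of trees
varImpPlot(rf_fit)    # permutation and node-purity importance
importance(rf_fit)

# Optional: minimum depth distribution, if randomForestExplainer is installed.
if (requireNamespace("randomForestExplainer", quietly = TRUE)) {
  md <- randomForestExplainer::min_depth_distribution(rf_fit)
  print(randomForestExplainer::plot_min_depth_distribution(md))
}
```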

Model Evaluation: Comparison of Metrics

[Image: model_metrics]
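
A comparison table like this one can be assembled from a few helper functions. The sketch below assumes RMSE, MAE, and R-squared as the metrics, which this README does not confirm; the ratings, predictions, and model names are toy values, not project results.

```r
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
mae  <- function(actual, predicted) mean(abs(actual - predicted))
r2   <- function(actual, predicted) {
  1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)
}

# Toy held-out ratings and predictions from two hypothetical models.
actual <- c(3.9, 4.1, 3.5, 4.4, 3.8)
preds  <- list(
  model_a = c(4.0, 4.0, 3.6, 4.2, 3.9),
  model_b = c(3.9, 3.9, 3.9, 3.9, 3.9)
)

# One row per model, one column per metric.
model_metrics <- data.frame(
  model = names(preds),
  RMSE  = sapply(preds, rmse, actual = actual),
  MAE   = sapply(preds, mae,  actual = actual),
  R2    = sapply(preds, r2,   actual = actual),
  row.names = NULL
)
print(model_metrics)
```

Computing every metric on the same held-out set keeps the comparison between models fair.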
