
Commit d601e30: "Video review"
s2t2 committed Sep 24, 2024 (1 parent: 027d393)

Showing 4 changed files with 17 additions and 12 deletions.
@@ -15,7 +15,7 @@ Similar to correlation, autocorrelation will range in values from -1 to 1. A pos

+ **Strong Negative Autocorrelation**: A strong negative autocorrelation (close to -1) suggests an oscillatory pattern, where high values tend to be followed by low values and vice versa.

-+ **Weak Autocorrelation**: If the ACF value is close to zero for a particular lag, it suggests that the time series does not exhibit a strong linear relationship with its past values at that lag. This can indicate that the observations at that lag are not predictive of future values.
++ **Weak Autocorrelation**: If the ACF value is close to zero for a particular lag, it suggests that the time series does not exhibit a strong relationship with its past values at that lag. This can indicate that the observations at that lag are not predictive of future values.
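To make the lag-based interpretation concrete, here is a minimal sketch using pandas; the series values are hypothetical, chosen to oscillate so the lag-1 autocorrelation comes out strongly negative:

```python
import pandas as pd

# hypothetical oscillating series: high values tend to be followed by low values
series = pd.Series([10, 2, 11, 1, 12, 3, 13, 2, 11, 1])

# pandas computes the lag-k autocorrelation directly
for lag in [1, 2, 3]:
    print(f"lag {lag}:", round(series.autocorr(lag=lag), 3))
```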


In addition to interpreting the autocorrelation values themselves, we can examine the autocorrelation plot to identify patterns:
@@ -278,8 +278,8 @@ px.line(acf_df, y=["NYY", "BOS", "BAL", "TOR"], markers="O", height=450,
```


For each team, to what degree is that team's performance in a given year correlated with its performance from the year before?
We see at lagging period zero, each team's current performance is perfectly correlated with itself. But at lagging period one, the autocorrelation for each team starts to drop off to around 60%. This means for each team, their performance in a given year will be around 60% correlated with the previous year.

How about two, or three, or four years before?
The autocorrelation for each team continues to drop off at different rates over additional lagging periods. Examining the final autocorrelation value helps us understand, given a team's current performance, how consistent it was over the previous ten years.

Which team is the most consistent in their performance over a ten year period?
docs/notes/predictive-modeling/regression/linear.qmd (2 changes: 1 addition & 1 deletion)
@@ -89,7 +89,7 @@ When using `sklearn`, we must construct the features as a two-dimensional array

### Train Test Split

-Splitting the data randomly into test and training sets. We will train the model on the training set, and evaluate the model using the training set. This helps for generalizability, and to prevent overfitting.
+Splitting the data randomly into test and training sets. We will train the model on the training set, and evaluate the model using the test set. This helps for generalizability, and to prevent overfitting.
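For instance, a minimal sketch of the split, with hypothetical feature and target arrays standing in for the dataset's own columns:

```python
import numpy as np
from sklearn.model_selection import train_test_split

x = np.arange(20).reshape(-1, 1)  # hypothetical 2D features
y = 3 * x.ravel() + 5             # hypothetical target

# hold out 20% of the rows for evaluation
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=99)
print(len(x_train), len(x_test))  # 16 4
```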

```{python}
from sklearn.model_selection import train_test_split
@@ -73,7 +73,12 @@ px.scatter(df, y="gdp", title="US GDP (Quarterly) vs Lowess Trend", height=450,

In this case, a non-linear trend seems to fit better.

-Let's perform a linear regression and an exponential features regression more formally, and compare the results.
+To compare the results of a linear vs non-linear trend, let's train two different regression models, and compare the results.
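As a compressed sketch of that comparison (hypothetical data; here the non-linear trend is fit by log-transforming the target, which may differ from the featurization the notes use):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

x = np.arange(1, 41).reshape(-1, 1)  # hypothetical time index
y = 100 * np.exp(0.05 * x.ravel())   # hypothetical exponential growth

# model 1: plain linear trend
linear = LinearRegression().fit(x, y)
r2_linear = r2_score(y, linear.predict(x))

# model 2: exponential trend, via a line fit to log(y)
log_linear = LinearRegression().fit(x, np.log(y))
r2_exp = r2_score(y, np.exp(log_linear.predict(x)))

print("linear R^2:", round(r2_linear, 3), "| exponential R^2:", round(r2_exp, 3))
```

On truly exponential data the log-transformed fit recovers the trend almost exactly, while the straight line cannot.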




## Linear Regression

@@ -131,9 +136,6 @@ Examining the coefficients and line of best fit:
print("COEF:", model.coef_.tolist())
print("INTERCEPT:", model.intercept_)
print("--------------")
-print(f"EQUATION FOR LINE OF BEST FIT:")
-print(f"y = ({round(model.coef_[0], 3)} billion * years) + {round(model.intercept_, 3)}")
print(f"EQUATION FOR LINE OF BEST FIT:")
print("y =", f"{model.coef_[0].round(3)}(x)",
"+", model.intercept_.round(3),
@@ -1,5 +1,7 @@
# Regression for Seasonality Analysis

+We've explored using a regression for time series forecasting, but what if there are seasonal or cyclical patterns in the data?

Let's explore an example of how to use regression to identify cyclical patterns and perform seasonality analysis with time series data.


@@ -17,7 +19,7 @@ set_option('display.max_rows', 6)
## Data Loading


-As an example time series dataset, let's consider this dataset of U.S. unemployment rates over time, from the Federal Reserve Economic Data (FRED).
+For a time series dataset that exemplifies cyclical patterns, let's consider this dataset of U.S. employment over time, from the Federal Reserve Economic Data (FRED).

Fetching the data, going back as far as possible:

@@ -200,7 +202,6 @@ Training a linear regression model on the training data:
```{python}
import statsmodels.api as sm
-#model = sm.OLS(y_train, x_train, missing="drop")
model = sm.OLS(y, x, missing="drop")
print(type(model))
@@ -287,6 +288,8 @@ df["quarter"] = df.index.quarter
df["month"] = df.index.month
```

+Here we are grouping the data by quarter and calculating the average residual. This shows us for each quarter, on average, whether predictions are above or below trend:

```{python}
df.groupby("quarter")["residual"].mean()
```
@@ -307,6 +310,7 @@ set_option('display.max_rows', 6)

#### Seasonality via Regression on Periodic Residuals

+Let's perform a regression using months as the features and the trend residuals as the target. This can help us understand the degree to which employment will be over or under trend for a given month.

```{python}
# https://pandas.pydata.org/docs/reference/api/pandas.get_dummies.html
@@ -322,7 +326,6 @@ x_monthly
```


-Can we predict the residual (i.e. degree to which employment will be over or under trend), based on which month it is?

```{python}
y_monthly = df["residual"]
@@ -338,7 +341,7 @@ print(results_monthly.summary())

The coefficients tell us how each month contributes towards the regression residuals, in other words, for each month, to what degree does the model predict we will be above or below trend?

-***Monthly Predictions of Residuals**
+**Monthly Predictions of Residuals**

```{python}
df["prediction_monthly"] = results_monthly.fittedvalues
