
Commit d601e30: "Video review"
s2t2 committed Sep 24, 2024 (1 parent: 027d393)

Showing 4 changed files with 17 additions and 12 deletions.
@@ -15,7 +15,7 @@ Similar to correlation, autocorrelation will range in values from -1 to 1. A pos

+ **Strong Negative Autocorrelation**: A strong negative autocorrelation (close to -1) suggests an oscillatory pattern, where high values tend to be followed by low values and vice versa.

-+ **Weak Autocorrelation**: If the ACF value is close to zero for a particular lag, it suggests that the time series does not exhibit a strong linear relationship with its past values at that lag. This can indicate that the observations at that lag are not predictive of future values.
++ **Weak Autocorrelation**: If the ACF value is close to zero for a particular lag, it suggests that the time series does not exhibit a strong relationship with its past values at that lag. This can indicate that the observations at that lag are not predictive of future values.
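To make the lag-based interpretation concrete, here is a minimal sketch using pandas; the series values are hypothetical, chosen to oscillate so the lag-1 autocorrelation comes out strongly negative:

```python
import pandas as pd

# hypothetical oscillating series: high values tend to be followed by low values
series = pd.Series([10, 2, 11, 1, 12, 3, 13, 2, 11, 1])

# pandas computes the lag-k autocorrelation directly
for lag in [1, 2, 3]:
    print(f"lag {lag}:", round(series.autocorr(lag=lag), 3))
```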


In addition to interpreting the autocorrelation values themselves, we can examine the autocorrelation plot to identify patterns:
@@ -278,8 +278,8 @@ px.line(acf_df, y=["NYY", "BOS", "BAL", "TOR"], markers="O", height=450,
```


For each team, to what degree is that team's performance in a given year correlated with its performance from the year before?
We see at lagging period zero, each team's current performance is perfectly correlated with itself. But at lagging period one, the autocorrelation for each team starts to drop off to around 60%. This means for each team, their performance in a given year will be around 60% correlated with the previous year.

How about two, or three, or four years before?
The autocorrelation for each team continues to drop off at different rates over additional lagging periods. Examining the final autocorrelation value helps us understand, given a team's current performance, how consistent it was over the previous ten years.

Which team is the most consistent in their performance over a ten year period?
docs/notes/predictive-modeling/regression/linear.qmd (2 changes: 1 addition & 1 deletion)
@@ -89,7 +89,7 @@ When using `sklearn`, we must construct the features as a two-dimensional array

### Train Test Split

-Splitting the data randomly into test and training sets. We will train the model on the training set, and evaluate the model using the training set. This helps for generalizability, and to prevent overfitting.
+Splitting the data randomly into test and training sets. We will train the model on the training set, and evaluate the model using the test set. This helps for generalizability, and to prevent overfitting.
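For instance, a minimal sketch of the split, with hypothetical feature and target arrays standing in for the dataset's own columns:

```python
import numpy as np
from sklearn.model_selection import train_test_split

x = np.arange(20).reshape(-1, 1)  # hypothetical 2D features
y = 3 * x.ravel() + 5             # hypothetical target

# hold out 20% of the rows for evaluation
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=99)
print(len(x_train), len(x_test))  # 16 4
```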

```{python}
from sklearn.model_selection import train_test_split
@@ -73,7 +73,12 @@ px.scatter(df, y="gdp", title="US GDP (Quarterly) vs Lowess Trend", height=450,

In this case, a non-linear trend seems to fit better.

-Let's perform a linear regression and an exponential features regression more formally, and compare the results.
+To compare the results of a linear vs non-linear trend, let's train two different regression models, and compare the results.
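As a compressed sketch of that comparison (hypothetical data; here the non-linear trend is fit by log-transforming the target, which may differ from the featurization the notes use):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

x = np.arange(1, 41).reshape(-1, 1)  # hypothetical time index
y = 100 * np.exp(0.05 * x.ravel())   # hypothetical exponential growth

# model 1: plain linear trend
linear = LinearRegression().fit(x, y)
r2_linear = r2_score(y, linear.predict(x))

# model 2: exponential trend, via a line fit to log(y)
log_linear = LinearRegression().fit(x, np.log(y))
r2_exp = r2_score(y, np.exp(log_linear.predict(x)))

print("linear R^2:", round(r2_linear, 3), "| exponential R^2:", round(r2_exp, 3))
```

On truly exponential data the log-transformed fit recovers the trend almost exactly, while the straight line cannot.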




## Linear Regression

@@ -131,9 +136,6 @@ Examining the coefficients and line of best fit:
print("COEF:", model.coef_.tolist())
print("INTERCEPT:", model.intercept_)
print("--------------")
-print(f"EQUATION FOR LINE OF BEST FIT:")
-print(f"y = ({round(model.coef_[0], 3)} billion * years) + {round(model.intercept_, 3)}")
print(f"EQUATION FOR LINE OF BEST FIT:")
print("y =", f"{model.coef_[0].round(3)}(x)",
"+", model.intercept_.round(3),
@@ -1,5 +1,7 @@
# Regression for Seasonality Analysis

+We've explored using a regression for time series forecasting, but what if there are seasonal or cyclical patterns in the data?

Let's explore an example of how to use regression to identify cyclical patterns and perform seasonality analysis with time series data.


@@ -17,7 +19,7 @@ set_option('display.max_rows', 6)
## Data Loading


-As an example time series dataset, let's consider this dataset of U.S. unemployment rates over time, from the Federal Reserve Economic Data (FRED).
+For a time series dataset that exemplifies cyclical patterns, let's consider this dataset of U.S. employment over time, from the Federal Reserve Economic Data (FRED).

Fetching the data, going back as far as possible:

@@ -200,7 +202,6 @@ Training a linear regression model on the training data:
```{python}
import statsmodels.api as sm
-#model = sm.OLS(y_train, x_train, missing="drop")
model = sm.OLS(y, x, missing="drop")
print(type(model))
@@ -287,6 +288,8 @@ df["quarter"] = df.index.quarter
df["month"] = df.index.month
```

+Here we are grouping the data by quarter and calculating the average residual. This shows us for each quarter, on average, whether predictions are above or below trend:

```{python}
df.groupby("quarter")["residual"].mean()
```
@@ -307,6 +310,7 @@ set_option('display.max_rows', 6)

#### Seasonality via Regression on Periodic Residuals

+Let's perform a regression using months as the features and the trend residuals as the target. This can help us understand the degree to which employment will be over or under trend for a given month.

```{python}
# https://pandas.pydata.org/docs/reference/api/pandas.get_dummies.html
@@ -322,7 +326,6 @@ x_monthly
```


-Can we predict the residual (i.e. degree to which employment will be over or under trend), based on which month it is?

```{python}
y_monthly = df["residual"]
@@ -338,7 +341,7 @@ print(results_monthly.summary())

The coefficients tell us how each month contributes towards the regression residuals, in other words, for each month, to what degree does the model predict we will be above or below trend?

-***Monthly Predictions of Residuals**
+**Monthly Predictions of Residuals**

```{python}
df["prediction_monthly"] = results_monthly.fittedvalues
