---
title: "Chapter 9: ARIMA Models"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(fpp3)
```
__Basic Concept__: ARIMA models aim to describe the autocorrelations in the data.
## 9.1 Stationarity and Differencing
__Stationarity__: a time series is considered stationary when its statistical
properties do not depend on the time at which it is observed. So any time series
with seasonality or trend is not stationary. White noise is stationary.
__Differencing__: computing the differences between consecutive observations. This helps
stabilize the mean of the series. An ACF plot is useful for identifying a non-stationary
time series, and the `unitroot_ndiffs()` and `unitroot_nsdiffs()` (for seasonal differencing) features work too.
__Second Order Differencing__: if a series is not stationary after 1 difference,
it may need a second difference.
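In symbols, with $y'_t = y_t - y_{t-1}$, the second difference is
$$y''_t = y'_t - y'_{t-1} = y_t - 2y_{t-1} + y_{t-2}.$$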
__Seasonal Differencing__: the difference between an observation and the previous
observation from the same season. Also termed "lag-_m_ differences", where _m_ is the
seasonal period.
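In symbols, for seasonal period $m$:
$$y'_t = y_t - y_{t-m}.$$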
__Example__
```{r, fig.height=9, fig.width=8}
PBS %>%
filter(ATC2 == "H02") %>%
summarise(Cost = sum(Cost)/1e6) %>%
transmute(
`Sales ($million)` = Cost,
`Log sales` = log(Cost),
`Annual change in log sales` = difference(log(Cost), 12),
`Doubly differenced log sales` =
difference(difference(log(Cost), 12), 1)
) %>%
pivot_longer(-Month, names_to="Type", values_to="Sales") %>%
mutate(
Type = factor(Type, levels = c(
"Sales ($million)",
"Log sales",
"Annual change in log sales",
"Doubly differenced log sales"))
) %>%
ggplot(aes(x = Month, y = Sales)) +
geom_line() +
facet_grid(vars(Type), scales = "free_y") +
labs(title = "Corticosteroid drug sales", y = NULL)
```
1. Selecting which differences to apply is somewhat subjective.
2. If both seasonal and non-seasonal differencing are suggested and the seasonal
strength is high, perform the seasonal difference first.
3. Applying more differences than needed will introduce false dynamics. Perform
as few as necessary.
### Unit Root Tests
The objective way to determine whether differencing is required. They can be performed
with `unitroot_kpss`, `unitroot_ndiffs`, or `unitroot_nsdiffs` in the `features()`
function. To compute all of them at once, use `features(Your_Data_Here, feature_set(pkgs = "feasts"))`.
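As a quick sketch, applying the unit root features to the Egyptian exports series used later in these notes:
```{r}
global_economy %>%
  filter(Code == "EGY") %>%
  features(Exports, list(unitroot_kpss, unitroot_ndiffs))
```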
***
## 9.3 Autoregressive Models
Forecasting the variable of interest using a linear combination of past values
of the variable.
__autoregression__: a regression of the variable against itself
AR models are restricted to stationary data.
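For reference, an AR($p$) model can be written as
$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \varepsilon_t,$$
where $\varepsilon_t$ is white noise.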
***
## 9.4 Moving Average Models
Instead of using previous values of the variable in a regression, a MA model
uses past forecast errors in a regression-like model.
Each observation of the data can be thought of as the weighted moving average of
the past few forecast errors.
Not to be confused with moving average smoothing from previous chapters. MA models
forecast future values while MA smoothing is for estimating trends of past values.
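An MA($q$) model can be written as
$$y_t = c + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q},$$
where $\varepsilon_t$ is white noise.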
***
## 9.5 Non-Seasonal ARIMA Models
ARIMA combines autoregression, differencing, and moving average models. It stands
for AutoRegressive Integrated Moving Average. This combination gives the ability to
model the lagged values and lagged errors as well as make the series stationary.
ARIMA(p,d,q)
* p: order of the autoregression
* d: degree of differencing
* q: order of moving average
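Written out, with $y'_t$ denoting the differenced series, the full model is
$$y'_t = c + \phi_1 y'_{t-1} + \cdots + \phi_p y'_{t-p} + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q} + \varepsilon_t.$$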
`ARIMA()` from the `fable` package can select these parameters for you.
__Example__
```{r}
global_economy %>%
filter(Code == "EGY") %>%
autoplot(Exports) +
labs(y = "% of GDP", title = "Egyptian Exports")
```
```{r}
fit <- global_economy %>%
filter(Code == "EGY") %>%
model(ARIMA(Exports))
report(fit)
```
```{r}
fit %>% forecast(h=10) %>%
autoplot(global_economy) +
labs(y = "% of GDP", title = "Egyptian Exports")
```
Above we see that an ARIMA(2,0,1) model captures the cyclic pattern seen over the decades.
### ACF and PACF plots
To manually determine starting values for p and q (with d set by differencing), use the
ACF plot (for q) and the PACF plot (for p), choosing orders from the significant lags (those extending beyond the blue dashed bounds).
__ACF__ measures the relationship between $y_t$ and $y_{t-k}$ (lagged values)
__PACF__ measures the relationship between $y_t$ and $y_{t-k}$ after removing the
effects of the lags in between.
__Example__
```{r}
global_economy %>%
filter(Code == "EGY") %>%
gg_tsdisplay(Exports, plot_type = 'partial')
```
Going purely by significant lags, a (4,0,9) model might look like the
recommendation here (last significant lag at 4 in the PACF and at 9 in the ACF), but keep
this in mind:
ARIMA(p,d,0) may be warranted if:
* ACF is exponentially decaying or sinusoidal
* significant spike at lag _p_ in the PACF, but none beyond
ARIMA(0,d,q) may be warranted if:
* PACF is exponentially decaying or sinusoidal
* significant spike at lag _q_ in the ACF, but none beyond
Above we see a decaying sinusoidal pattern in the ACF, and the PACF shows its last
significant spike at lag 4, so ARIMA(4,0,0) is the suggested starting point. To restrict
the values that `ARIMA()` searches over, use:
`ARIMA(y ~ pdq(p=1:4, d=1, q=0:2))`
The (4,0,0) model does only slightly worse than the automatically selected model
(with an AICc of 294.70 compared to 294.29).
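As a sketch, that comparison could be reproduced as follows (the model labels `arima400` and `auto` are just illustrative names):
```{r}
global_economy %>%
  filter(Code == "EGY") %>%
  model(
    arima400 = ARIMA(Exports ~ pdq(4, 0, 0)),
    auto = ARIMA(Exports)
  ) %>%
  glance() %>%
  arrange(AICc) %>%
  select(.model, AICc)
```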
***
## 9.6 Estimation and Order Selection
__MLE__: Maximum Likelihood Estimation finds the values of the parameters which
maximize the probability of obtaining the observed series.
ARIMA models are much more complicated to estimate than regression models, and
estimation can vary based on the algorithm used.
`fable` will maximize the log likelihood of the data; the log of the probability of the
observed data coming from the model.
__Information Criteria__
AIC, AICc, and BIC are also used to determine the best orders of p and q (the
textbook prefers AICc). However, because differencing changes the data on which the log
likelihood is computed, information criteria cannot be compared between models
with different values of d.
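For reference, the AIC the book uses for ARIMA models is
$$\text{AIC} = -2\log(L) + 2(p + q + k + 1),$$
where $L$ is the likelihood of the data and $k = 1$ if $c \neq 0$ (0 otherwise); AICc and BIC are the usual corrected variants of this quantity.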
***
## 9.7 ARIMA Modeling in `fable`
#### How Does `ARIMA()` Work?
1. The number of differences is determined by repeated KPSS tests.
2. The values of p and q are then chosen by minimizing the AICc after differencing the data.
The algorithm fits four initial models and uses a step-wise search, minimizing the AICc at each step.
***
## 9.8 Forecasting
#### Prediction Intervals
ARIMA forecasting assumes the residuals are uncorrelated and normally distributed;
the prediction intervals are unreliable if either assumption fails. Always plot the ACF and a
histogram of the residuals.
If the residuals are uncorrelated but not normally distributed, obtain bootstrapped intervals
by using `bootstrap = TRUE` in `forecast()`.
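As a sketch, reusing the Egyptian exports `fit` from section 9.5:
```{r}
fit %>%
  forecast(h = 10, bootstrap = TRUE) %>%
  autoplot(global_economy) +
  labs(y = "% of GDP", title = "Egyptian exports (bootstrapped intervals)")
```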
***
## 9.9 Seasonal ARIMA Models
For seasonal ARIMA models, additional seasonal terms $(P,D,Q)_m$ are included, where
$m$ is the seasonal period.
#### Seasonal ARIMA Modeling Procedure
__Example__
Data:
```{r}
leisure <- us_employment %>%
filter(Title == "Leisure and Hospitality",
year(Month) > 2000) %>%
mutate(Employed = Employed/1000) %>%
select(Month, Employed)
autoplot(leisure, Employed) +
labs(title = "US employment: leisure and hospitality",
y="Number of people (millions)")
```
Non-stationary data with clear seasonality and trend.
1. Take a seasonal difference. The seasonality is annual (monthly data), so we take `difference(Employed, 12)` below.
```{r}
leisure %>%
gg_tsdisplay(difference(Employed, 12),
plot_type='partial', lag=36) +
labs(title="Seasonally differenced", y="")
```
The seasonally differenced series is still clearly non-stationary, with a wandering mean
and many significant lags.
2. Further (first) differencing is required.
```{r}
leisure %>%
gg_tsdisplay(difference(Employed, 12) %>% difference(),
plot_type='partial', lag=36) +
labs(title = "Double differenced", y="")
```
3. Now that the differenced series is centered around zero, we look to the ACF and PACF to
define our terms.
* ACF: significant spike at lag 2 suggests a non-seasonal MA(2) component
* ACF: significant spike at lag 12 suggests a seasonal MA(1) component
This leads to an ARIMA(0,1,2)(0,1,1)$_{12}$ model.
* PACF: significant spike at lag 2 suggests a non-seasonal AR(2) component
* The ACF is still used to determine the seasonal component.
This leads to an ARIMA(2,1,0)(0,1,1)$_{12}$ model.
We also include an automatically selected model, setting `stepwise = FALSE` and `approximation = FALSE`
to make R search harder for the best model.
```{r}
fit <- leisure %>%
model(
arima012011 = ARIMA(Employed ~ pdq(0,1,2) + PDQ(0,1,1)),
arima210011 = ARIMA(Employed ~ pdq(2,1,0) + PDQ(0,1,1)),
auto = ARIMA(Employed, stepwise = FALSE, approximation = FALSE)
)
```
```{r}
glance(fit) %>% arrange(AICc) %>% select(.model:BIC)
```
We see here that the automatically selected model performs best by AICc.
Finding the order and residuals of the auto ARIMA:
```{r}
fit %>%
select(auto) %>%
report()
fit %>%
select(auto) %>%
gg_tsresiduals(lag=36)
```
One small but significant spike (lag 11) is still consistent with white noise.
4. Verify residuals are white noise
```{r}
augment(fit) %>% filter(.model == "auto") %>% features(.innov, ljung_box, lag=24, dof=4)
```
The model residuals are not distinguishable from white noise.
5. Forecast
```{r}
forecast(fit, h=36) %>%
filter(.model=='auto') %>%
autoplot(leisure) +
labs(title = "US employment: leisure and hospitality",
y="Number of people (millions)")
```
***
## 9.10 ARIMA vs ETS
To compare ARIMA and ETS models use time series cross-validation or train/test split.
Using CV:
__Example__
```{r}
aus_economy <- global_economy %>%
filter(Code == "AUS") %>%
mutate(Population = Population/1e6)
aus_economy %>%
slice(-n()) %>%
stretch_tsibble(.init = 10) %>%
model(
ETS(Population),
ARIMA(Population)
) %>%
forecast(h = 1) %>%
accuracy(aus_economy) %>%
select(.model, RMSE:MAPE)
```
```{r}
aus_economy %>%
model(ETS(Population)) %>%
forecast(h = "5 years") %>%
autoplot(aus_economy %>% filter(Year >= 2000)) +
labs(title = "Australian population",
y = "People (millions)")
```