Stationarity
s2t2 committed Sep 20, 2024
1 parent e27f745 commit 8ff1bdf
Showing 3 changed files with 31 additions and 13 deletions.
@@ -4,6 +4,9 @@

In time series analysis, autocorrelation helps identify patterns and dependencies in data, particularly when dealing with sequences of observations over time, such as stock prices, temperature data, or sales figures. Autocorrelation analysis is helpful for detecting trends, periodicities, and other temporal patterns in the data, as well as for developing predictive models.

In predictive modeling, especially for time series forecasting, autocorrelation is essential for selecting the number of lagged observations (or lags) to use in autoregressive models. By calculating the autocorrelation for different lag intervals, it is possible to determine how much influence past values have on future ones. This process helps us choose the optimal lag length, which in turn can improve the accuracy of forecasts.


## Interpreting Autocorrelation

Similar to correlation, autocorrelation ranges in value from -1 to 1. A positive autocorrelation indicates that a value tends to be similar to preceding values, while a negative autocorrelation suggests that a value is likely to differ from previous observations.
@@ -14,9 +17,17 @@

+ **Weak Autocorrelation**: If the ACF value is close to zero for a particular lag, it suggests that the time series does not exhibit a strong linear relationship with its past values at that lag. This can indicate that the observations at that lag are not predictive of future values.

## Uses for Predictive Modeling

In addition to interpreting the autocorrelation values themselves, we can examine the autocorrelation plot to identify patterns:

+ Exponential decay in the ACF indicates a stationary autoregressive process (AR model).
+ One or two significant spikes followed by rapid decay suggest a moving average process (MA model).
+ Slow decay or oscillation often suggests non-stationarity, which may require differencing to stabilize the series.

## Calculating Autocorrelation in Python

@@ -3,23 +3,23 @@

A **stationary** time series is one whose statistical properties, such as mean, variance, and autocorrelation, do not change over time. In other words, the data fluctuates around a constant mean and has constant variance.
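To make the definition concrete, here is an illustrative sketch (the white-noise and random-walk series are simulated assumptions): white noise fluctuates around a constant mean with constant variance, while a random walk accumulates its shocks, so its variance grows over time:

```python
import numpy as np

rng = np.random.default_rng(1)
steps = rng.normal(size=1000)

white_noise = steps             # stationary: constant mean and constant variance
random_walk = np.cumsum(steps)  # non-stationary: the variance grows over time

print(f"white noise variance: {white_noise.var():.2f}")
print(f"random walk variance: {random_walk.var():.2f}")
```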

Stationarity ensures that the underlying process generating the data remains stable over time, which is crucial for building predictive models on time series data.

## Types of Stationarity

There are different types of stationarity, including:

1. **Strict Stationarity**: The distribution of the time series does not change over time. This is quite strict, and rarely occurs in real-world data.
2. **Weak Stationarity (or Second-Order Stationarity)**: This is the most common type, and only requires the mean, variance, and autocorrelation to be constant over time.

In the first example, we examine whether the mean is stationary:

![Stationary vs non-stationary time series data, focusing on mean. [Source](https://miro.medium.com/v2/1*-ecA_r11hpyIEUJSwBAtAA.png).](../../../images/stationary-data.png)

In the second example, we examine whether the standard deviation is stationary:

![Stationary vs non-stationary time series data, focusing on variance. [Source](https://miro.medium.com/v2/1*-ecA_r11hpyIEUJSwBAtAA.png).](../../../images/stationary-data-variance.webp)

@@ -30,15 +30,15 @@

In time series analysis, stationarity is a key assumption that greatly influences the choice and performance of predictive models:

+ For **Linear Regression** models: while linear regression does not explicitly require stationarity in the data, regression models generally work better with stationary data, particularly if the relationship between the features and the target is assumed to be stable over time.

+ For **ARIMA (Autoregressive Integrated Moving Average)** models: ARIMA models require the data to be stationary. If the time series is not stationary, the model's assumptions break down, and it will not perform well. The "Integrated (I)" part specifically deals with non-stationarity by differencing the data (i.e. subtracting the previous observation from the current one) to make it stationary.
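The differencing step can be sketched with pandas' `diff` method (the trending series below is a synthetic assumption for illustration):

```python
import numpy as np
import pandas as pd

# a synthetic series with a linear upward trend, so its mean is not stationary
trending = pd.Series(np.arange(100, dtype=float) * 2.0 + 5.0)

# first difference: subtract the previous observation from the current one
differenced = trending.diff().dropna()

# a deterministic linear trend differences down to a constant
print(differenced.unique())
```

After one round of differencing, the trend is gone; real-world series with stochastic trends may need a second difference or additional transformation.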

## Testing for Stationarity

Here are some common ways to test for stationarity in time series data:

1. **Visual Inspection**: Plotting the time series can often give a good idea of whether the data is stationary. Look for a constant mean, consistent variance, and no obvious trend or seasonality over time.
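One way to back the visual check with numbers is to compute rolling statistics (the simulated series and the 30-observation window here are assumptions for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
series = pd.Series(rng.normal(loc=10, scale=2, size=365))  # stationary by construction

rolling_mean = series.rolling(window=30).mean()
rolling_std = series.rolling(window=30).std()

# for stationary data, both rolling statistics hover around constant values
print(f"rolling mean range: {rolling_mean.min():.2f} to {rolling_mean.max():.2f}")
print(f"rolling std range:  {rolling_std.min():.2f} to {rolling_std.max():.2f}")
```

If the rolling mean drifts steadily or the rolling standard deviation widens over time, that is visual evidence of non-stationarity.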

2. **Augmented Dickey-Fuller (ADF) Test**: The ADF test is a statistical test where the null hypothesis is that the data has a unit root (i.e. is non-stationary). If the p-value is below a certain threshold (e.g. 0.05), we can reject the null hypothesis, indicating that the series is stationary. In Python, we can use the [`adfuller` function](https://www.statsmodels.org/dev/generated/statsmodels.tsa.stattools.adfuller.html) from `statsmodels`:

```python
from statsmodels.tsa.stattools import adfuller

result = adfuller(series)  # series: your time series values (assumed to be defined above)

print(f"ADF Statistic: {result[0]}")
print(f"P-value: {result[1]}")
```

3. **KPSS Test** (Kwiatkowski-Phillips-Schmidt-Shin): the KPSS test is another test for stationarity, but its null hypothesis is the opposite of the ADF test. In KPSS, the null hypothesis is that the series is stationary. A low p-value indicates non-stationarity. In Python, we can use the [`kpss` function](https://www.statsmodels.org/dev/generated/statsmodels.tsa.stattools.kpss.html) from `statsmodels`:

```python
from statsmodels.tsa.stattools import kpss

result = kpss(series)  # series: your time series values (assumed to be defined above)

print(f"KPSS Statistic: {result[0]}")
print(f"P-value: {result[1]}")
```
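Because the ADF and KPSS tests have opposite null hypotheses, they are often read together. Here is a sketch of the combined logic (the helper function and the 0.05 threshold are illustrative conventions, not a statsmodels API):

```python
def interpret_stationarity(adf_p: float, kpss_p: float, alpha: float = 0.05) -> str:
    """Combine ADF (null: non-stationary) and KPSS (null: stationary) p-values."""
    adf_says_stationary = adf_p < alpha      # ADF rejects its null
    kpss_says_stationary = kpss_p >= alpha   # KPSS fails to reject its null
    if adf_says_stationary and kpss_says_stationary:
        return "stationary"
    if not adf_says_stationary and not kpss_says_stationary:
        return "non-stationary"
    return "inconclusive: consider differencing or detrending, then re-test"

print(interpret_stationarity(adf_p=0.01, kpss_p=0.30))  # stationary
print(interpret_stationarity(adf_p=0.60, kpss_p=0.01))  # non-stationary
```

When the two tests disagree, the series may be trend-stationary or difference-stationary, and a transformation followed by re-testing is a common next step.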
7 changes: 7 additions & 0 deletions docs/references.bib
@@ -1,5 +1,12 @@


#
# TIME SERIES
#
# https://machinelearningmastery.com/autoregression-models-time-series-forecasting-python/
# https://machinelearningmastery.com/introduction-to-time-series-forecasting-with-python/
#
# STATIONARY DATA
