epipredict.qmd

Serving both populations is the main motivation for our efforts, but at the same time, we have tried hard to make it useful.
## Canned forecasters

We provide a set of basic, easy-to-use forecasters that work out of the box:

* Flatline (basic) forecaster
* Autoregressive forecaster
* Autoregressive classifier
* Smooth autoregressive (AR) forecaster

These forecasters encapsulate a series of operations (including data preprocessing and model fitting) in convenient one-liners.

They are essentially alternatives to one another; the main difference is the model each uses. Three fit different regression models, while the autoregressive classifier fits a classification model.

The operations within the canned forecasters all follow our uniform **framework**. Although these one-liners allow a limited amount of customization, any serious customization requires the framework explained in @sec-framework.
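As a sketch of the one-liner style (assuming an `epi_df` named `jhu` with a `death_rate` column, as in the examples later in this chapter):

```r
library(epipredict)

# One call wraps preprocessing, model fitting, prediction, and postprocessing.
out <- flatline_forecaster(jhu, outcome = "death_rate")
out$predictions  # a tibble of point forecasts and prediction intervals
```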
## Forecasting framework {#sec-framework}

At its core, `{epipredict}` is a **framework** for creating custom forecasters.
By that we mean that we view the process of creating custom forecasters as
Therefore, if you want something from this -verse, it should "just work" (we hope).

The reason for the overlap is that `{workflows}` _already implements_ the first
three steps. And it does this very well. However, it is missing the
postprocessing stage and currently has no plans for such an implementation.
And this feature is important. All forecasters need postprocessing. Anything more complicated (which is nearly everything)
needs this as well.

The second omission from `{tidymodels}` is support for panel data. Besides

into an `epi_df` as described in @sec-additional-keys.

## Why doesn't this package already exist?
- Parts of it actually DO exist. There's a universe called `{tidymodels}`. It
handles pre-processing, training, and prediction, bound together, through a
package called `{workflows}`. We built `{epipredict}` on top of that setup. In this
way, you CAN use almost everything they provide.
- However, `{workflows}` doesn't do postprocessing to the extent envisioned here.
And nothing in `{tidymodels}` handles panel data.
- The tidy-team doesn't have plans to do either of these things. (We checked).
- There are two packages that do time series built on `{tidymodels}`, but it's
"basic" time series: 1-step AR models, exponential smoothing, STL decomposition,
etc.[^1]
```{r}
out <- arx_forecaster(
  jhu,
  outcome = "death_rate",
  predictors = c("case_rate", "death_rate")
)
```

This call produces a warning, which we'll ignore for now. But essentially, it's telling us that our data comes from May 2022 but we're trying to do a forecast for January 2022. The result is likely not an accurate measure of real-time forecast performance, because the data has been revised over time.

```{r}
out
```

It contains three main components:

```{r}
str(out$metadata)
```
2. The predictions in a tibble. The columns give the predictions for each location along with additional columns. By default, these are a 90% prediction interval, the `forecast_date` (the date on which the forecast was putatively made) and the `target_date` (the date for which the forecast is being made).
```{r}
out$predictions
```
likely increase the variance of the model, and therefore, may lead to less
Another property of the basic model is the prediction interval. We describe this in more detail in a coming chapter, but it is easy to request multiple quantiles.
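For instance, a request for more quantiles might look like the following (a sketch; we assume `arx_args_list()` accepts a `quantile_levels` argument as in recent `{epipredict}` releases, and reuse the `jhu` data from above):

```r
out_q <- arx_forecaster(
  jhu,
  outcome = "death_rate",
  predictors = c("case_rate", "death_rate"),
  args_list = arx_args_list(
    quantile_levels = c(0.05, 0.25, 0.5, 0.75, 0.95)  # five requested quantiles
  )
)
```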
flatline-forecaster.qmd

# Introducing the flatline forecaster

The flatline forecaster is a very simple forecasting model intended for `epi_df` data, where the most recent observation is used as the forecast for any future date. In other words, the last observation is propagated forward, so the point predictions trace out a flat line. The prediction intervals are produced from the quantiles of the residuals of such a forecast over all of the training data. By default, these intervals will be obtained separately for each combination of keys (`geo_value` and any additional keys) in the `epi_df`. Thus, the output is a data frame of point (and optionally interval) forecasts at a single unique horizon (`ahead`) for each unique combination of key variables. This forecaster is comparable to the baseline used by the [COVID Forecast Hub](https://covid19forecasthub.org).
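A call of roughly this shape produces such an output (a sketch, not the chapter's exact call: we assume an `epi_df` named `jhu` with a `death_rate` column, and set the horizon through `flatline_args_list()`):

```r
five_days_ahead <- flatline_forecaster(
  jhu,
  outcome = "death_rate",
  args_list = flatline_args_list(ahead = 5L)  # forecast 5 days ahead
)
```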
The post-processing operations, in the order performed, were to create the predictions and the prediction intervals, add the forecast and target dates, and bound the predictions at zero.

We can also easily examine the predictions themselves.

```{r}
five_days_ahead$predictions
```
The results above show a distributional forecast produced using data through the end of 2021 for January 5, 2022. A prediction for the death rate per 100K inhabitants along with a 95% prediction interval is available for every state (`geo_value`).
The figure below displays the prediction and prediction interval for three sample states: Arizona, New York, and Florida.
of feature engineering and [`{parsnip}`](https://parsnip.tidymodels.org/)
for a unified interface to a range of models.

`{epipredict}` has additional customized feature engineering and preprocessing
steps that specifically work with panel data in this context, for example,
`step_epi_lag()`, `step_population_scaling()`,
`step_epi_naomit()`. They can be used along with most
steps from the `{recipes}` package for more feature engineering.
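For example, a recipe combining these steps might look like this (a sketch with illustrative lag and ahead values, assuming an `epi_df` named `jhu` with `case_rate` and `death_rate` columns):

```r
r <- epi_recipe(jhu) %>%
  step_epi_lag(case_rate, death_rate, lag = c(0, 7, 14)) %>%  # lagged predictors
  step_epi_ahead(death_rate, ahead = 7) %>%                   # leading outcome
  step_epi_naomit()                                           # drop rows made NA by lag/ahead
```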
In this vignette, we will illustrate some examples of how to use `{epipredict}`
with `{recipes}` and `{parsnip}` for different purposes of
epidemiological forecasting.
We will focus on basic autoregressive models, in which COVID cases and
deaths in the near future are predicted using a linear combination of cases
One of the outcomes that the CDC forecasts is [death counts from COVID-19](https://www.cdc.gov/coronavirus/2019-ncov/science/forecasting/forecasting-us.html).
Although there are many state-of-the-art models, we choose to use Poisson
regression, the textbook example for modeling count data, as an illustration
for using the `{epipredict}` package with other existing `{tidymodels}` packages.
The (folded) code below gives the necessary commands to download this data
from the Delphi Epidata API, but it is also built into the
$s_{\text{state}}$ are dummy variables for each state and take values of either
0 or 1.
Preprocessing steps will be performed to prepare the
data for model fitting. But before diving into them, it will be helpful to understand what `roles` are in the `{recipes}` framework.
---
#### Aside on `{recipes}` {.unnumbered}
`{recipes}` can assign one or more roles to each column in the data. The roles
are not restricted to a predefined set; they can be anything.
For most conventional situations, they are typically "predictor" and/or
"outcome". Additional roles enable targeted `step_*()` operations on specific
that are unique to the `{epipredict}` package. Since we work with `epi_df`
objects, all datasets should have `geo_value` and `time_value` passed through
133
134
automatically with these two roles assigned to the appropriate columns in the data.
134
135
The `{recipes}` package also allows [manual alterations of roles](https://recipes.tidymodels.org/reference/roles.html)
in bulk. There are a few handy functions that can be used together to help us
manipulate variable roles easily.
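As a small illustration using `{recipes}` alone (the data frame and column names here are hypothetical):

```r
library(recipes)

toy <- data.frame(
  geo_value  = c("ca", "ny"),
  time_value = as.Date(c("2021-12-31", "2022-01-01")),
  death_rate = c(0.2, 0.3),
  case_rate  = c(11, 12)
)

r <- recipe(death_rate ~ ., data = toy) %>%
  update_role(geo_value, time_value, new_role = "key")  # reassign roles in bulk

summary(r)  # lists each column with its assigned role(s)
```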
```{r}
r <- epi_recipe(counts_subset) %>%
  step_epi_naomit()
```

After specifying the preprocessing steps, we will use the `{parsnip}` package for
modeling and producing the prediction for death count, 7 days after the
latest available date in the dataset.
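Putting the pieces together might look like this (a sketch reusing the recipe `r` and data `counts_subset` from above; we assume the `{poissonreg}` extension supplies the Poisson engine for `{parsnip}`):

```r
library(epipredict)
library(poissonreg)  # engine for parsnip::poisson_reg()

wf <- epi_workflow(r, poisson_reg()) %>%
  fit(counts_subset)

# Forecast using the latest available data
predict(wf, new_data = get_test_data(r, counts_subset))
```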
However, the Delphi Group preferred to train on rate data instead, because it
puts different locations on a similar scale (eliminating the need for location-specific intercepts).
We can use a linear regression to predict the death rates and use state
population data to scale the rates to counts.[^pois] We will do so using
`layer_population_scaling()` from the `{epipredict}` package. (We could also use
`step_population_scaling()` from the `{epipredict}` package to prepare rate data
from count data in the preprocessing recipe.)
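In a postprocessing "frosting", this might be sketched as follows (assuming a data frame `pop_data` with columns `geo_value` and `population`; argument names follow the package's population-scaling helpers but should be checked against the reference):

```r
f <- frosting() %>%
  layer_predict() %>%
  layer_population_scaling(
    .pred,                      # scale the prediction column
    df = pop_data,              # population lookup table (hypothetical)
    df_pop_col = "population"   # column holding population sizes
  )
```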
[^pois]: We could continue with the Poisson model, but we'll switch to the Gaussian likelihood just for simplicity.
Preprocessing steps will again rely on functions from the `{epipredict}` package
as well as the `{recipes}` package.
There are also many functions in the `{recipes}` package that allow for