epipredict.qmd
+13-12
@@ -39,39 +39,39 @@ There are four types of components:
39
39
3. Predictor: make predictions, using a fitted model object and processed test data
40
40
4. Postprocessor: manipulate or transform the predictions before returning
41
41
42
-
Users familiar with [`{tidymodels}`](https://www.tidymodels.org) and especially
43
-
the [`{workflows}`](https://workflows.tidymodels.org) package will notice a lot
42
+
Users familiar with `{tidymodels}` and especially
43
+
the `{workflows}` package will notice a lot
44
44
of overlap. This is by design, and is in fact a feature. The truth is that
45
45
`{epipredict}` is a wrapper around much that is contained in these packages.
46
46
Therefore, if you want something from this -verse, it should "just work" (we hope).
47
47
48
-
The reason for the overlap is that `{workflows}`_already implements_ the first
48
+
The reason for the overlap is that `workflows`_already implements_ the first
49
49
three steps. And it does this very well. However, it is missing the
50
50
postprocessing stage and currently has no plans for such an implementation.
51
51
And this feature is important. All forecasters need post-processing. Anything more complicated (which is nearly everything)
52
52
needs this as well.
53
53
54
-
The second omission from `{tidymodels}` is support for panel data. Besides
54
+
The second omission from `tidymodels` is support for panel data. Besides
55
55
epidemiological data, economics, psychology, sociology, and many other areas
56
-
frequently deal with data of this type. So the framework of behind `{epipredict}`
56
+
frequently deal with data of this type. So the framework of behind `epipredict`
57
57
implements this. In principle, this has nothing to do with epidemiology, and
58
58
one could simply use this package as a solution for the missing functionality in
59
-
`{tidymodels}`. Again, this should "just work" (we hope).
59
+
`tidymodels`. Again, this should "just work" (we hope).
60
60
61
61
All of the _panel data_ functionality is implemented through the `epi_df` data type
62
62
described in the previous part. If you have different panel data, just force it
63
63
into an `epi_df` as described in @sec-additional-keys.
64
64
65
65
## Why doesn't this package already exist?
66
66
67
-
- Parts of it actually DO exist. There's a universe called `{tidymodels}`. It
67
+
- Parts of it actually DO exist. There's a universe called `tidymodels`. It
68
68
handles pre-processing, training, and prediction, bound together, through a
69
-
package called workflows. We built `{epipredict}` on top of that setup. In this
69
+
package called workflows. We built `epipredict` on top of that setup. In this
70
70
way, you CAN use almost everything they provide.
71
71
- However, workflows doesn't do post-processing to the extent envisioned here.
72
-
And nothing in `{tidymodels}` handles panel data.
72
+
And nothing in `tidymodels` handles panel data.
73
73
- The tidy-team doesn't have plans to do either of these things. (We checked).
74
-
- There are two packages that do time series built on `{tidymodels}`, but it's
74
+
- There are two packages that do time series built on `tidymodels`, but it's
75
75
"basic" time series: 1-step AR models, exponential smoothing, STL decomposition,
76
76
etc.[^1]
77
77
@@ -94,6 +94,7 @@ in the built-in data frame).
94
94
jhu <- case_death_rate_subset %>%
95
95
filter(time_value >= max(time_value) - 30)
96
96
97
+
library(epipredict)
97
98
out <- arx_forecaster(
98
99
jhu,
99
100
outcome = "death_rate",
@@ -128,7 +129,7 @@ By default, the forecaster predicts the outcome (`death_rate`) 1-week ahead,
128
129
using 3 lags of each predictor (`case_rate` and `death_rate`) at 0 (today),
129
130
1 week back and 2 weeks back. The predictors and outcome can be changed
130
131
directly. The rest of the defaults are encapsulated into a list of arguments.
131
-
This list is produced by `arx_args_list()`.
132
+
This list is produced by `arx_args_list()`.
132
133
133
134
## Simple adjustments
134
135
@@ -197,7 +198,7 @@ arx_args_list(
197
198
198
199
So far, our forecasts have been produced using simple linear regression. But this is not the only way to estimate such a model.
199
200
The `trainer` argument determines the type of model we want.
200
-
This takes a [`{parsnip}`](https://parsnip.tidymodels.org) model. The default is linear regression, but we could instead use a random forest with the `{ranger}` package:
201
+
This takes a `{parsnip}` model. The default is linear regression, but we could instead use a random forest with the `{ranger}` package:
of feature engineering and [`{parsnip}`](https://parsnip.tidymodels.org/)
16
-
for a unified interface to a range of models.
13
+
`{recipes}` for `{dplyr}`-like pipeable sequences of feature engineering and `{parsnip}` for a unified interface to a range of models.
17
14
18
-
`{epipredict}` has additional customized feature engineering and preprocessing
15
+
`epipredict` has additional customized feature engineering and preprocessing
19
16
steps that specifically work with panel data in this context, for example,
20
17
`step_epi_lag()`, `step_population_scaling()`,
21
18
`step_epi_naomit()`. They can be used along with most
22
-
steps from the `{recipes}` package for more feature engineering.
19
+
steps from the `recipes` package for more feature engineering.
23
20
24
-
In this vignette, we will illustrate some examples of how to use `{epipredict}`
25
-
with `{recipes}` and `{parsnip}` for different purposes of
21
+
In this vignette, we will illustrate some examples of how to use `epipredict`
22
+
with `recipes` and `parsnip` for different purposes of
26
23
epidemiological forecasting.
27
24
We will focus on basic autoregressive models, in which COVID cases and
28
25
deaths in the near future are predicted using a linear combination of cases
@@ -40,6 +37,7 @@ library(epipredict)
40
37
library(recipes)
41
38
library(workflows)
42
39
library(poissonreg)
40
+
library(epidatasets)
43
41
```
44
42
45
43
## Poisson Regression
@@ -53,12 +51,10 @@ deploying control measures.
53
51
One of the outcomes that the CDC forecasts is [death counts from COVID-19](https://www.cdc.gov/coronavirus/2019-ncov/science/forecasting/forecasting-us.html).
54
52
Although there are many state-of-the-art models, we choose to use Poisson
55
53
regression, the textbook example for modeling count data, as an illustration
56
-
for using the `{epipredict}` package with other existing `{tidymodels}` packages.
54
+
for using the `epipredict` package with other existing `tidymodels` packages.
57
55
58
56
The (folded) code below gives the necessary commands to download this data
59
-
from the Delphi Epidata API, but it is also built into the