@@ -58,6 +58,8 @@ claims and the number of new confirmed COVID-19 cases per 100,000 population
58
58
59
59
<details >
60
60
61
+ <summary >Load a data archive</summary >
62
+
61
63
We process as before, with the
62
64
modification that we use ` sync = locf ` in ` epix_merge() ` so that the last
63
65
version of each observation can be carried forward to extrapolate unavailable
@@ -89,7 +91,7 @@ Note that all of the warnings about the forecast date being less than the most
89
91
recent update date of the data have been suppressed to avoid cluttering the
90
92
output.
91
93
92
- ``` {r make- arx-kweek, warning = FALSE}
94
+ ``` {r arx-kweek-preliminaries , warning = FALSE}
93
95
# Latest snapshot of data, and forecast dates
94
96
x_latest <- epix_as_of(x, max_version = max(x$versions_end))
95
97
fc_time_values <- seq(
@@ -105,8 +107,7 @@ k_week_ahead <- function(epi_df, outcome, predictors, ahead = 7, engine) {
105
107
~ arx_forecaster(
106
108
.x, outcome, predictors, engine,
107
109
args_list = arx_args_list(ahead = ahead)
108
- ) %>%
109
- extract2("predictions") %>%
110
+ )$predictions %>%
110
111
select(-geo_value),
111
112
before = 120 - 1,
112
113
ref_time_values = fc_time_values,
@@ -115,7 +116,9 @@ k_week_ahead <- function(epi_df, outcome, predictors, ahead = 7, engine) {
115
116
select(geo_value, time_value, starts_with("fc")) %>%
116
117
mutate(engine_type = engine$engine)
117
118
}
119
+ ```
118
120
121
+ ``` {r make-arx-kweek}
119
122
# Generate the forecasts and bind them together
120
123
fc <- bind_rows(
121
124
map(
@@ -146,11 +149,14 @@ rates. Note that even though we've fitted the model on all states, we'll just
146
149
display the results for two states, California (CA) and Florida (FL), to get a
147
150
sense of the model performance while keeping the graphic simple.
148
151
149
- ``` {r plot-arx, message = FALSE, warning = FALSE, fig.width = 9, fig.height = 6}
152
+ <details >
153
+
154
+ <summary >Code for plotting</summary >
155
+ ``` {r plot-arx, message = FALSE, warning = FALSE}
150
156
fc_cafl <- fc %>% filter(geo_value %in% c("ca", "fl"))
151
157
x_latest_cafl <- x_latest %>% filter(geo_value %in% c("ca", "fl"))
152
158
153
- ggplot(fc_cafl, aes(fc_target_date, group = time_value, fill = engine_type)) +
159
+ p1 <- ggplot(fc_cafl, aes(fc_target_date, group = time_value, fill = engine_type)) +
154
160
geom_line(
155
161
data = x_latest_cafl, aes(x = time_value, y = case_rate),
156
162
inherit.aes = FALSE, color = "gray50"
@@ -165,6 +171,11 @@ ggplot(fc_cafl, aes(fc_target_date, group = time_value, fill = engine_type)) +
165
171
labs(x = "Date", y = "Reported COVID-19 case rates") +
166
172
theme(legend.position = "none")
167
173
```
174
+ </details >
175
+
176
+ ``` {r show-plot1, fig.width = 9, fig.height = 6, echo=FALSE}
177
+ p1
178
+ ```
168
179
169
180
For the two states of interest, simple linear regression clearly performs better
170
181
than random forest in terms of accuracy of the predictions and does not result
@@ -185,6 +196,8 @@ to those models high variance predictions.
185
196
186
197
<details >
187
198
199
+ <summary >Data and forecasts. Similar to the above.</summary >
200
+
188
201
By leveraging the flexibility of ` epiprocess ` , we can apply the same techniques
189
202
to data from other sources. Since some collaborators are in British Columbia,
190
203
Canada, we'll do essentially the same thing for Canada as we did above.
@@ -312,6 +325,7 @@ combined data from all US states and territories) to train our model.
312
325
313
326
<details >
314
327
328
+ <summary >Download data using `{epidatr}`</summary >
315
329
``` {r load-data, eval=FALSE}
316
330
# loading in the data
317
331
states <- "*"
@@ -343,12 +357,6 @@ deaths_incidence_prop <- pub_covidcast(
343
357
as_epi_archive(compactify = FALSE)
344
358
345
359
346
- fc_time_values <- seq(
347
- from = as.Date("2020-09-01"),
348
- to = as.Date("2021-12-31"),
349
- by = "1 month"
350
- )
351
-
352
360
x <- epix_merge(confirmed_incidence_prop, deaths_incidence_prop,
353
361
sync = "locf"
354
362
)
@@ -380,18 +388,21 @@ x <- x %>%
380
388
saveRDS(x$DT, file = "case_death_rate_archive.rds")
381
389
```
382
390
383
- ``` {r load-stored-data, eval=FALSE }
391
+ ``` {r load-stored-data}
384
392
x <- readRDS("case_death_rate_archive.rds")
385
393
x <- as_epi_archive(x)
386
394
```
387
-
388
-
389
395
</details >
390
396
391
397
Here we specify the ARX model.
392
398
393
399
``` {r make-arx-model}
394
400
aheads <- c(7, 14, 21)
401
+ fc_time_values <- seq(
402
+ from = as.Date("2020-09-01"),
403
+ to = as.Date("2021-12-31"),
404
+ by = "1 month"
405
+ )
395
406
forecaster <- function(x) {
396
407
map(aheads, function(ahead) {
397
408
arx_forecaster(
@@ -408,34 +419,37 @@ forecaster <- function(x) {
408
419
409
420
We can now use our forecaster function that we've created and use it in the
410
421
pipeline for forecasting the predictions. We store the predictions into the
411
- ` x_result ` variable and calculate the most up to date version of the data in the
412
- epiarchive and store it as ` x_latest ` .
422
+ ` arx_preds ` variable and calculate the most up to date version of the data in the
423
+ epi archive and store it as ` x_latest ` .
413
424
414
425
``` {r running-arx-forecaster}
415
- x_result <- x %>%
426
+ arx_preds <- x %>%
416
427
epix_slide(~ forecaster(.x),
417
428
before = 120, ref_time_values = fc_time_values,
418
429
names_sep = NULL
419
430
) %>%
420
431
mutate(engine_type = quantile_reg()$engine) %>%
421
- as_epi_df()
422
- x_result$ ahead_val <- x_result$ target_date - x_result$ forecast_date
432
+ as_epi_df() %>%
433
+ mutate( ahead_val = target_date - forecast_date)
423
434
424
435
x_latest <- epix_as_of(x, max_version = max(x$versions_end))
425
436
```
426
437
427
438
Now we plot both the actual and predicted 7 day average of the death rate for
428
439
the chosen states
429
440
430
- ``` {r plot-arx-asof, message = FALSE, warning = FALSE, fig.width = 9, fig.height = 6}
441
+ <details >
442
+
443
+ <summary >Code for the plot</summary >
444
+ ``` {r plot-arx-asof, message = FALSE, warning = FALSE}
431
445
states_to_show <- c("ca", "ny", "mi", "az")
432
- fc_states <- x_result %>%
446
+ fc_states <- arx_preds %>%
433
447
filter(geo_value %in% states_to_show) %>%
434
448
pivot_quantiles_wider(.pred_distn)
435
449
436
450
x_latest_states <- x_latest %>% filter(geo_value %in% states_to_show)
437
451
438
- ggplot(fc_states, aes(target_date, group = time_value)) +
452
+ p2 <- ggplot(fc_states, aes(target_date, group = time_value)) +
439
453
geom_ribbon(aes(ymin = `0.05`, ymax = `0.95`, fill = geo_value), alpha = 0.4) +
440
454
geom_line(
441
455
data = x_latest_states, aes(x = time_value, y = death_rate_7d_av),
@@ -451,3 +465,8 @@ ggplot(fc_states, aes(target_date, group = time_value)) +
451
465
labs(x = "Date", y = "7 day average COVID-19 death rates") +
452
466
theme(legend.position = "none")
453
467
```
468
+ </details >
469
+
470
+ ``` {r show-plot2, fig.width = 9, fig.height = 6, echo = FALSE}
471
+ p2
472
+ ```
0 commit comments