Skip to content

Commit ba96a81

Browse files
committed
dropping CAN backtesting example b/c ~no revisions
1 parent 3834887 commit ba96a81

File tree

1 file changed

+0
-123
lines changed

1 file changed

+0
-123
lines changed

vignettes/backtesting.Rmd

Lines changed: 0 additions & 123 deletions
Original file line numberDiff line numberDiff line change
@@ -390,129 +390,6 @@ can expect in practice.
390390
p2
391391
```
392392

393-
## Comparing version faithful and faithless in Canada
394-
395-
396-
By leveraging the flexibility of `{epipredict}`, we can apply the same
397-
techniques to data from other sources.
398-
Since some collaborators are in British Columbia, Canada, we'll do essentially
399-
the same thing for Canadian Provincial data as we did above.
400-
401-
The [COVID-19 Canada Open Data Working Group](https://opencovid.ca/) collects
402-
daily time series data on COVID-19 cases, deaths, recoveries, testing and
403-
vaccinations at the health region and province levels.
404-
Data are collected from publicly available sources such as government datasets
405-
and news releases.
406-
Unfortunately, there is no simple versioned source, so we have created our own
407-
from the Github commit history and stored it in `epidatasets::can_prov_cases`.
408-
409-
First, we load versioned case rates at the provincial level.
410-
411-
```{r get-can-fc, warning = FALSE}
412-
aheads <- c(7, 14, 21, 28)
413-
canada_archive <- epidatasets::can_prov_cases
414-
revis_can <- epidatasets::can_prov_cases %>% revision_summary()
415-
revis_can %>% group_by(geo_value) %>% summarize(n_revisions = mean(n_revisions)) %>% print(n=13)
416-
canada_archive_faux <- epix_as_of(canada_archive, canada_archive$versions_end) %>%
417-
mutate(version = time_value) %>%
418-
as_epi_archive()
419-
```
420-
421-
We run more or less the same forecasting method as above, but with the addition
422-
of 7-day averaging for each snapshot before forecasting (due to highly variable
423-
provincial reporting mismatches).
424-
425-
```{r smoothing, warning = FALSE}
426-
smooth_cases <- function(epi_df) {
427-
epi_df %>%
428-
group_by(geo_value) %>%
429-
epi_slide_mean("case_rate", .window_size = 7, na.rm = TRUE, .suffix = "_{.n}dav")
430-
}
431-
forecast_dates <- seq.Date(
432-
from = min(canada_archive$DT$version),
433-
to = max(canada_archive$DT$version),
434-
by = "1 month"
435-
)
436-
437-
canada_version_faithless <-
438-
canada_archive_faux %>%
439-
epix_slide(
440-
~forecast_wrapper(.x, aheads, "case_rate_7dav", "case_rate_7dav", smooth_cases),
441-
.before = 120,
442-
.versions = forecast_dates
443-
) %>%
444-
mutate(version_faithful = FALSE)
445-
canada_version_faithful <-
446-
canada_archive %>%
447-
epix_slide(
448-
~forecast_wrapper(.x, aheads, "case_rate_7dav", "case_rate_7dav", smooth_cases),
449-
.before = 120,
450-
.versions = forecast_dates
451-
) %>%
452-
mutate(version_faithful = TRUE)
453-
canada_forecasts <- bind_rows(
454-
canada_version_faithless,
455-
canada_version_faithful
456-
)
457-
```
458-
459-
The figures below shows the results for a single province.
460-
<details>
461-
<summary> Plotting </summary>
462-
First prepping some data to make plotting more informative.
463-
```{r plot-can-fc-lr-data, message = FALSE, warning = FALSE, fig.width = 9, fig.height = 12}
464-
geo_choose <- "Saskatchewan"
465-
forecasts_filtered <- canada_forecasts %>%
466-
filter(geo_value == geo_choose) %>%
467-
mutate(time_value = version)
468-
case_rate_data <- bind_rows(
469-
# Snapshotted data for the version-faithful forecasts
470-
map(
471-
forecast_dates,
472-
~ canada_archive %>%
473-
epix_as_of(.x) %>%
474-
smooth_cases() %>%
475-
mutate(case_rate = case_rate_7dav, version = .x)
476-
) %>%
477-
bind_rows() %>%
478-
mutate(version_faithful = TRUE),
479-
# Latest data for the version-faithless forecasts
480-
canada_archive %>%
481-
epix_as_of(doctor_visits$versions_end) %>%
482-
smooth_cases() %>%
483-
mutate(case_rate = case_rate_7dav, version_faithful = FALSE)
484-
) %>%
485-
filter(geo_value == geo_choose)
486-
case_rate_data %>% filter(time_value == "2021-01-13") %>% print(n=12)
487-
canada_archive %>% revision_summary()
488-
```
489-
And actually generating the plot
490-
```{r plot-can-fc-lr, message = FALSE, warning = FALSE, fig.width = 9, fig.height = 12}
491-
p3 <-
492-
ggplot(data = forecasts_filtered, aes(x = target_date, group = time_value)) +
493-
geom_ribbon(aes(ymin = `0.05`, ymax = `0.95`, fill = factor(time_value)), alpha = 0.4) +
494-
geom_line(aes(y = .pred, color = factor(time_value)), linetype = 2L) +
495-
geom_point(aes(y = .pred, color = factor(time_value)), size = 0.75) +
496-
# the forecast date
497-
geom_vline(data = case_rate_data %>% filter(geo_value == geo_choose) %>% select(-version_faithful), aes(color = factor(version), xintercept = version), lty = 2) +
498-
# the underlying data
499-
geom_line(
500-
data = case_rate_data %>% filter(geo_value == geo_choose),
501-
aes(x = time_value, y = case_rate, color = factor(version)),
502-
inherit.aes = FALSE, na.rm = TRUE
503-
) +
504-
facet_grid(version_faithful ~ geo_value, scales = "free") +
505-
scale_x_date(breaks = "2 months", date_labels = "%b %Y") +
506-
scale_y_continuous(expand = expansion(c(0, 0.05))) +
507-
labs(x = "Date", y = "smoothed, day of week adjusted covid-like doctors visits") +
508-
theme(legend.position = "none")
509-
```
510-
</details>
511-
512-
```{r show-can-plot, warning = FALSE, echo = FALSE}
513-
p3
514-
```
515-
516393

517394
[^1]: For forecasting a single day like this, we could have actually just used
518395
`doctor_visits %>% epix_as_of(forecast_date)` to get the relevant snapshot, and then fed that into `arx_forecaster()` as we did in the [landing

0 commit comments

Comments
 (0)