@@ -390,129 +390,6 @@ can expect in practice.
390390p2
391391```
392392
393- ## Comparing version faithful and faithless in Canada
394-
395-
396- By leveraging the flexibility of ` {epipredict} ` , we can apply the same
397- techniques to data from other sources.
398- Since some collaborators are in British Columbia, Canada, we'll do essentially
399- the same thing for Canadian Provincial data as we did above.
400-
401- The [ COVID-19 Canada Open Data Working Group] ( https://opencovid.ca/ ) collects
402- daily time series data on COVID-19 cases, deaths, recoveries, testing and
403- vaccinations at the health region and province levels.
404- Data are collected from publicly available sources such as government datasets
405- and news releases.
406- Unfortunately, there is no simple versioned source, so we have created our own
407- from the Github commit history and stored it in ` epidatasets::can_prov_cases ` .
408-
409- First, we load versioned case rates at the provincial level.
410-
411- ``` {r get-can-fc, warning = FALSE}
412- aheads <- c(7, 14, 21, 28)
413- canada_archive <- epidatasets::can_prov_cases
414- revis_can <- epidatasets::can_prov_cases %>% revision_summary()
415- revis_can %>% group_by(geo_value) %>% summarize(n_revisions = mean(n_revisions)) %>% print(n=13)
416- canada_archive_faux <- epix_as_of(canada_archive, canada_archive$versions_end) %>%
417- mutate(version = time_value) %>%
418- as_epi_archive()
419- ```
420-
421- We run more or less the same forecasting method as above, but with the addition
422- of 7-day averaging for each snapshot before forecasting (due to highly variable
423- provincial reporting mismatches).
424-
425- ``` {r smoothing, warning = FALSE}
426- smooth_cases <- function(epi_df) {
427- epi_df %>%
428- group_by(geo_value) %>%
429- epi_slide_mean("case_rate", .window_size = 7, na.rm = TRUE, .suffix = "_{.n}dav")
430- }
431- forecast_dates <- seq.Date(
432- from = min(canada_archive$DT$version),
433- to = max(canada_archive$DT$version),
434- by = "1 month"
435- )
436-
437- canada_version_faithless <-
438- canada_archive_faux %>%
439- epix_slide(
440- ~forecast_wrapper(.x, aheads, "case_rate_7dav", "case_rate_7dav", smooth_cases),
441- .before = 120,
442- .versions = forecast_dates
443- ) %>%
444- mutate(version_faithful = FALSE)
445- canada_version_faithful <-
446- canada_archive %>%
447- epix_slide(
448- ~forecast_wrapper(.x, aheads, "case_rate_7dav", "case_rate_7dav", smooth_cases),
449- .before = 120,
450- .versions = forecast_dates
451- ) %>%
452- mutate(version_faithful = TRUE)
453- canada_forecasts <- bind_rows(
454- canada_version_faithless,
455- canada_version_faithful
456- )
457- ```
458-
459- The figures below shows the results for a single province.
460- <details >
461- <summary > Plotting </summary >
462- First prepping some data to make plotting more informative.
463- ``` {r plot-can-fc-lr-data, message = FALSE, warning = FALSE, fig.width = 9, fig.height = 12}
464- geo_choose <- "Saskatchewan"
465- forecasts_filtered <- canada_forecasts %>%
466- filter(geo_value == geo_choose) %>%
467- mutate(time_value = version)
468- case_rate_data <- bind_rows(
469- # Snapshotted data for the version-faithful forecasts
470- map(
471- forecast_dates,
472- ~ canada_archive %>%
473- epix_as_of(.x) %>%
474- smooth_cases() %>%
475- mutate(case_rate = case_rate_7dav, version = .x)
476- ) %>%
477- bind_rows() %>%
478- mutate(version_faithful = TRUE),
479- # Latest data for the version-faithless forecasts
480- canada_archive %>%
481- epix_as_of(doctor_visits$versions_end) %>%
482- smooth_cases() %>%
483- mutate(case_rate = case_rate_7dav, version_faithful = FALSE)
484- ) %>%
485- filter(geo_value == geo_choose)
486- case_rate_data %>% filter(time_value == "2021-01-13") %>% print(n=12)
487- canada_archive %>% revision_summary()
488- ```
489- And actually generating the plot
490- ``` {r plot-can-fc-lr, message = FALSE, warning = FALSE, fig.width = 9, fig.height = 12}
491- p3 <-
492- ggplot(data = forecasts_filtered, aes(x = target_date, group = time_value)) +
493- geom_ribbon(aes(ymin = `0.05`, ymax = `0.95`, fill = factor(time_value)), alpha = 0.4) +
494- geom_line(aes(y = .pred, color = factor(time_value)), linetype = 2L) +
495- geom_point(aes(y = .pred, color = factor(time_value)), size = 0.75) +
496- # the forecast date
497- geom_vline(data = case_rate_data %>% filter(geo_value == geo_choose) %>% select(-version_faithful), aes(color = factor(version), xintercept = version), lty = 2) +
498- # the underlying data
499- geom_line(
500- data = case_rate_data %>% filter(geo_value == geo_choose),
501- aes(x = time_value, y = case_rate, color = factor(version)),
502- inherit.aes = FALSE, na.rm = TRUE
503- ) +
504- facet_grid(version_faithful ~ geo_value, scales = "free") +
505- scale_x_date(breaks = "2 months", date_labels = "%b %Y") +
506- scale_y_continuous(expand = expansion(c(0, 0.05))) +
507- labs(x = "Date", y = "smoothed, day of week adjusted covid-like doctors visits") +
508- theme(legend.position = "none")
509- ```
510- </details >
511-
512- ``` {r show-can-plot, warning = FALSE, echo = FALSE}
513- p3
514- ```
515-
516393
517394[ ^ 1 ] : For forecasting a single day like this, we could have actually just used
518395 ` doctor_visits %>% epix_as_of(forecast_date) ` to get the relevant snapshot, and then fed that into ` arx_forecaster() ` as we did in the [ landing
0 commit comments