cmu-delphi
diff --git a/‎DESCRIPTION
Lines changed: 5 additions & 1 deletion b/‎DESCRIPTION
Lines changed: 5 additions & 1 deletion
diff --git a/‎_common.R
Lines changed: 9 additions & 3 deletions b/‎_common.R
Lines changed: 9 additions & 3 deletions
diff --git a/‎_freeze/archive/execute-results/html.json
Lines changed: 14 additions & 0 deletions b/‎_freeze/archive/execute-results/html.json
Lines changed: 14 additions & 0 deletions
diff --git a/‎_freeze/archive/figure-html/unnamed-chunk-9-1.svg
Lines changed: 1461 additions & 0 deletions b/‎_freeze/archive/figure-html/unnamed-chunk-9-1.svg
Lines changed: 1461 additions & 0 deletions
diff --git a/‎_freeze/correlations/execute-results/html.json
Lines changed: 14 additions & 0 deletions b/‎_freeze/correlations/execute-results/html.json
Lines changed: 14 additions & 0 deletions
@@ -11,16 +11,20 @@ Description:
 License: MIT + file LICENSE
 Imports:
     epidatr,
-    epipredict,
+    epidatasets,
+    epipredict (>=0.0.5),
     epiprocess,
     modeldata,
+    outbreaks,
     ranger,
+    see,
     tidyverse,
     xgboost
 Suggests:
     quarto,
     styler
 Remotes:
     cmu-delphi/epidatr,
+    cmu-delphi/epidatasets,
     cmu-delphi/epiprocess@dev,
     cmu-delphi/epipredict
@@ -1,15 +1,17 @@
 # example R options set globally
-options(width = 60)
+options(width = 77, cli.width = 77)
 
 # example chunk options set globally
 knitr::opts_chunk$set(
+  digits = 3,
   comment = "#>",
   collapse = FALSE,
   cache = TRUE,
   warning = FALSE,
   message = FALSE,
   echo = TRUE,
   dev = "svg",
+  dev.args = list(bg = "transparent"),
   dpi = 300,
   cache.lazy = FALSE,
   tidy = "styler",
@@ -20,6 +22,10 @@ knitr::opts_chunk$set(
 )
 
 suppressPackageStartupMessages(library(tidyverse))
+suppressPackageStartupMessages(library(epiprocess))
+suppressPackageStartupMessages(library(epipredict))
+suppressPackageStartupMessages(library(epidatr))
+suppressPackageStartupMessages(library(epidatasets))
 
 options(
   dplyr.print_min = 6,
@@ -28,8 +34,8 @@ options(
   pillar.min_chars = 15,
   stringr.view_n = 6,
   # Temporarily deactivate cli output for quarto
-  cli.num_colors = 0,
-  cli.hyperlink = FALSE,
+  #cli.num_colors = 0,
+  #cli.hyperlink = FALSE,
   pillar.bold = TRUE,
   width = 77
 )
 
@@ -0,0 +1,14 @@
+{
+  "hash": "0691bfcf21a9d75e573a517c8ca641c0",
+  "result": {
+    "markdown": "# Correlate signals over space and time\n\n\nThe `epiprocess` package provides some simple functionality for computing lagged\ncorrelations between two signals, over space or time (or other variables), via\n`epi_cor()`. This function is really just a convenience wrapper over some basic\ncommands: it first performs specified time shifts, then computes correlations,\ngrouped in a specified way. In this vignette, we'll examine correlations between\nstate-level COVID-19 case and death rates, smoothed using 7-day trailing\naverages.\n\n\n\n\n::: {.cell layout-align=\"center\" hash='correlations_cache/html/unnamed-chunk-2_7c89b5aee8f9d474319d212978503566'}\n\n```{.r .cell-code}\nx <- covid_case_death_rates\n```\n:::\n\n\n## Correlations grouped by time\n\nThe `epi_cor()` function operates on an `epi_df` object, and it requires further\nspecification of the variables to correlate, in its next two arguments (`var1`\nand `var2`).\n\nIn general, we can specify any grouping variable (or combination of variables)\nfor the correlation computations in a call to `epi_cor()`, via the `cor_by`\nargument. This potentially leads to many ways to compute correlations. There are\nalways at least two ways to compute correlations in an `epi_df`: grouping by\ntime value, and by geo value. The former is obtained via `cor_by = time_value`.\n\n\n::: {.cell layout-align=\"center\" hash='correlations_cache/html/unnamed-chunk-3_f66ac47a93717e593807c2fa24766523'}\n\n```{.r .cell-code}\nz1 <- epi_cor(x, case_rate, death_rate, cor_by = \"time_value\")\n```\n:::\n\n::: {.cell layout-align=\"center\" hash='correlations_cache/html/unnamed-chunk-4_84e1aa4831754809c364c0c6200c23eb'}\n\n```{.r .cell-code  code-fold=\"true\"}\nggplot(z1, aes(x = time_value, y = cor)) +\n  geom_hline(yintercept = 0) +\n  geom_line(color = 4) +\n  scale_x_date(minor_breaks = \"month\", date_labels = \"%b %Y\") +\n  labs(x = \"Date\", y = \"Correlation\")\n```\n\n::: {.cell-output-display}\n![](correlations_files/figure-html/unnamed-chunk-4-1.svg){fig-align='center' width=90%}\n:::\n:::\n\n\nThe above plot addresses the question: \"on any given day, are case and death\nrates linearly associated, across the U.S. states?\". We might be interested in\nbroadening this question, instead asking: \"on any given day, do higher case\nrates tend to associate with higher death rates?\", removing the dependence on a\nlinear relationship. The latter can be addressed using Spearman correlation,\naccomplished by setting `method = \"spearman\"` in the call to `epi_cor()`.\nSpearman correlation is highly robust and invariant to monotone transformations.\n\n## Lagged correlations\n\nWe might also be interested in how case rates associate with death rates in the\n*future*. Using the `dt1` parameter in `epi_cor()`, we can lag case rates back\nany number of days we want, before calculating correlations. Below, we set `dt1\n= -10`. This means that `var1 = case_rate` will be lagged by 10 days, so that\ncase rates on June 1st will be correlated with death rates on June 11th.  (It\nmight also help to think of it this way: death rates on a certain day will be\ncorrelated with case rates at an offset of -10 days.)\n\n\n::: {.cell layout-align=\"center\" hash='correlations_cache/html/unnamed-chunk-5_a31d30ca4ff11cc4173e87d7b3e17516'}\n\n```{.r .cell-code}\nz2 <- epi_cor(x, case_rate, death_rate, cor_by = time_value, dt1 = -10)\n```\n:::\n\n::: {.cell layout-align=\"center\" hash='correlations_cache/html/unnamed-chunk-6_17b2540d7afa4303c809aa7836eb3729'}\n\n```{.r .cell-code  code-fold=\"true\"}\nz <- rbind(\n  z1 %>% mutate(lag = 0),\n  z2 %>% mutate(lag = 10)\n) %>%\n  mutate(lag = as.factor(lag))\n\nggplot(z, aes(x = time_value, y = cor)) +\n  geom_hline(yintercept = 0) +\n  geom_line(aes(color = lag)) +\n  scale_color_brewer(palette = \"Set1\") +\n  scale_x_date(minor_breaks = \"month\", date_labels = \"%b %Y\") +\n  labs(x = \"Date\", y = \"Correlation\", col = \"Lag\")\n```\n\n::: {.cell-output-display}\n![](correlations_files/figure-html/unnamed-chunk-6-1.svg){fig-align='center' width=90%}\n:::\n:::\n\n\nNote that `epi_cor()` takes an argument `shift_by` that determines the grouping\nto use for the time shifts. The default is `geo_value`, which makes sense in our\nproblem at hand (but in another setting, we may want to group by geo value and\nanother variable---say, age---before time shifting).\n\nWe can see that, generally, lagging the case rates back by 10 days improves the\ncorrelations, confirming case rates are better correlated with death rates 10\ndays from now.\n\n## Correlations grouped by state\n\nThe second option we have is to group by geo value, obtained by setting `cor_by\n= geo_value`. We'll again look at correlations for both 0- and 10-day lagged \ncase rates.\n\n\n::: {.cell layout-align=\"center\" hash='correlations_cache/html/unnamed-chunk-7_caf710aef16ca5c027a450b5e8a09798'}\n\n```{.r .cell-code}\nz1 <- epi_cor(x, case_rate, death_rate, cor_by = geo_value)\nz2 <- epi_cor(x, case_rate, death_rate, cor_by = geo_value, dt1 = -10)\n```\n:::\n\n::: {.cell layout-align=\"center\" hash='correlations_cache/html/unnamed-chunk-8_410fa1f85cdea4abca883f8cec2a5879'}\n\n```{.r .cell-code  code-fold=\"true\"}\nz <- rbind(\n  z1 %>% mutate(lag = 0),\n  z2 %>% mutate(lag = 10)\n) %>%\n  mutate(lag = as.factor(lag))\n\nggplot(z, aes(cor)) +\n  geom_density(aes(fill = lag, col = lag), alpha = 0.5, bounds = c(-1, 1)) +\n  scale_fill_brewer(palette = \"Set1\") +\n  scale_color_brewer(palette = \"Set1\") +\n  labs(x = \"Correlation\", y = \"Density\", fill = \"Lag\", col = \"Lag\")\n```\n\n::: {.cell-output-display}\n![](correlations_files/figure-html/unnamed-chunk-8-1.svg){fig-align='center' width=90%}\n:::\n:::\n\n\nWe can again see that, generally speaking, lagging the case rates back by 10 \ndays improves the correlations.\n\n## More systematic lag analysis\n\nNext we perform a more systematic investigation of the correlations over a broad \nrange of lag values. \n\n\n::: {.cell layout-align=\"center\" hash='correlations_cache/html/unnamed-chunk-9_505146d4a43638cd8f42f1d37a772a50'}\n\n```{.r .cell-code}\nlags <- 0:35\n\nz <- map(\n  .x = lags,\n  ~ epi_cor(x, case_rate, death_rate, cor_by = geo_value, dt1 = -.x) %>%\n    mutate(lag = .x)\n) %>% list_rbind()\n```\n:::\n\n::: {.cell layout-align=\"center\" hash='correlations_cache/html/unnamed-chunk-10_e5d3435d433ff118e91d8001cb930845'}\n\n```{.r .cell-code  code-fold=\"true\"}\nz %>%\n  group_by(lag) %>%\n  summarize(mean = mean(cor, na.rm = TRUE)) %>%\n  ggplot(aes(x = lag, y = mean)) +\n  geom_line(color = 4) +\n  geom_point(color = 4) +\n  labs(x = \"Lag\", y = \"Mean correlation\")\n```\n\n::: {.cell-output-display}\n![](correlations_files/figure-html/unnamed-chunk-10-1.svg){fig-align='center' width=90%}\n:::\n:::\n\n\nWe can see pronounced curvature in the average correlation between \ncase and death rates (where the correlations come from grouping by `geo_value`) as\na function of lag. The maximum occurs at a lag of somewhere around 17 days.\n",
+    "supporting": [],
+    "filters": [
+      "rmarkdown/pagebreak.lua"
+    ],
+    "includes": {},
+    "engineDependencies": {},
+    "preserve": {},
+    "postProcess": true
+  }
+}