Skip to content

Commit b4c03eb

Browse files
authored
Merge pull request #4 from cmu-delphi/rach/update-attrib
Rach/update attrib
2 parents a7c1f3b + d52884e commit b4c03eb

File tree

92 files changed

+76149
-13858
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

92 files changed

+76149
-13858
lines changed

DESCRIPTION

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -11,16 +11,20 @@ Description:
1111
License: MIT + file LICENSE
1212
Imports:
1313
epidatr,
14-
epipredict,
14+
epidatasets,
15+
epipredict (>=0.0.5),
1516
epiprocess,
1617
modeldata,
18+
outbreaks,
1719
ranger,
20+
see,
1821
tidyverse,
1922
xgboost
2023
Suggests:
2124
quarto,
2225
styler
2326
Remotes:
2427
cmu-delphi/epidatr,
28+
cmu-delphi/epidatasets,
2529
cmu-delphi/epiprocess@dev,
2630
cmu-delphi/epipredict

_common.R

Lines changed: 9 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,15 +1,17 @@
11
# example R options set globally
2-
options(width = 60)
2+
options(width = 77, cli.width = 77)
33

44
# example chunk options set globally
55
knitr::opts_chunk$set(
6+
digits = 3,
67
comment = "#>",
78
collapse = FALSE,
89
cache = TRUE,
910
warning = FALSE,
1011
message = FALSE,
1112
echo = TRUE,
1213
dev = "svg",
14+
dev.args = list(bg = "transparent"),
1315
dpi = 300,
1416
cache.lazy = FALSE,
1517
tidy = "styler",
@@ -20,6 +22,10 @@ knitr::opts_chunk$set(
2022
)
2123

2224
suppressPackageStartupMessages(library(tidyverse))
25+
suppressPackageStartupMessages(library(epiprocess))
26+
suppressPackageStartupMessages(library(epipredict))
27+
suppressPackageStartupMessages(library(epidatr))
28+
suppressPackageStartupMessages(library(epidatasets))
2329

2430
options(
2531
dplyr.print_min = 6,
@@ -28,8 +34,8 @@ options(
2834
pillar.min_chars = 15,
2935
stringr.view_n = 6,
3036
# Temporarily deactivate cli output for quarto
31-
cli.num_colors = 0,
32-
cli.hyperlink = FALSE,
37+
#cli.num_colors = 0,
38+
#cli.hyperlink = FALSE,
3339
pillar.bold = TRUE,
3440
width = 77
3541
)

_freeze/archive/execute-results/html.json

Lines changed: 14 additions & 0 deletions
Large diffs are not rendered by default.

_freeze/archive/figure-html/unnamed-chunk-9-1.svg

Lines changed: 1461 additions & 0 deletions
Loading
Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,14 @@
1+
{
2+
"hash": "0691bfcf21a9d75e573a517c8ca641c0",
3+
"result": {
4+
"markdown": "# Correlate signals over space and time\n\n\nThe `epiprocess` package provides some simple functionality for computing lagged\ncorrelations between two signals, over space or time (or other variables), via\n`epi_cor()`. This function is really just a convenience wrapper over some basic\ncommands: it first performs specified time shifts, then computes correlations,\ngrouped in a specified way. In this vignette, we'll examine correlations between\nstate-level COVID-19 case and death rates, smoothed using 7-day trailing\naverages.\n\n\n\n\n::: {.cell layout-align=\"center\" hash='correlations_cache/html/unnamed-chunk-2_7c89b5aee8f9d474319d212978503566'}\n\n```{.r .cell-code}\nx <- covid_case_death_rates\n```\n:::\n\n\n## Correlations grouped by time\n\nThe `epi_cor()` function operates on an `epi_df` object, and it requires further\nspecification of the variables to correlate, in its next two arguments (`var1`\nand `var2`).\n\nIn general, we can specify any grouping variable (or combination of variables)\nfor the correlation computations in a call to `epi_cor()`, via the `cor_by`\nargument. This potentially leads to many ways to compute correlations. There are\nalways at least two ways to compute correlations in an `epi_df`: grouping by\ntime value, and by geo value. The former is obtained via `cor_by = time_value`.\n\n\n::: {.cell layout-align=\"center\" hash='correlations_cache/html/unnamed-chunk-3_f66ac47a93717e593807c2fa24766523'}\n\n```{.r .cell-code}\nz1 <- epi_cor(x, case_rate, death_rate, cor_by = \"time_value\")\n```\n:::\n\n::: {.cell layout-align=\"center\" hash='correlations_cache/html/unnamed-chunk-4_84e1aa4831754809c364c0c6200c23eb'}\n\n```{.r .cell-code code-fold=\"true\"}\nggplot(z1, aes(x = time_value, y = cor)) +\n geom_hline(yintercept = 0) +\n geom_line(color = 4) +\n scale_x_date(minor_breaks = \"month\", date_labels = \"%b %Y\") +\n labs(x = \"Date\", y = \"Correlation\")\n```\n\n::: {.cell-output-display}\n![](correlations_files/figure-html/unnamed-chunk-4-1.svg){fig-align='center' width=90%}\n:::\n:::\n\n\nThe above plot addresses the question: \"on any given day, are case and death\nrates linearly associated, across the U.S. states?\". We might be interested in\nbroadening this question, instead asking: \"on any given day, do higher case\nrates tend to associate with higher death rates?\", removing the dependence on a\nlinear relationship. The latter can be addressed using Spearman correlation,\naccomplished by setting `method = \"spearman\"` in the call to `epi_cor()`.\nSpearman correlation is highly robust and invariant to monotone transformations.\n\n## Lagged correlations\n\nWe might also be interested in how case rates associate with death rates in the\n*future*. Using the `dt1` parameter in `epi_cor()`, we can lag case rates back\nany number of days we want, before calculating correlations. Below, we set `dt1\n= -10`. This means that `var1 = case_rate` will be lagged by 10 days, so that\ncase rates on June 1st will be correlated with death rates on June 11th. (It\nmight also help to think of it this way: death rates on a certain day will be\ncorrelated with case rates at an offset of -10 days.)\n\n\n::: {.cell layout-align=\"center\" hash='correlations_cache/html/unnamed-chunk-5_a31d30ca4ff11cc4173e87d7b3e17516'}\n\n```{.r .cell-code}\nz2 <- epi_cor(x, case_rate, death_rate, cor_by = time_value, dt1 = -10)\n```\n:::\n\n::: {.cell layout-align=\"center\" hash='correlations_cache/html/unnamed-chunk-6_17b2540d7afa4303c809aa7836eb3729'}\n\n```{.r .cell-code code-fold=\"true\"}\nz <- rbind(\n z1 %>% mutate(lag = 0),\n z2 %>% mutate(lag = 10)\n) %>%\n mutate(lag = as.factor(lag))\n\nggplot(z, aes(x = time_value, y = cor)) +\n geom_hline(yintercept = 0) +\n geom_line(aes(color = lag)) +\n scale_color_brewer(palette = \"Set1\") +\n scale_x_date(minor_breaks = \"month\", date_labels = \"%b %Y\") +\n labs(x = \"Date\", y = \"Correlation\", col = \"Lag\")\n```\n\n::: {.cell-output-display}\n![](correlations_files/figure-html/unnamed-chunk-6-1.svg){fig-align='center' width=90%}\n:::\n:::\n\n\nNote that `epi_cor()` takes an argument `shift_by` that determines the grouping\nto use for the time shifts. The default is `geo_value`, which makes sense in our\nproblem at hand (but in another setting, we may want to group by geo value and\nanother variable---say, age---before time shifting).\n\nWe can see that, generally, lagging the case rates back by 10 days improves the\ncorrelations, confirming case rates are better correlated with death rates 10\ndays from now.\n\n## Correlations grouped by state\n\nThe second option we have is to group by geo value, obtained by setting `cor_by\n= geo_value`. We'll again look at correlations for both 0- and 10-day lagged \ncase rates.\n\n\n::: {.cell layout-align=\"center\" hash='correlations_cache/html/unnamed-chunk-7_caf710aef16ca5c027a450b5e8a09798'}\n\n```{.r .cell-code}\nz1 <- epi_cor(x, case_rate, death_rate, cor_by = geo_value)\nz2 <- epi_cor(x, case_rate, death_rate, cor_by = geo_value, dt1 = -10)\n```\n:::\n\n::: {.cell layout-align=\"center\" hash='correlations_cache/html/unnamed-chunk-8_410fa1f85cdea4abca883f8cec2a5879'}\n\n```{.r .cell-code code-fold=\"true\"}\nz <- rbind(\n z1 %>% mutate(lag = 0),\n z2 %>% mutate(lag = 10)\n) %>%\n mutate(lag = as.factor(lag))\n\nggplot(z, aes(cor)) +\n geom_density(aes(fill = lag, col = lag), alpha = 0.5, bounds = c(-1, 1)) +\n scale_fill_brewer(palette = \"Set1\") +\n scale_color_brewer(palette = \"Set1\") +\n labs(x = \"Correlation\", y = \"Density\", fill = \"Lag\", col = \"Lag\")\n```\n\n::: {.cell-output-display}\n![](correlations_files/figure-html/unnamed-chunk-8-1.svg){fig-align='center' width=90%}\n:::\n:::\n\n\nWe can again see that, generally speaking, lagging the case rates back by 10 \ndays improves the correlations.\n\n## More systematic lag analysis\n\nNext we perform a more systematic investigation of the correlations over a broad \nrange of lag values. \n\n\n::: {.cell layout-align=\"center\" hash='correlations_cache/html/unnamed-chunk-9_505146d4a43638cd8f42f1d37a772a50'}\n\n```{.r .cell-code}\nlags <- 0:35\n\nz <- map(\n .x = lags,\n ~ epi_cor(x, case_rate, death_rate, cor_by = geo_value, dt1 = -.x) %>%\n mutate(lag = .x)\n) %>% list_rbind()\n```\n:::\n\n::: {.cell layout-align=\"center\" hash='correlations_cache/html/unnamed-chunk-10_e5d3435d433ff118e91d8001cb930845'}\n\n```{.r .cell-code code-fold=\"true\"}\nz %>%\n group_by(lag) %>%\n summarize(mean = mean(cor, na.rm = TRUE)) %>%\n ggplot(aes(x = lag, y = mean)) +\n geom_line(color = 4) +\n geom_point(color = 4) +\n labs(x = \"Lag\", y = \"Mean correlation\")\n```\n\n::: {.cell-output-display}\n![](correlations_files/figure-html/unnamed-chunk-10-1.svg){fig-align='center' width=90%}\n:::\n:::\n\n\nWe can see pronounced curvature in the average correlation between \ncase and death rates (where the correlations come from grouping by `geo_value`) as\na function of lag. The maximum occurs at a lag of somewhere around 17 days.\n",
5+
"supporting": [],
6+
"filters": [
7+
"rmarkdown/pagebreak.lua"
8+
],
9+
"includes": {},
10+
"engineDependencies": {},
11+
"preserve": {},
12+
"postProcess": true
13+
}
14+
}

0 commit comments

Comments
 (0)