Commit 76ec2f4: use function auto-links

1 parent 5cf2339

File tree

15 files changed: +100 −92 lines changed


- _freeze/epipredict/execute-results/html.json (+2 −2, large diff not rendered)
- _freeze/flatline-forecaster/execute-results/html.json (+2 −2, large diff not rendered)
- _freeze/forecast-framework/execute-results/html.json (+2 −2, large diff not rendered)
- _freeze/preprocessing-and-models/execute-results/html.json (+2 −2, large diff not rendered)
- _freeze/sliding-forecasters/execute-results/html.json (+2 −2, large diff not rendered)
- _freeze/tidymodels-intro/execute-results/html.json (+2 −2, large diff not rendered)
- _freeze/tidymodels-regression/execute-results/html.json (+2 −2, large diff not rendered)

_quarto.yml (+1 −1)

```diff
@@ -54,4 +54,4 @@ format:
     sidebar-width: 400px
     body-width: 600px
     theme: [cosmo, delphi-epitools.scss]
-
+    code-link: true
```
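For context, the single line this commit adds is Quarto's `code-link` HTML option, which (via the downlit package) automatically turns function calls in rendered code blocks into links to their online documentation; this is why the other changed files can drop explicit markdown links around package and function names. A minimal sketch of the option in place, assuming the usual `format: html:` nesting (indentation and surrounding keys are illustrative, not copied from the repo):

```yaml
format:
  html:
    theme: [cosmo, delphi-epitools.scss]
    code-link: true   # auto-link functions in code via downlit
```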

epipredict.qmd (+13 −12)

````diff
@@ -39,39 +39,39 @@ There are four types of components:
 3. Predictor: make predictions, using a fitted model object and processed test data
 4. Postprocessor: manipulate or transform the predictions before returning
 
-Users familiar with [`{tidymodels}`](https://www.tidymodels.org) and especially
-the [`{workflows}`](https://workflows.tidymodels.org) package will notice a lot
+Users familiar with `{tidymodels}` and especially
+the `{workflows}` package will notice a lot
 of overlap. This is by design, and is in fact a feature. The truth is that
 `{epipredict}` is a wrapper around much that is contained in these packages.
 Therefore, if you want something from this -verse, it should "just work" (we hope).
 
-The reason for the overlap is that `{workflows}` _already implements_ the first
+The reason for the overlap is that `workflows` _already implements_ the first
 three steps. And it does this very well. However, it is missing the
 postprocessing stage and currently has no plans for such an implementation.
 And this feature is important. All forecasters need post-processing. Anything more complicated (which is nearly everything)
 needs this as well.
 
-The second omission from `{tidymodels}` is support for panel data. Besides
+The second omission from `tidymodels` is support for panel data. Besides
 epidemiological data, economics, psychology, sociology, and many other areas
-frequently deal with data of this type. So the framework of behind `{epipredict}`
+frequently deal with data of this type. So the framework of behind `epipredict`
 implements this. In principle, this has nothing to do with epidemiology, and
 one could simply use this package as a solution for the missing functionality in
-`{tidymodels}`. Again, this should "just work" (we hope).
+`tidymodels`. Again, this should "just work" (we hope).
 
 All of the _panel data_ functionality is implemented through the `epi_df` data type
 described in the previous part. If you have different panel data, just force it
 into an `epi_df` as described in @sec-additional-keys.
 
 ## Why doesn't this package already exist?
 
-- Parts of it actually DO exist. There's a universe called `{tidymodels}`. It
+- Parts of it actually DO exist. There's a universe called `tidymodels`. It
 handles pre-processing, training, and prediction, bound together, through a
-package called workflows. We built `{epipredict}` on top of that setup. In this
+package called workflows. We built `epipredict` on top of that setup. In this
 way, you CAN use almost everything they provide.
 - However, workflows doesn't do post-processing to the extent envisioned here.
-And nothing in `{tidymodels}` handles panel data.
+And nothing in `tidymodels` handles panel data.
 - The tidy-team doesn't have plans to do either of these things. (We checked).
-- There are two packages that do time series built on `{tidymodels}`, but it's
+- There are two packages that do time series built on `tidymodels`, but it's
 "basic" time series: 1-step AR models, exponential smoothing, STL decomposition,
 etc.[^1]
 
@@ -94,6 +94,7 @@ in the built-in data frame).
 jhu <- case_death_rate_subset %>%
   filter(time_value >= max(time_value) - 30)
 
+library(epipredict)
 out <- arx_forecaster(
   jhu,
   outcome = "death_rate",
@@ -128,7 +129,7 @@ By default, the forecaster predicts the outcome (`death_rate`) 1-week ahead,
 using 3 lags of each predictor (`case_rate` and `death_rate`) at 0 (today),
 1 week back and 2 weeks back. The predictors and outcome can be changed
 directly. The rest of the defaults are encapsulated into a list of arguments.
-This list is produced by `arx_args_list()`.
+This list is produced by `arx_args_list()`.
 
 ## Simple adjustments
 
@@ -197,7 +198,7 @@ arx_args_list(
 
 So far, our forecasts have been produced using simple linear regression. But this is not the only way to estimate such a model.
 The `trainer` argument determines the type of model we want.
-This takes a [`{parsnip}`](https://parsnip.tidymodels.org) model. The default is linear regression, but we could instead use a random forest with the `{ranger}` package:
+This takes a `{parsnip}` model. The default is linear regression, but we could instead use a random forest with the `{ranger}` package:
 
 ```{r ranger, warning = FALSE}
 out_rf <- arx_forecaster(jhu, "death_rate", c("case_rate", "death_rate"),
````

flatline-forecaster.qmd (+3 −2)

````diff
@@ -13,14 +13,14 @@ source("_common.R")
 
 
 We will continue to use the `case_death_rate_subset` dataset that comes with the
-`epipredict` package. In brief, this is a subset of the JHU daily COVID-19 cases
+`{epipredict}` package. In brief, this is a subset of the JHU daily COVID-19 cases
 and deaths by state. While this dataset ranges from Dec 31, 2020 to Dec 31,
 2021, we will only consider a small subset at the end of that range to keep our
 example relatively simple.
 
 ```{r}
 jhu <- case_death_rate_subset %>%
-  dplyr::filter(time_value >= as.Date("2021-09-01"))
+  filter(time_value >= as.Date("2021-09-01"))
 
 jhu
 ```
@@ -32,6 +32,7 @@ eath rate one week into the future, is to input the `epi_df` and the name of
 the column from it that we want to predict in the `flatline_forecaster` function.
 
 ```{r}
+library(epipredict)
 one_week_ahead <- flatline_forecaster(jhu, outcome = "death_rate")
 one_week_ahead
 ```
````

forecast-framework.qmd (+8 −7)

````diff
@@ -21,6 +21,7 @@ to examine the data and an estimated canned corecaster.
 
 
 ```{r demo-workflow}
+library(epipredict)
 jhu <- case_death_rate_subset %>%
   filter(time_value >= max(time_value) - 30)
 
@@ -31,18 +32,18 @@ out_gb <- arx_forecaster(jhu, "death_rate", c("case_rate", "death_rate"),
 ## Preprocessing
 
 Preprocessing is accomplished through a `recipe` (imagine baking a cake) as
-provided in the [`{recipes}`](https://recipes.tidymodels.org) package.
+provided in the `{recipes}` package.
 We've made a few modifications (to handle
 panel data) as well as added some additional options. The recipe gives a
 specification of how to handle training data. Think of it like a fancified
 `formula` that you would pass to `lm()`: `y ~ x1 + log(x2)`. In general,
-there are 2 extensions to the `formula` that `{recipes}` handles:
+there are 2 extensions to the `formula` that `recipes` handles:
 
 1. Doing transformations of both training and test data that can always be
 applied. These are things like taking the log of a variable, leading or
 lagging, filtering out rows, handling dummy variables, etc.
 2. Using statistics from the training data to eventually process test data.
-This is a major benefit of `{recipes}`. It prevents what the tidy team calls
+This is a major benefit of `recipes`. It prevents what the tidy team calls
 "data leakage". A simple example is centering a predictor by its mean. We
 need to store the mean of the predictor from the training data and use that
 value on the test data rather than accidentally calculating the mean of
@@ -88,7 +89,7 @@ er <- epi_recipe(jhu) %>%
   step_epi_naomit()
 ```
 
-While `{recipes}` provides a function `step_lag()`, it assumes that the data
+While `recipes` provides a function `step_lag()`, it assumes that the data
 has no breaks in the sequence of `time_values`. This is a bit dangerous, so
 we avoid that behaviour. Our `lag/ahead` functions also appropriately adjust the
 amount of data to avoid accidentally dropping recent predictors from the test
@@ -97,9 +98,9 @@ data.
 ## The model specification
 
 Users familiar with the `{parsnip}` package will have no trouble here.
-Basically, `{parsnip}` unifies the function signature across statistical models.
+Basically, `parsnip` unifies the function signature across statistical models.
 For example, `lm()` "likes" to work with formulas, but `glmnet::glmnet()` uses
-`x` and `y` for predictors and response. `{parsnip}` is agnostic. Both of these
+`x` and `y` for predictors and response. `parsnip` is agnostic. Both of these
 do "linear regression". Above we switched from `lm()` to `xgboost()` without
 any issue despite the fact that these functions couldn't be more different.
 
@@ -109,7 +110,7 @@ lm(
   model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE,
   contrasts = NULL, offset, ...)
 
-xgboost(
+xgboost::xgboost(
   data = NULL, label = NULL, missing = NA, weight = NULL,
   params = list(), nrounds, verbose = 1, print_every_n = 1L,
   early_stopping_rounds = NULL, maximize = NULL, save_period = NULL,
````

preprocessing-and-models.qmd (+21 −25)

````diff
@@ -10,19 +10,16 @@ source("_common.R")
 ## Introduction
 
 The `{epipredict}` package uses the `{tidymodels}` framework, namely
-[`{recipes}`](https://recipes.tidymodels.org/) for
-[dplyr](https://dplyr.tidyverse.org/)-like pipeable sequences
-of feature engineering and [`{parsnip}`](https://parsnip.tidymodels.org/)
-for a unified interface to a range of models.
+`{recipes}` for `{dplyr}`-like pipeable sequences of feature engineering and `{parsnip}` for a unified interface to a range of models.
 
-`{epipredict}` has additional customized feature engineering and preprocessing
+`epipredict` has additional customized feature engineering and preprocessing
 steps that specifically work with panel data in this context, for example,
 `step_epi_lag()`, `step_population_scaling()`,
 `step_epi_naomit()`. They can be used along with most
-steps from the `{recipes}` package for more feature engineering.
+steps from the `recipes` package for more feature engineering.
 
-In this vignette, we will illustrate some examples of how to use `{epipredict}`
-with `{recipes}` and `{parsnip}` for different purposes of
+In this vignette, we will illustrate some examples of how to use `epipredict`
+with `recipes` and `parsnip` for different purposes of
 epidemiological forecasting.
 We will focus on basic autoregressive models, in which COVID cases and
 deaths in the near future are predicted using a linear combination of cases
@@ -40,6 +37,7 @@ library(epipredict)
 library(recipes)
 library(workflows)
 library(poissonreg)
+library(epidatasets)
 ```
 
 ## Poisson Regression
@@ -53,12 +51,10 @@ deploying control measures.
 One of the outcomes that the CDC forecasts is [death counts from COVID-19](https://www.cdc.gov/coronavirus/2019-ncov/science/forecasting/forecasting-us.html).
 Although there are many state-of-the-art models, we choose to use Poisson
 regression, the textbook example for modeling count data, as an illustration
-for using the `{epipredict}` package with other existing `{tidymodels}` packages.
+for using the `epipredict` package with other existing `tidymodels` packages.
 
 The (folded) code below gives the necessary commands to download this data
-from the Delphi Epidata API, but it is also built into the
-[`{epidatasets}`](https://cmu-delphi.github.io/epidatasets/reference/counts_subset.html)
-package.
+`counts_subset` from the Delphi Epidata API, but it is also built into the `{epidatasets}` package.
 
 ```{r poisson-reg-data}
 #| eval: false
@@ -92,7 +88,7 @@ counts_subset <- full_join(x, y, by = c("geo_value", "time_value")) %>%
 data(counts_subset, package = "epidatasets")
 ```
 
-The `counts_subset` dataset
+The `epidatasets::counts_subset` dataset
 contains the number of confirmed cases and deaths from June 4, 2021 to
 Dec 31, 2021 in some U.S. states.
 
@@ -113,11 +109,11 @@ $s_{\text{state}}$ are dummy variables for each state and take values of either
 0 or 1.
 
 Preprocessing steps will be performed to prepare the
-data for model fitting. But before diving into them, it will be helpful to understand what `roles` are in the `{recipes}` framework.
+data for model fitting. But before diving into them, it will be helpful to understand what `roles` are in the `recipes` framework.
 
 ---
 
-#### Aside on `{recipes}` {.unnumbered}
+#### Aside on `recipes` {.unnumbered}
 
 `{recipes}` can assign one or more roles to each column in the data. The roles
 are not restricted to a predefined set; they can be anything.
@@ -133,7 +129,7 @@ that are unique to the `epipredict` package. Since we work with `epi_df`
 objects, all datasets should have `geo_value` and `time_value` passed through
 automatically with these two roles assigned to the appropriate columns in the data.
 
-The `{recipes}` package also allows [manual alterations of roles](https://recipes.tidymodels.org/reference/roles.html)
+The `recipes` package also allows [manual alterations of roles](https://recipes.tidymodels.org/reference/roles.html)
 in bulk. There are a few handy functions that can be used together to help us
 manipulate variable roles easily.
 
@@ -194,8 +190,8 @@ extract_fit_engine(wf)
 ```
 
 Alternative forms of Poisson regression or particular computational approaches
-can be applied via arguments to `parsnip::poisson_reg()` for some common
-settings, and by using `parsnip::set_engine()` to use a specific Poisson
+can be applied via arguments to `poisson_reg()` for some common
+settings, and by using `set_engine()` to use a specific Poisson
 regression engine and to provide additional engine-specific customization.
 
 
@@ -207,8 +203,8 @@ However, the Delphi Group preferred to train on rate data instead, because it
 puts different locations on a similar scale (eliminating the need for location-specific intercepts).
 We can use a linear regression to predict the death rates and use state
 population data to scale the rates to counts.[^pois] We will do so using
-`layer_population_scaling()` from the `{epipredict}` package. (We could also use
-`step_population_scaling()` from the `{epipredict}` package to prepare rate data
+`layer_population_scaling()` from the `epipredict` package. (We could also use
+`step_population_scaling()` to prepare rate data
 from count data in the preprocessing recipe.)
 
 [^pois]: We could continue with the Poisson model, but we'll switch to the Gaussian likelihood just for simplicity.
@@ -263,7 +259,7 @@ pop_dat <- state_census %>% select(abbr, pop)
 ```
 
 State-wise population data from the 2019 U.S. Census is
-available from `{epipredict}` and will be used in `layer_population_scaling()`.
+available from `epipredict` and will be used in `layer_population_scaling()`.
 
 
 
@@ -296,9 +292,9 @@ jhu <- filter(
 )
 ```
 
-Preprocessing steps will again rely on functions from the `{epipredict}` package
-as well as the `{recipes}` package.
-There are also many functions in the `{recipes}` package that allow for
+Preprocessing steps will again rely on functions from the `epipredict` package
+as well as the `recipes` package.
+There are also many functions in the `recipes` package that allow for
 [scalar transformations](https://recipes.tidymodels.org/reference/#step-functions-individual-transformations),
 such as log transformations and data centering. In our case, we will
 center the numerical predictors to allow for a more meaningful interpretation of
@@ -437,7 +433,7 @@ $$
 
 Preprocessing steps are similar to the previous models with an additional step
 of categorizing the response variables. Again, we will use a subset of death rate and case rate data from our built-in dataset
-`case_death_rate_subset`.
+`epipredict::case_death_rate_subset`.
 ```{r}
 jhu_rates <- case_death_rate_subset %>%
   dplyr::filter(
````

sliding-forecasters.qmd (+3 −2)

````diff
@@ -9,7 +9,7 @@ source("_common.R")
 
 A key function from the epiprocess package is `epi_slide()`, which allows the
 user to apply a function or formula-based computation over variables in an
-`epi_df` over a running window of `n` time steps (see the following `{epiprocess}`
+`epi_df` over a running window of `n` time steps (see the following `epiprocess`
 vignette to go over the basics of the function: ["Slide a computation over
 signal values"](https://cmu-delphi.github.io/epiprocess/articles/slide.html)).
 The equivalent sliding method for an `epi_archive` object can be called by using
@@ -41,6 +41,7 @@ version of each observation can be carried forward to extrapolate unavailable
 versions for the less up-to-date input archive.
 
 ```{r grab-epi-data}
+library(epipredict)
 us_raw_history_dfs <-
   readRDS(system.file("extdata", "all_states_covidcast_signals.rds",
     package = "epipredict", mustWork = TRUE))
@@ -242,7 +243,7 @@ ggplot(can_fc %>% filter(engine_type == "xgboost"),
 Both approaches tend to produce quite volatile forecasts (point predictions)
 and/or are overly confident (very narrow bands), particularly when boosted
 regression trees are used. But as this is meant to be a simple demonstration of
-sliding with different engines in `arx_forecaster`, we may devote another
+sliding with different engines in `arx_forecaster()`, we may devote another
 vignette to work on improving the predictive modelling using the suite of tools
 available in epipredict.
````