Commit 5cf2339 (parent b630f19)

fix partial #8

File tree

14 files changed: +93 −94 lines changed


- `_freeze/epipredict/execute-results/html.json` (+2 −2; large diff not rendered)
- `_freeze/flatline-forecaster/execute-results/html.json` (+2 −2; large diff not rendered)
- `_freeze/forecast-framework/execute-results/html.json` (+2 −2; large diff not rendered)
- `_freeze/preprocessing-and-models/execute-results/html.json` (+2 −2; large diff not rendered)
- `_freeze/sliding-forecasters/execute-results/html.json` (+2 −2; large diff not rendered)
- `_freeze/tidymodels-intro/execute-results/html.json` (+2 −2; large diff not rendered)
- `_freeze/tidymodels-regression/execute-results/html.json` (+2 −2; large diff not rendered)

epipredict.qmd (+16 −16)
@@ -10,21 +10,22 @@ At a high level, our goal with `{epipredict}` is to make running simple machine
 Serving both populations is the main motivation for our efforts, but at the same time, we have tried hard to make it useful.


-## Baseline models
+## Canned forecasters

-We provide a set of basic, easy-to-use forecasters that work out of the box.
-You should be able to do a reasonably limited amount of customization on them. Any serious customization happens with the framework discussed below.
+We provide a set of basic, easy-to-use forecasters that work out of the box:

-For the basic forecasters, we provide:
-
 * Flatline (basic) forecaster
 * Autoregressive forecaster
 * Autoregressive classifier
 * Smooth autoregressive(AR) forecaster

-All the forcasters we provide are built on our framework. So we will use these basic models to illustrate its flexibility.
+These forecasters encapsulate a series of operations (data preprocessing, model fitting, etc.) in instant one-liners.
+They are essentially alternatives to one another; the main difference is the model each uses. Three use regression models and one uses a classification model.
+
+The operations within the canned forecasters all follow our uniform **framework**.
+While these one-liners allow a limited amount of customization, any serious customization requires the framework explained in @sec-framework.

-## Forecasting framework
+## Forecasting framework {#sec-framework}

 At its core, `{epipredict}` is a **framework** for creating custom forecasters.
 By that we mean that we view the process of creating custom forecasters as
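For orientation (an editor's sketch, not part of this commit): the canned forecasters really are one-liners. Assuming `{epipredict}` is installed and `jhu` is the `epi_df` used later in this chapter, an autoregressive forecast looks roughly like:

```r
library(epipredict)

# Sketch: forecast death_rate from lagged case_rate and death_rate.
# Defaults (lags, ahead, quantile levels) are assumed to come from
# the forecaster's args_list; adjust to taste.
out <- arx_forecaster(
  jhu,
  outcome = "death_rate",
  predictors = c("case_rate", "death_rate")
)

out$predictions  # point forecasts plus a prediction interval per geo_value
```

The single call wraps preprocessing, model fitting, prediction, and postprocessing, which is the "series of operations" the revised text above refers to.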
@@ -47,8 +48,7 @@ Therefore, if you want something from this -verse, it should "just work" (we hope).
 The reason for the overlap is that `{workflows}` _already implements_ the first
 three steps. And it does this very well. However, it is missing the
 postprocessing stage and currently has no plans for such an implementation.
-And this feature is important. The baseline forecaster we provide _requires_
-postprocessing. Anything more complicated (which is nearly everything)
+And this feature is important. All forecasters need post-processing. Anything more complicated (which is nearly everything)
 needs this as well.

 The second omission from `{tidymodels}` is support for panel data. Besides
@@ -64,14 +64,14 @@ into an `epi_df` as described in @sec-additional-keys.

 ## Why doesn't this package already exist?

-- Parts of it actually DO exist. There's a universe called `tidymodels`. It
+- Parts of it actually DO exist. There's a universe called `{tidymodels}`. It
   handles pre-processing, training, and prediction, bound together, through a
-  package called workflows. We built `epipredict` on top of that setup. In this
+  package called workflows. We built `{epipredict}` on top of that setup. In this
   way, you CAN use almost everything they provide.
 - However, workflows doesn't do post-processing to the extent envisioned here.
-  And nothing in `tidymodels` handles panel data.
+  And nothing in `{tidymodels}` handles panel data.
 - The tidy-team doesn't have plans to do either of these things. (We checked).
-- There are two packages that do time series built on `tidymodels`, but it's
+- There are two packages that do time series built on `{tidymodels}`, but it's
   "basic" time series: 1-step AR models, exponential smoothing, STL decomposition,
   etc.[^1]

@@ -101,7 +101,7 @@ out <- arx_forecaster(
 )
 ```

-This call produces a warning, which we'll ignore for now. But essentially, it's telling us that our data comes from May 2022 but we're trying to do a forecast for January 2022. The result is likely not an accurate measure of real-time forecast performance, because the data have been revised over time.
+This call produces a warning, which we'll ignore for now. But essentially, it's telling us that our data comes from May 2022 but we're trying to do a forecast for January 2022. The result is likely not an accurate measure of real-time forecast performance, because the data has been revised over time.

 ```{r}
 out
@@ -115,7 +115,7 @@ of what the predictions are for. It contains three main components:
 ```{r}
 str(out$metadata)
 ```
-2. The predictions in a tibble. The columns give the predictions for each location along with additional columns. By default, these are a 90% predictive interval, the `forecast_date` (the date on which the forecast was putatively made) and the `target_date` (the date for which the forecast is being made).
+2. The predictions in a tibble. The columns give the predictions for each location along with additional columns. By default, these are a 90% prediction interval, the `forecast_date` (the date on which the forecast was putatively made) and the `target_date` (the date for which the forecast is being made).
 ```{r}
 out$predictions
 ```
@@ -159,7 +159,7 @@ likely increase the variance of the model, and therefore, may lead to less
 accurate forecasts for the variable of interest.


-Another property of the basic model is the predictive interval. We describe this in more detail in a coming chapter, but it is easy to request multiple quantiles.
+Another property of the basic model is the prediction interval. We describe this in more detail in a coming chapter, but it is easy to request multiple quantiles.

 ```{r differential-levels}
 out_q <- arx_forecaster(jhu, "death_rate", c("case_rate", "death_rate"),

flatline-forecaster.qmd (+5 −5)
@@ -1,6 +1,6 @@
 # Introducing the flatline forecaster

-The flatline forecaster is a very simple forecasting model intended for `epi_df` data, where the most recent observation is used as the forecast for any future date. In other words, the last observation is propagated forward. Hence, a flat line phenomenon is observed for the point predictions. The predictive intervals are produced from the quantiles of the residuals of such a forecast over all of the training data. By default, these intervals will be obtained separately for each combination of keys (`geo_value` and any additional keys) in the `epi_df`. Thus, the output is a data frame of point (and optionally interval) forecasts at a single unique horizon (`ahead`) for each unique combination of key variables. This forecaster is comparable to the baseline used by the [COVID Forecast Hub](https://covid19forecasthub.org).
+The flatline forecaster is a very simple forecasting model intended for `epi_df` data, where the most recent observation is used as the forecast for any future date. In other words, the last observation is propagated forward. Hence, a flat line phenomenon is observed for the point predictions. The prediction intervals are produced from the quantiles of the residuals of such a forecast over all of the training data. By default, these intervals will be obtained separately for each combination of keys (`geo_value` and any additional keys) in the `epi_df`. Thus, the output is a data frame of point (and optionally interval) forecasts at a single unique horizon (`ahead`) for each unique combination of key variables. This forecaster is comparable to the baseline used by the [COVID Forecast Hub](https://covid19forecasthub.org).

 ## Example of using the flatline forecaster

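As a sketch for orientation (editor-added; `flatline_args_list()` and its `ahead` argument are assumed from `{epipredict}`'s documented interface), the `five_days_ahead` object referenced in the hunks below can be produced along these lines:

```r
library(epipredict)

# Propagate the last observation forward five days; prediction intervals
# come from quantiles of the training residuals (assumed defaults).
five_days_ahead <- flatline_forecaster(
  jhu,
  outcome = "death_rate",
  args_list = flatline_args_list(ahead = 5)
)

five_days_ahead$predictions
```

Because the point forecast is just the last observation, all the modeling effort is in the residual-quantile intervals, which are computed per key combination as described above.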
@@ -55,8 +55,8 @@ five_days_ahead <- flatline_forecaster(
 five_days_ahead
 ```

-We could also specify that we want a 80% predictive interval by changing the
-levels. The default 0.05 and 0.95 levels/quantiles give us 90% predictive
+We could also specify that we want an 80% prediction interval by changing the
+levels. The default 0.05 and 0.95 levels/quantiles give us a 90% prediction
 interval.

 ```{r}
@@ -117,15 +117,15 @@ extract_frosting(five_days_ahead$epi_workflow)
 ```


-The post-processing operations in the order that were performed were to create the predictions and the predictive intervals, add the forecast and target dates and bound the predictions at zero.
+The post-processing operations, in the order they were performed, were to create the predictions and the prediction intervals, add the forecast and target dates, and bound the predictions at zero.

 We can also easily examine the predictions themselves.

 ```{r}
 five_days_ahead$predictions
 ```

-The results above show a distributional forecast produced using data through the end of 2021 for the January 5, 2022. A prediction for the death rate per 100K inhabitants along with a 95% predictive interval is available for every state (`geo_value`).
+The results above show a distributional forecast, produced using data through the end of 2021, for January 5, 2022. A prediction for the death rate per 100K inhabitants along with a 95% prediction interval is available for every state (`geo_value`).

 The figure below displays the prediction and prediction interval for three sample states: Arizona, New York, and Florida.

forecast-framework.qmd (+1 −1)
@@ -89,7 +89,7 @@ er <- epi_recipe(jhu) %>%
 ```

 While `{recipes}` provides a function `step_lag()`, it assumes that the data
-have no breaks in the sequence of `time_values`. This is a bit dangerous, so
+has no breaks in the sequence of `time_values`. This is a bit dangerous, so
 we avoid that behaviour. Our `lag/ahead` functions also appropriately adjust the
 amount of data to avoid accidentally dropping recent predictors from the test
 data.
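To make that contrast concrete, here is an editor's sketch (assuming the `jhu` `epi_df` from this chapter; lag and ahead values are illustrative) of the gap-aware lag/ahead steps in an `epi_recipe`:

```r
library(epipredict)

# step_epi_lag()/step_epi_ahead() respect breaks in time_value, unlike
# recipes::step_lag(); step_epi_naomit() then drops rows made NA by lagging.
r <- epi_recipe(jhu) %>%
  step_epi_lag(case_rate, death_rate, lag = c(0, 7, 14)) %>%
  step_epi_ahead(death_rate, ahead = 7) %>%
  step_epi_naomit()
```

The same recipe also adjusts how much recent data is retained, so test-time predictors are not silently dropped.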

preprocessing-and-models.qmd (+16 −15)
@@ -2,26 +2,27 @@

 ```{r}
 #| echo: false
+#| warning: false
 source("_common.R")
 ```


 ## Introduction

-The `epipredict` package uses the `tidymodels` framework, namely
+The `{epipredict}` package uses the `{tidymodels}` framework, namely
 [`{recipes}`](https://recipes.tidymodels.org/) for
 [dplyr](https://dplyr.tidyverse.org/)-like pipeable sequences
 of feature engineering and [`{parsnip}`](https://parsnip.tidymodels.org/)
 for a unified interface to a range of models.

-`epipredict` has additional customized feature engineering and preprocessing
+`{epipredict}` has additional customized feature engineering and preprocessing
 steps that specifically work with panel data in this context, for example,
 `step_epi_lag()`, `step_population_scaling()`,
 `step_epi_naomit()`. They can be used along with most
 steps from the `{recipes}` package for more feature engineering.

-In this vignette, we will illustrate some examples of how to use `epipredict`
-with `recipes` and `parsnip` for different purposes of
+In this vignette, we will illustrate some examples of how to use `{epipredict}`
+with `{recipes}` and `{parsnip}` for different purposes of
 epidemiological forecasting.
 We will focus on basic autoregressive models, in which COVID cases and
 deaths in the near future are predicted using a linear combination of cases
@@ -52,7 +53,7 @@ deploying control measures.
 One of the outcomes that the CDC forecasts is [death counts from COVID-19](https://www.cdc.gov/coronavirus/2019-ncov/science/forecasting/forecasting-us.html).
 Although there are many state-of-the-art models, we choose to use Poisson
 regression, the textbook example for modeling count data, as an illustration
-for using the `epipredict` package with other existing `{tidymodels}` packages.
+for using the `{epipredict}` package with other existing `{tidymodels}` packages.

 The (folded) code below gives the necessary commands to download this data
 from the Delphi Epidata API, but it is also built into the
@@ -112,13 +113,13 @@ $s_{\text{state}}$ are dummy variables for each state and take values of either
 0 or 1.

 Preprocessing steps will be performed to prepare the
-data for model fitting. But before diving into them, it will be helpful to understand what `roles` are in the `recipes` framework.
+data for model fitting. But before diving into them, it will be helpful to understand what `roles` are in the `{recipes}` framework.

 ---

-#### Aside on `recipes` {.unnumbered}
+#### Aside on `{recipes}` {.unnumbered}

-`recipes` can assign one or more roles to each column in the data. The roles
+`{recipes}` can assign one or more roles to each column in the data. The roles
 are not restricted to a predefined set; they can be anything.
 For most conventional situations, they are typically “predictor” and/or
 "outcome". Additional roles enable targeted `step_*()` operations on specific
@@ -132,7 +133,7 @@ that are unique to the `epipredict` package. Since we work with `epi_df`
 objects, all datasets should have `geo_value` and `time_value` passed through
 automatically with these two roles assigned to the appropriate columns in the data.

-The `recipes` package also allows [manual alterations of roles](https://recipes.tidymodels.org/reference/roles.html)
+The `{recipes}` package also allows [manual alterations of roles](https://recipes.tidymodels.org/reference/roles.html)
 in bulk. There are a few handy functions that can be used together to help us
 manipulate variable roles easily.

@@ -170,7 +171,7 @@ r <- epi_recipe(counts_subset) %>%
   step_epi_naomit()
 ```

-After specifying the preprocessing steps, we will use the `parsnip` package for
+After specifying the preprocessing steps, we will use the `{parsnip}` package for
 modeling and producing the prediction for death count, 7 days after the
 latest available date in the dataset.

@@ -206,8 +207,8 @@ However, the Delphi Group preferred to train on rate data instead, because it
 puts different locations on a similar scale (eliminating the need for location-specific intercepts).
 We can use a linear regression to predict the death rates and use state
 population data to scale the rates to counts.[^pois] We will do so using
-`layer_population_scaling()` from the `epipredict` package. (We could also use
-`step_population_scaling()` from the `epipredict` package to prepare rate data
+`layer_population_scaling()` from the `{epipredict}` package. (We could also use
+`step_population_scaling()` from the `{epipredict}` package to prepare rate data
 from count data in the preprocessing recipe.)

 [^pois]: We could continue with the Poisson model, but we'll switch to the Gaussian likelihood just for simplicity.
@@ -295,9 +296,9 @@ jhu <- filter(
 )
 ```

-Preprocessing steps will again rely on functions from the `epipredict` package
-as well as the `recipes` package.
-There are also many functions in the `recipes` package that allow for
+Preprocessing steps will again rely on functions from the `{epipredict}` package
+as well as the `{recipes}` package.
+There are also many functions in the `{recipes}` package that allow for
 [scalar transformations](https://recipes.tidymodels.org/reference/#step-functions-individual-transformations),
 such as log transformations and data centering. In our case, we will
 center the numerical predictors to allow for a more meaningful interpretation of

sliding-forecasters.qmd (+4 −3)
@@ -2,13 +2,14 @@

 ```{r}
 #| echo: false
+#| warning: false
 source("_common.R")
 ```


 A key function from the epiprocess package is `epi_slide()`, which allows the
 user to apply a function or formula-based computation over variables in an
-`epi_df` over a running window of `n` time steps (see the following `epiprocess`
+`epi_df` over a running window of `n` time steps (see the following `{epiprocess}`
 vignette to go over the basics of the function: ["Slide a computation over
 signal values"](https://cmu-delphi.github.io/epiprocess/articles/slide.html)).
 The equivalent sliding method for an `epi_archive` object can be called by using
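For orientation, a minimal `epi_slide()` sketch (editor-added; the window argument's name has varied across `{epiprocess}` releases, e.g. `before` in older versions vs. `.window_size` more recently, so adjust for your installed version):

```r
library(epiprocess)
library(dplyr)

# Trailing 7-day average of case_rate, computed separately per geo_value.
jhu %>%
  group_by(geo_value) %>%
  epi_slide(case_rate_7dav = mean(case_rate), .window_size = 7) %>%
  ungroup()
```

Sliding a full forecaster instead of a simple mean is the same pattern: the computation passed to `epi_slide()` just returns forecasts rather than a summary statistic.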
@@ -149,13 +150,13 @@ model.[^1]

 ### Example using case data from Canada

-By leveraging the flexibility of `epiprocess`, we can apply the same techniques
+By leveraging the flexibility of `{epiprocess}`, we can apply the same techniques
 to data from other sources. Since some collaborators are in British Columbia,
 Canada, we'll do essentially the same thing for Canada as we did above.

 The [COVID-19 Canada Open Data Working Group](https://opencovid.ca/) collects
 daily time series data on COVID-19 cases, deaths, recoveries, testing and
-vaccinations at the health region and province levels. Data are collected from
+vaccinations at the health region and province levels. Data is collected from
 publicly available sources such as government datasets and news releases.
 Unfortunately, there is no simple versioned source, so we have created our own
 from the Github commit history.
