landing page wording and get code running

nmdefries · nmdefries · commit 0ba2d8a040d8 · 2025-02-28T16:25:02.000-05:00
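
Note on the trailing window mentioned in the README text below: the README smooths daily values with a 7-day average over a trailing window, so that each smoothed value depends only on that day and the days before it. The README's own smoothing code is partly elided in this diff and uses `{epiprocess}` utilities. The base R sketch below only illustrates the idea; `trailing_mean()` and the `daily` values are hypothetical and not part of any package.

```r
# Hypothetical helper (not from {epiprocess}): trailing k-day mean.
# Each output value uses only the current day and the k - 1 days before it,
# so no information from later days leaks into earlier smoothed values.
trailing_mean <- function(x, k = 7) {
  vapply(seq_along(x), function(i) {
    if (i < k) {
      return(NA_real_) # not enough history for a full window yet
    }
    mean(x[(i - k + 1):i])
  }, numeric(1))
}

# Made-up daily counts, for illustration only.
daily <- c(10, 12, 9, 11, 13, 8, 14, 40, 10, 12)
smoothed <- trailing_mean(daily)

# Changing a later day leaves all earlier smoothed values untouched.
daily_revised <- daily
daily_revised[10] <- 1000
unchanged <- identical(trailing_mean(daily_revised)[1:9], smoothed[1:9])
```

A centered window would instead mix values from after each day into that day's average, which is what a trailing window avoids when building features for a forecast.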
diff --git a/README.md b/README.md
@@ -8,27 +8,28 @@
 [![R-CMD-check](https://github.com/cmu-delphi/epipredict/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/cmu-delphi/epipredict/actions/workflows/R-CMD-check.yaml)
 <!-- badges: end -->
 
-Epipredict is a framework for building transformation and forecasting
+`{epipredict}` is a framework for building transformation and forecasting
 pipelines for epidemiological and other panel time-series datasets. In
 addition to tools for building forecasting pipelines, it contains a
 number of “canned” forecasters meant to run with little modification as
 an easy way to get started forecasting.
 
 It is designed to work well with
-[`epiprocess`](https://cmu-delphi.github.io/epiprocess/), a utility for
-handling various time series and geographic processing tools in an
+[`{epiprocess}`](https://cmu-delphi.github.io/epiprocess/), a utility for
+time series handling and geographic processing in an
 epidemiological context. Both of the packages are meant to work well
 with the panel data provided by
-[`epidatr`](https://cmu-delphi.github.io/epidatr/).
+[`{epidatr}`](https://cmu-delphi.github.io/epidatr/).
+Pre-compiled example datasets are also available in [`{epidatasets}`](https://cmu-delphi.github.io/epidatasets/).
 
-If you are looking for more detail beyond the package documentation, see
-our [forecasting
-book](https://cmu-delphi.github.io/delphi-tooling-book/).
+If you are looking for detail beyond the package documentation, see
+our [forecasting book](https://cmu-delphi.github.io/delphi-tooling-book/).
 
 ## Installation
 
-To install (unless you’re planning on contributing to package
-development, we suggest using the stable version):
+Unless you’re planning on contributing to package
+development, we suggest using the stable version.
+To install, run:
 
 ``` r
 # Stable version
@@ -44,25 +45,32 @@ is at <https://cmu-delphi.github.io/epipredict/dev>.
 
 ## Motivating example
 
-To demonstrate the kind of forecast epipredict can make, say we’re
-predicting COVID deaths per 100k for each state on
+To demonstrate the kind of forecast `{epipredict}` can make, say we want to
+predict COVID-19 deaths per 100k people for each state on 2021-08-01.
 
 ``` r
+library(epipredict)
+library(epidatr)
+library(epiprocess)
+library(dplyr)
+library(ggplot2)
+
 forecast_date <- as.Date("2021-08-01")
 ```
 
 Below the fold, we construct this dataset as an `epiprocess::epi_df`
-from JHU data.
+from [Johns Hopkins Center for Systems Science and Engineering deaths data](https://cmu-delphi.github.io/delphi-epidata/api/covidcast-signals/jhu-csse.html).
 
 <details>
 <summary>
 Creating the dataset using `{epidatr}` and `{epiprocess}`
 </summary>
 
-This dataset can be found in the package as `covid_case_death_rates`; we
-demonstrate some of the typically ubiquitous cleaning operations needed
-to be able to forecast. First we pull both jhu-csse cases and deaths
-from [`{epidatr}`](https://cmu-delphi.github.io/epidatr/) package:
+This section demonstrates some of the common cleaning operations needed
+before we can forecast.
+The dataset prepared here is also included ready-to-go in `{epipredict}` as `covid_case_death_rates`.
+
+First we pull both `jhu-csse` cases and deaths data from the [Delphi API](https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html) using the [`{epidatr}`](https://cmu-delphi.github.io/epidatr/) package:
 
 ``` r
 cases <- pub_covidcast(
@@ -87,7 +95,7 @@ deaths <- pub_covidcast(
 ```
 
 Since visualizing the results on every geography is somewhat
-overwhelming, we’ll only train on a subset of 5.
+overwhelming, we’ll only train on a subset of locations.
 
 ``` r
 used_locations <- c("ca", "ma", "ny", "tx")
@@ -113,12 +121,11 @@ cases_deaths |>
 
 <img src="man/figures/README-date-1.png" width="90%" style="display: block; margin: auto;" />
 
-As with basically any dataset, there is some cleaning that we will need
-to do to make it actually usable; we’ll use some utilities from
+As with most datasets, we will need to do some cleaning to make this one usable; we’ll use some utilities from
 [`{epiprocess}`](https://cmu-delphi.github.io/epiprocess/) for this.
 
-First, to eliminate some of the noise coming from daily reporting, we do
-7 day averaging over a trailing window[^1]:
+First, to reduce the noise from daily reporting, we will compute a
+7-day average over a trailing window[^1]:
 
 ``` r
 cases_deaths <-
@@ -134,7 +141,7 @@ cases_deaths <-
   rename(case_rate = cases_7dav, death_rate = death_rate_7dav)
 ```
 
-Then trimming outliers, most especially negative values:
+Then we’ll trim outliers, especially negative values:
 
 ``` r
 cases_deaths <-
@@ -161,24 +168,25 @@ cases_deaths <-
 
 </details>
 
-After having downloaded and cleaned the data in `cases_deaths`, we plot
-a subset of the states, noting the actual forecast date:
+After downloading and cleaning the cases and deaths data, we can plot
+a subset of the states, marking the desired forecast date:
 
 <details>
 <summary>
 Plot
 </summary>
 
 ``` r
+used_locations <- c("ca", "ma", "ny", "tx")
 forecast_date_label <-
   tibble(
     geo_value = rep(used_locations, 2),
     .response_name = c(rep("case_rate", 4), rep("death_rate", 4)),
     dates = rep(forecast_date - 7 * 2, 2 * length(used_locations)),
     heights = c(rep(150, 4), rep(0.75, 4))
   )
-processed_data_plot <-
-  covid_case_death_rates |>
+
+covid_case_death_rates |>
   filter(geo_value %in% used_locations) |>
   autoplot(
     case_rate,
@@ -204,13 +212,13 @@ processed_data_plot <-
 
 <img src="man/figures/README-show-processed-data-1.png" width="90%" style="display: block; margin: auto;" />
 
-To make a forecast, we will use a “canned” simple auto-regressive
+To make a forecast, we will use a simple “canned” auto-regressive
 forecaster to predict the death rate four weeks into the future using
-lagged[^2] deaths and cases
+lagged[^2] deaths and cases.
 
 ``` r
 four_week_ahead <- arx_forecaster(
-  cases_deaths |> filter(time_value <= forecast_date),
+  covid_case_death_rates |> filter(time_value <= forecast_date),
   outcome = "death_rate",
   predictors = c("case_rate", "death_rate"),
   args_list = arx_args_list(
@@ -221,31 +229,31 @@ four_week_ahead <- arx_forecaster(
 )
 four_week_ahead
 #> ══ A basic forecaster of type ARX Forecaster ════════════════════════════════
-#> 
+#>
 #> This forecaster was fit on 2025-02-10 12:09:58.
-#> 
+#>
 #> Training data was an <epi_df> with:
 #> • Geography: state,
 #> • Time type: day,
 #> • Using data up-to-date as of: 2022-01-01.
 #> • With the last data available on 2021-08-01
-#> 
+#>
 #> ── Predictions ──────────────────────────────────────────────────────────────
-#> 
+#>
 #> A total of 4 predictions are available for
 #> • 4 unique geographic regions,
 #> • At forecast date: 2021-08-01,
 #> • For target date: 2021-08-29,
-#> 
+#>
 ```
 
-In this case, we have used 0-3 days, a week, and two week lags for the
-case rate, while using only zero, one and two weekly lags for the death
-rate (as predictors). The result `four_week_ahead` is both a fitted
+In this model setup, the predictors are the case rate lagged 0-3 days, one week, and two weeks, and the death rate lagged zero, one, and two weeks.
+The result `four_week_ahead` is both a fitted
 model object which could be used any time in the future to create
-different forecasts, as well as a set of predicted values (and
+different forecasts, and a set of predicted values (and
 prediction intervals) for each location 28 days after the forecast date.
-Plotting the prediction intervals on our subset above[^3]:
+
+We can plot the prediction intervals against the true values for our location subset[^3]:
 
 <details>
 <summary>
@@ -275,28 +283,29 @@ forecast_plot <-
 
 <img src="man/figures/README-show-single-forecast-1.png" width="90%" style="display: block; margin: auto;" />
 
-And as a tibble of quantile level-value pairs:
+And as a tibble of quantile-value pairs:
 
 ``` r
 four_week_ahead$predictions |>
   select(-.pred) |>
   pivot_quantiles_longer(.pred_distn)
 #> # A tibble: 20 × 5
 #>   geo_value values quantile_levels forecast_date target_date
-#>   <chr>      <dbl>           <dbl> <date>        <date>     
-#> 1 ca        0.199             0.1  2021-08-01    2021-08-29 
-#> 2 ca        0.285             0.25 2021-08-01    2021-08-29 
-#> 3 ca        0.345             0.5  2021-08-01    2021-08-29 
-#> 4 ca        0.405             0.75 2021-08-01    2021-08-29 
-#> 5 ca        0.491             0.9  2021-08-01    2021-08-29 
-#> 6 ma        0.0285            0.1  2021-08-01    2021-08-29 
+#>   <chr>      <dbl>           <dbl> <date>        <date>
+#> 1 ca        0.199             0.1  2021-08-01    2021-08-29
+#> 2 ca        0.285             0.25 2021-08-01    2021-08-29
+#> 3 ca        0.345             0.5  2021-08-01    2021-08-29
+#> 4 ca        0.405             0.75 2021-08-01    2021-08-29
+#> 5 ca        0.491             0.9  2021-08-01    2021-08-29
+#> 6 ma        0.0285            0.1  2021-08-01    2021-08-29
 #> # ℹ 14 more rows
 ```
 
-The black dot gives the median prediction, while the blue intervals give
+The orange dot gives the predicted median, and the blue intervals give
 the 25-75%, the 10-90%, and 2.5-97.5% inter-quantile ranges[^4]. For
 this particular day and these locations, the forecasts are relatively
-accurate, with the true data being at least within the 10-90% interval.
+accurate, with the true data being at worst within the 10-90% interval.
+
 A couple of things to note:
 
 1.  Our methods are primarily direct forecasters; this means we don’t
@@ -310,12 +319,11 @@ A couple of things to note:
 ## Getting Help
 
 If you encounter a bug or have a feature request, feel free to file an
-[issue on our github
+[issue on our GitHub
 page](https://github.com/cmu-delphi/epipredict/issues). For other
 questions, feel free to reach out to the authors, either via this
-[contact
-form](https://docs.google.com/forms/d/e/1FAIpQLScqgT1fKZr5VWBfsaSp-DNaN03aV6EoZU4YljIzHJ1Wl_zmtg/viewform),
-email, or the Insightnet slack.
+[contact form](https://docs.google.com/forms/d/e/1FAIpQLScqgT1fKZr5VWBfsaSp-DNaN03aV6EoZU4YljIzHJ1Wl_zmtg/viewform),
+email, or the InsightNet Slack.
 
 [^1]: This makes it so that any given day of the processed time-series
     only depends on the previous week, which means that we avoid leaking