
Change epix_slide to be more group_modify-like #275

Closed
@brookslogan


Migrated from #64.

Foreseen common use cases of epix_slide:

  • Pseudo-prospective forecasting/pancasting/modeling.
  • Computing time lags of observations (or of functions of observations) as of particular version lags, in order to:
    • Evaluate accuracy of real-time/bleeding-edge data
    • Prepare apples-to-apples training data for revision-aware forecasters
  • Computing summary statistics on data-source latency and on missingness, zero-ness, and duplication in bleeding-edge data.

Note that:

  1. For pseudo-prospective forecasting, we typically want multiple rows per group-reftime, since we may be predicting multiple targets, multiple quantiles, etc. These rows are distinguished by new key columns introduced in the slide computation (e.g., the target date and quantile level), not by the old non-grouping parts of the epikey (e.g., the geo_value if we are fitting all geos simultaneously). We might also want to output a different epikey set if we are performing (or tacking on) a geo/age/etc. aggregation.
  2. For extracting version and time lags of functions of the data, we probably want either (a) exactly one row or (b) zero or one row per group-reftime (with multiple columns for different functions, lags, etc.). Again, we might want to output a different epikey set if we are performing a geo/age/etc. aggregation.
  3. For summary statistics, we might want to output any number of rows, depending on the analysis.

Except possibly for 2(a) without epikey aggregation, we don't want either of the current row behaviors: exactly one row per reftime-epikey, or a single result per epikey group broadcast to all of its epikeys.

So we should make epix_slide more reframe/group_modify-like than grouped-mutate-like. (Its output-column behavior is already like the former; this change concerns row handling.)
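To make the two row-handling policies concrete, here is a toy Python sketch. It is not epiprocess code: rows are modelled as plain dicts, and `mutate_like` / `reframe_like` are hypothetical names. The mutate-like policy only accepts a result of length 1 (broadcast to every row of the group) or of the group's size; the reframe/group_modify-like policy keeps whatever rows the computation returned, prefixed with the group key.

```python
def mutate_like(group_rows, result):
    """Grouped-mutate-like: result must have 1 or len(group_rows) entries;
    a single entry is broadcast to every row in the group."""
    n = len(group_rows)
    if len(result) == 1:
        result = list(result) * n  # broadcast the single value to all rows
    if len(result) != n:
        raise ValueError(f"result has {len(result)} rows; expected 1 or {n}")
    return [dict(row, value=v) for row, v in zip(group_rows, result)]


def reframe_like(group_key, result_rows):
    """reframe/group_modify-like: keep however many rows (0, 1, or many)
    the computation returned, tagged with the group's key columns."""
    return [dict(group_key, **row) for row in result_rows]


# A forecaster output with 2 target dates x 3 quantile levels = 6 rows;
# mutate_like would reject it for a 2-geo group, reframe_like keeps all 6.
forecast = [
    {"target_date": t, "quantile": q, "value": 0.0}
    for t in ("2020-06-08", "2020-06-15")
    for q in (0.1, 0.5, 0.9)
]
framed = reframe_like({"version": "2020-06-01"}, forecast)
```

Under the reframe-like policy, the new key columns (here `target_date` and `quantile`) come from the computation itself, which matches use case 1 above.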

  • Allow the computations to output any number of rows. Do not perform row broadcasting. Remove any documentation claiming size stability or exact forecast compatibility with epi_slide if those claims no longer hold.
  • Adjust or deprecate all_rows for epix_slide.
  • Check that computation results containing new key columns (e.g., target date, quantile level, ...) produce an appropriately keyed output, or consider a parameter for declaring newly introduced key variables. [punted]
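The proposed row handling can be sketched as a slide over reference versions of a toy archive. This is a minimal sketch under stated assumptions, not the epix_slide implementation: the archive is a list of `(geo_value, time_value, version, value)` tuples with integer times and versions, and `as_of`, `slide_versions`, and `latency_by_geo` are hypothetical helpers. The computation may return zero, one, or many rows per reference version; rows are only tagged with the version and row-bound, never broadcast or padded.

```python
def as_of(archive, version):
    """Snapshot: latest value of each (geo_value, time_value) among
    archive rows whose version is <= the given version."""
    latest = {}
    for geo, time, ver, value in archive:
        if ver <= version:
            key = (geo, time)
            if key not in latest or ver > latest[key][0]:
                latest[key] = (ver, value)
    return [
        {"geo_value": g, "time_value": t, "value": v}
        for (g, t), (_, v) in sorted(latest.items())
    ]


def slide_versions(archive, ref_versions, f):
    """Hypothetical reframe-like slide: f(snapshot, ref) may return any
    number of rows; each is tagged with its reference version."""
    out = []
    for ref in ref_versions:
        for row in f(as_of(archive, ref), ref):
            out.append({"version": ref, **row})
    return out


def latency_by_geo(snapshot, ref):
    """Summary statistic: reference version minus latest available time, per geo."""
    latest_time = {}
    for r in snapshot:
        g = r["geo_value"]
        latest_time[g] = max(latest_time.get(g, r["time_value"]), r["time_value"])
    return [{"geo_value": g, "latency": ref - t} for g, t in sorted(latest_time.items())]


archive = [
    ("ca", 1, 2, 10.0),
    ("ca", 1, 4, 12.0),  # revision of ca's time 1 observation
    ("ca", 2, 4, 11.0),
    ("nv", 1, 3, 5.0),
]
result = slide_versions(archive, [1, 2, 3, 4], latency_by_geo)
```

Here version 1 yields no rows (nothing is available yet), version 2 yields one, and versions 3 and 4 yield two each, so the output size depends only on what the computation returns.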

Metadata


Assignees: No one assigned
Labels: P1 (medium priority); op-semantics (Operational semantics; many potentially breaking changes here)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
