Change epix_slide to be more group_modify-like #275

Closed
2 tasks done
brookslogan opened this issue Mar 9, 2023 · 1 comment
Labels
op-semantics (Operational semantics; many potentially breaking changes here), P1 (medium priority)

Comments

@brookslogan
Contributor

brookslogan commented Mar 9, 2023

Migrated from #64.

Foreseen common use cases of epix_slide:

  • Pseudo-prospective forecasting/pancasting/modeling.
  • Computing time lags of observations (or of functions of observations) as of particular version lags, to:
    • Evaluate accuracy of real-time/bleeding-edge data
    • Prepare apples-to-apples training data for revision-aware forecasters
  • Computing summary statistics on the latency of data sources and on missingness, zero-ness, or duplication in bleeding-edge data.

Note that:

  1. For pseudo-prospective forecasting, we typically want multiple rows per group-reftime, since we may be predicting multiple targets, multiple quantiles, etc. These rows are distinguished by new key columns introduced in the slide computation (e.g., the target date and quantile level), not by the old non-grouping parts of the epikey (e.g., the geo_value if we are fitting all geos simultaneously). We might also want to output a different epikey set if we are performing or tacking on a geo/age/etc. aggregation.
  2. For extracting version and time lags of functions of data, we probably want either (a) exactly 1 row or (b) 0 or 1 rows per group-reftime (with multiple columns for different functions, lags, etc.). We might want to output a different epikey set if we are performing a geo/age/etc. aggregation.
  3. For summary statistics, we might want to output any number of rows, depending on the analysis.
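To make case 1 concrete, here is a minimal sketch of a single group-reftime computation whose output rows are keyed by columns it introduces itself. `epix_slide` is an R function; Python is used here only to illustrate the row semantics. The helper `forecast_one`, its naive median-based forecaster, and the column names are hypothetical.

```python
from datetime import date, timedelta
from statistics import median

def forecast_one(ref_time, values, aheads=(7, 14), levels=(0.1, 0.5, 0.9)):
    """Hypothetical per-group-reftime computation: returns one row per
    (target_date, quantile_level) pair, not one row per input row."""
    center = median(values)           # naive point forecast from recent values
    spread = max(values) - min(values)
    rows = []
    for ahead in aheads:
        for level in levels:
            rows.append({
                # new key columns introduced by the computation itself
                "target_date": ref_time + timedelta(days=ahead),
                "quantile_level": level,
                # shift the center by the spread, scaled by distance from the median
                "value": center + (level - 0.5) * spread,
            })
    return rows

rows = forecast_one(date(2023, 3, 1), [10.0, 12.0, 11.0, 15.0])
print(len(rows))  # 6 (2 aheads x 3 quantile levels)
```

The number of output rows depends on the computation's own settings (`aheads`, `levels`), not on how many epikeys the group contains.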

Except possibly for case 2(a) without epikey aggregation, we want neither the 1-per-reftime-epikey behavior nor the 1-per-epikey-broadcast-to-epikeys behavior.
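For contrast, here is a sketch of the grouped-mutate-like row handling being argued against, modeled on how dplyr's `mutate()` treats grouped data. A group's result must have either one value, which is recycled onto every input row, or exactly one value per input row. The helper `mutate_like`, the `slide_value` column, and the toy data are hypothetical.

```python
def mutate_like(groups, f):
    """Hypothetical grouped-mutate-style application: f must return either
    one value (broadcast to every row of the group) or one value per row."""
    out = []
    for key, rows in groups.items():
        result = f(rows)
        if len(result) == 1:
            result = result * len(rows)  # broadcast the single value
        elif len(result) != len(rows):
            raise ValueError(
                f"group {key!r}: got {len(result)} values for {len(rows)} rows")
        for row, value in zip(rows, result):
            out.append({**row, "slide_value": value})
    return out

# One reference time whose snapshot holds two epikeys (geo_values).
groups = {
    "2023-03-01": [{"geo_value": "ca", "cases": 5},
                   {"geo_value": "ny", "cases": 7}],
}

# A single all-geo summary is copied onto each geo row ...
broadcast = mutate_like(groups, lambda rows: [sum(r["cases"] for r in rows)])

# ... but a 6-row forecast (2 aheads x 3 quantiles) cannot be expressed.
try:
    mutate_like(groups, lambda rows: list(range(6)))
except ValueError as err:
    print(err)
```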

So, we should make epix_slide more reframe/group_modify-like than grouped-mutate-like. (Its output column behavior is already like the former; this would focus on row handling.)

  • Allow the computations to output any number of rows, and do not perform row broadcasting. Remove any descriptions of size stability or exact forecast compatibility with epi_slide that no longer hold.
  • Adjust or deprecate all_rows for epix_slide.
  • Check that computation results containing new key columns (e.g., target date, quantile level, ...) produce an appropriately keyed output, or consider a parameter for declaring newly introduced key variables. [punted]
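A minimal sketch of the proposed row handling, in the spirit of dplyr's `group_modify()`: each group's result may have any number of rows, including zero, and results are stacked with the group key and reference time prepended, with no broadcasting. The helper `slide_stack`, the snapshot layout (reference time mapped to the rows visible as of that version), and the `ref_time` column name are all hypothetical.

```python
from collections import defaultdict

def slide_stack(snapshots, group_col, f):
    """Hypothetical group_modify-style slide. `snapshots` maps each reference
    time to the rows visible as of that version; f(rows) may return any
    number of result rows, which are stacked rather than broadcast."""
    out = []
    for ref_time in sorted(snapshots):
        groups = defaultdict(list)
        for row in snapshots[ref_time]:
            groups[row[group_col]].append(row)
        for key in sorted(groups):
            for result in f(groups[key]):  # 0, 1, or many rows per group
                out.append({group_col: key, "ref_time": ref_time, **result})
    return out

snapshots = {
    1: [{"geo": "ca", "cases": 0}, {"geo": "ny", "cases": 4}],
    2: [{"geo": "ca", "cases": 2}, {"geo": "ny", "cases": 5}, {"geo": "ny", "cases": 6}],
}

# Case 2(b): emit a row only when the latest value is nonzero (0 or 1 rows).
nonzero = slide_stack(
    snapshots, "geo",
    lambda rs: [{"latest": rs[-1]["cases"]}] if rs[-1]["cases"] else [])

# Case 1: emit one row per quantile level (many rows, keyed by quantile_level).
quantiles = slide_stack(
    snapshots, "geo",
    lambda rs: [{"quantile_level": q, "value": rs[-1]["cases"]} for q in (0.1, 0.5, 0.9)])
```

Since the output rows come only from what `f` returns, a computation that introduces new key columns such as `quantile_level` needs no special handling. Checking that those columns form a valid key is the part the last task leaves open.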
@brookslogan brookslogan added P1 medium priority op-semantics Operational semantics; many potentially breaking changes here labels Mar 9, 2023
@brookslogan brookslogan changed the title Change epix_slide to be more reframe-like Change epix_slide to be more group_modify-like May 5, 2023
@brookslogan
Contributor Author

Closed by #311.
