Foreseen common use cases of `epix_slide`:
- Getting time lags of observations or functions of observations as of particular version lags, to:
  - Evaluate the accuracy of real-time/bleeding-edge data
  - Prepare apples-to-apples training data for revision-aware forecasters
- Summarizing statistics regarding the latency of data sources and missingness/zero-ness/duplications in bleeding-edge data.
Note that:
1. For pseudoprospective forecasting, we typically want multiple rows per group-reftime, as we may be predicting multiple targets, multiple quantiles, etc. These multiple rows are keyed by new key columns introduced in the slide computation (e.g., the target date and quantile level), not by the old non-grouping parts of the epikey (e.g., the `geo_value` if we are fitting all geos simultaneously). We might want to output a different epikey set if we are performing/tacking on a geo/age/etc. aggregation.
2. For extracting version and time lags of functions of data, we probably want either (a) exactly 1 row or (b) either 0 or 1 row per group-reftime (with multiple columns for different functions, lags, etc.). We might want to output a different epikey set if we are performing a geo/age/etc. aggregation.
3. For summary statistics, we might want to output any number of rows, depending on the analysis.
Except maybe 2(a) with no epikey aggregation, we don't want the 1-per-reftime-epikey or 1-per-epikey-broadcasted-to-epikeys behavior.
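To make the "multiple rows per group-reftime" shape concrete, here is a minimal sketch in plain dplyr. It does not call `epix_slide`; `toy_archive`, `snapshot_as_of`, and `toy_forecaster` are hypothetical names invented for this illustration, and the "forecast" is just empirical quantiles of the snapshot. The point is only that each reference version yields several rows, keyed by the newly introduced `target_date` and `quantile_level` columns.

```r
library(dplyr)

# Hypothetical toy archive: one row per (geo_value, time_value, version).
toy_archive <- tibble(
  geo_value = "ca",
  time_value = as.Date("2024-01-01") + c(0, 1, 1, 2),
  version = as.Date("2024-01-01") + c(1, 2, 3, 3),
  value = c(10, 12, 14, 15)
)

# Hypothetical helper: latest available row per (geo_value, time_value)
# among versions no later than ref_version.
snapshot_as_of <- function(archive, ref_version) {
  archive %>%
    filter(version <= ref_version) %>%
    group_by(geo_value, time_value) %>%
    slice_max(version, n = 1) %>%
    ungroup()
}

# Hypothetical computation: returns one row per (target_date, quantile_level),
# i.e. several rows for a single reference version.
toy_forecaster <- function(snapshot, ref_version) {
  grid <- expand.grid(ahead = c(7, 14), quantile_level = c(0.1, 0.5, 0.9))
  tibble(
    version = ref_version,
    target_date = ref_version + grid$ahead,
    quantile_level = grid$quantile_level,
    prediction = quantile(snapshot$value, grid$quantile_level, names = FALSE)
  )
}

ref_versions <- as.Date("2024-01-02") + 0:1

# Stack the per-version results without broadcasting or trimming rows.
forecasts <- bind_rows(lapply(seq_along(ref_versions), function(i) {
  v <- ref_versions[i]
  toy_forecaster(snapshot_as_of(toy_archive, v), v)
}))
```

Here `forecasts` has more rows than there are reference versions, which is the shape that a 1-per-reftime-epikey rule could not represent.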
So, we should make epix_slide more reframe/group_modify-like than grouped-mutate-like. (Its output column behavior is already like the former; this would focus on row handling.)
Allow the computations to output any number of rows. Do not perform row broadcasting. Be careful to remove descriptions of size stability and exact forecast compatibility with epi_slide if not true.
Adjust or deprecate all_rows for epix_slide.
Check that returning computation results with new key columns (e.g., target date, quantile level, ...) results in appropriately keyed output, or consider a parameter to express the introduction of new key variables. [punted]
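For reference, the row-handling difference between grouped `mutate` and `reframe`/`group_modify` can be sketched with dplyr alone (`reframe` assumes dplyr 1.1.0 or later). The toy data frame and column names below are made up for illustration; this shows the dplyr semantics the proposal refers to, not the current or future behavior of `epix_slide`.

```r
library(dplyr)

df <- tibble(
  geo_value = c("ca", "ca", "ny", "ny"),
  value = c(1, 2, 3, 4)
)
quantile_levels <- c(0.1, 0.5, 0.9)

# Grouped mutate: the output has exactly the input rows;
# a size-1 result is broadcast to every row of its group.
mutated <- df %>%
  group_by(geo_value) %>%
  mutate(group_mean = mean(value)) %>%
  ungroup()

# reframe: each group may return any number of rows; no broadcasting back.
reframed <- df %>%
  group_by(geo_value) %>%
  reframe(
    quantile_level = quantile_levels,
    prediction = quantile(value, quantile_levels, names = FALSE)
  )

# group_modify: the function returns a whole data frame per group,
# which may have zero rows (as for "ca" here).
modified <- df %>%
  group_by(geo_value) %>%
  group_modify(~ filter(.x, value > 2)) %>%
  ungroup()
```

With these inputs, `mutated` keeps the 4 input rows, `reframed` has 3 rows per group, and `modified` drops the `ca` group entirely, which is the "any number of rows, no broadcasting" behavior described above.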
Migrated from #64.