Add filter.epi_archive
#651
@dshemetov GitHub suggested you for review, if you have time. Experimented with some checks to try to error on things that might be expected to work but have details; see tests for some examples. (Probably could map to "uncompactified" logic while remaining scalable & without a bunch of extra logic if we had a duckdb backend.)
}
e <- parent.env(e)
}
TRUE
This seems like it will error if you use it inside a function that has "version" or "time_value" defined in its environment? I'm reading this as traversing up the environment chain and stopping short of the globalenv(), which would be most likely to have variables like that defined, but intermediate scopes might still have false positives.
Ah ok, I tested it locally and it seems fine. I guess this works because we hit the data mask environment first and break out before we hit the user's function environment? Seems reasonable.
Yep, the data mask environment chain in current dplyr & rlang looks something like
rlang wrapper env -> data bindings env (/ env chain input to as_data_mask env) -> quosure env (chain)
where:
- the rlang wrapper env holds the real data pronoun objects, a `~` override I don't quite understand, and some other internals
- the data bindings env is typically just a single env holding (group's) column bindings; in other contexts, it could be an env chain we fed into as_data_mask (with its "top" ancestor reassigned to point at:)
- the quosure env
We should stop at the data bindings env and reassign things there.
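A rough base-R sketch of the chain walk described above, using plain environments as stand-ins for the mask layers. The names `quosure_env`, `bindings_env`, `wrapper_env`, and `find_binding_env` are hypothetical; this is neither the rlang internals nor the code in this PR, just an illustration of stopping the walk before user scopes are reached.

```r
# Stand-ins for the three layers; real data masks are built by rlang.
quosure_env  <- new.env(parent = globalenv())   # where the user's code lives
bindings_env <- new.env(parent = quosure_env)   # holds column bindings
wrapper_env  <- new.env(parent = bindings_env)  # innermost layer

assign("version", 1:3, envir = bindings_env)
assign("version", "user variable", envir = quosure_env)

# Walk up from `start` and return the first env that defines `name`,
# stopping before `stop_at` so user scopes are never inspected.
find_binding_env <- function(name, start, stop_at) {
  e <- start
  while (!identical(e, stop_at) && !identical(e, emptyenv())) {
    if (exists(name, envir = e, inherits = FALSE)) {
      return(e)
    }
    e <- parent.env(e)
  }
  NULL
}

found <- find_binding_env("version", wrapper_env, stop_at = quosure_env)
identical(found, bindings_env)
```

Because the walk ends at `stop_at`, a `version` defined in the user's own function or script is never mistaken for a column binding.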
But I did find an issue along those lines
epidatasets::case_death_rate_archive %>% {.$DT <- copy(.$DT)[, e := 1]; .} %>% filter(e < 2)
#> Error in `filter()`:
#> ℹ In argument: `e < 2`.
#> Caused by error in `e < 2`:
#> ! comparison (<) is possible only for atomic and list types
#> Run `rlang::last_trace()` to see where the error occurred.
because I'm leaving around an `e` in the rlang wrapper env.
Also, I fell for a classic lazy eval + env issue
epidatasets::case_death_rate_archive %>% filter(case_rate_7d_av < 2)
#> Error in `filter()`:
#> ℹ In argument: `case_rate_7d_av < 2`.
#> Caused by error:
#> ! Using `death_rate_7d_av` in `filter.epi_archive` may produce unexpected results.
#> → See `?filter.epi_archive` details for how to proceed.
#> Run `rlang::last_trace()` to see where the error occurred.
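For readers unfamiliar with this class of bug, here is a minimal base-R sketch of one common form of the lazy eval + env gotcha: closures created in a loop that all report the last loop value. It is an assumption that the bug here looked like this; `make_msg`, `buggy`, and `fixed` are hypothetical names, not code from the PR.

```r
col_names <- c("case_rate_7d_av", "death_rate_7d_av")

# Buggy: every closure shares the loop's environment, so each one sees
# the final value of `nm` once the loop has finished.
buggy <- list()
for (nm in col_names) {
  buggy[[nm]] <- function() paste("Using", nm)
}

# Safer: a factory gives each closure its own environment, and force()
# evaluates the argument promise immediately instead of at first use.
make_msg <- function(nm) {
  force(nm)
  function() paste("Using", nm)
}
fixed <- list()
for (nm in col_names) {
  fixed[[nm]] <- make_msg(nm)
}

buggy[["case_rate_7d_av"]]()  # reports death_rate_7d_av
fixed[["case_rate_7d_av"]]()  # reports case_rate_7d_av
```

The mismatch in the error above (filtering on `case_rate_7d_av` but being warned about `death_rate_7d_av`) has the same shape: the name reported is whichever value was current when the deferred code finally ran.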
... and found some others. Should be fixed now.
Generally, this looks good to me. Hard to anticipate all future issues with the archive metadata after the filtering, but I think you made the right calls in choosing the defaults (only reinferring the time_type in the day case and inheriting otherwise) and explaining the counter-intuitive aspects.
@dshemetov along with the lazy eval gotcha fixes, I've patched some (unrelated, but check-triggering due to upstream updates) issues regarding Date typeof strictness in the package code and in the testing code. Maybe I'll just merge for now, but if the latter part raises flags in your mind, please note in #662 and up its priority & timeline.
Checklist
Please:
PR).
brookslogan, nmdefries.
DESCRIPTION. Always increment the patch version number (the third number), unless you are making a
release PR from dev to main, in which case increment the minor version
number (the second number).
(backwards-incompatible changes to the documented interface) are noted.
Collect the changes under the next release number (e.g. if you are on
1.7.2, then write your changes under the 1.8 heading).
process.
Change explanations for reviewer
Magic GitHub syntax to mark associated Issue(s) as resolved when this is merged into the default branch
filter for epi_archive
#231
TODO
- value >= 10 will remove revisions of things >= 10 down to < 10, when the intent is probably to treat the epikeytime as missing in corresponding versions (e.g., putting an NA there).
- select.epi_archive and use it in the example to focus on the 7dav signal -> weekly res... not sure if taking Saturdays of Gaussian linear smoothed signal makes sense necessarily, though also not sure if we should have these lined up in the same data structure at all.
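The first TODO item can be illustrated with a toy version history. This sketch uses a plain data frame rather than an `epi_archive`, and `latest` is a hypothetical helper, so it only shows the shape of the problem, not the package's behavior.

```r
# Toy version history for one (geo_value, time_value): 12 is later revised to 8.
updates <- data.frame(
  geo_value  = "ca",
  time_value = as.Date("2020-06-01"),
  version    = as.Date(c("2020-06-02", "2020-06-03")),
  value      = c(12, 8)
)

# Latest value per observation as of the most recent version.
latest <- function(df) {
  df <- df[order(df$version), ]
  df[!duplicated(df[c("geo_value", "time_value")], fromLast = TRUE), ]
}

latest(updates)$value                          # 8: the revision is visible
latest(updates[updates$value >= 10, ])$value   # 12: the revision was dropped
```

Filtering the update rows on `value >= 10` discards the row recording the revision down to 8, so the latest snapshot silently reverts to the stale 12 instead of marking the observation as missing.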