@nmdefries
Contributor

@nmdefries nmdefries commented Aug 23, 2023

Summary:

Pull flusurv data from new CDC API endpoint. Ingest previously unlabelled age groupings. Ingest new race/ethnicity and sex breakdowns. Test flusurv.py functions. Add more comments, messaging, and assertions.

Closes #1247
Closes #242

Note:

  • Some of our CDC contacts are able to provide us with past versions of data from the API. We will want to patch those in. Many of the "new" strata are actually available back to 2009. We'll probably want to patch those in as well.
  • This also requires updates to the database table, the API server, and the API docs that will be made separately.

Prerequisites:

  • Unless it is a documentation hotfix it should be merged against the dev branch
  • Branch is up-to-date with the branch to be merged with, i.e. dev
  • Build is successful
  • Code is cleaned up and formatted

- rename input arg to `update` to avoid reassignment later
- comment and reuse args_insert
- spelling
- comment magic constant used in output format
- rename location-network/catchmentid map
Previously, age strata were numbered sequentially which allowed us to
store rate values by position in a list. With the introduction of the
new strata, this system is not robust enough to track all the different
groups (e.g. ageids are no longer sequential and there are now race and
sex groupings with separate numbering systems).
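
The difference between positional and keyed storage described in this commit message can be sketched as follows. This is a minimal illustration only: the function name `group_rates`, the record layout, and all labels and numbers are hypothetical and are not taken from flusurv.py.

```python
# Hypothetical example: names and numbers are illustrative only.

# Old approach: rates stored by position, relying on sequential age ids.
rates_by_position = [1.2, 0.4, 0.9]  # index 0 -> first age group, etc.


# Keyed approach: each rate is stored under an explicit (grouping, label)
# key, so non-sequential ids and separate numbering systems cannot collide.
def group_rates(records):
    """Map (grouping, label) -> rate for a list of record dicts."""
    rates = {}
    for rec in records:
        key = (rec["grouping"], rec["label"])
        if key in rates:
            raise ValueError(f"duplicate stratum {key}")
        rates[key] = rec["rate"]
    return rates


records = [
    {"grouping": "age", "label": "0-4", "rate": 1.2},
    {"grouping": "sex", "label": "female", "rate": 0.8},
    {"grouping": "race", "label": "white", "rate": 0.7},
]
rates = group_rates(records)
```

With a keyed mapping, adding a new grouping does not shift the meaning of any existing entry, which is the robustness concern raised above.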
@nmdefries nmdefries force-pushed the ndefries/flusurv-new-endpoint branch from 734dca9 to f8a6706 Compare September 15, 2023 15:50
@nmdefries nmdefries marked this pull request as ready for review September 15, 2023 16:25
@nmdefries nmdefries removed the request for review from brookslogan September 15, 2023 21:51
Collaborator

@melange396 melange396 left a comment

nice work! i really appreciate the thorough commenting!

you'll wanna pull in the changes from the dev branch: PR #1241 added tests for the flusurv endpoint.

@nmdefries
Contributor Author

I duplicated some of the work in #1287 in c88c6fc. (#1287 needs to be re-merged into this PR anyway, see comment.)


@nmdefries
Contributor Author

Things to check before putting into prod:

  • DB migration (will need to test in staging)
  • interacting correctly with DB
  • naming new columns correctly and inserting them in the right spots
    • avoid mixed column meanings, e.g. don't change meanings of col names
  • clients work

@brookslogan
Contributor

brookslogan commented Oct 9, 2025

Things to check before putting into prod:

  • naming new columns correctly and inserting them in the right spots
    • avoid mixed column meanings, e.g. don't change meanings of col names

I did a quick skim of the DB migration script and some of the acquisition code dealing with column names and ids. The former passed a sanity check and didn't seem like a risky operation anyway. The latter seems to have been written with a good deal of care (e.g., the changes to the overall age group column id and the new age group column ids). I don't have an environment to mock stuff but if you need a set of eyes to sanity-check the acquisition result after deployment, feel free to ping me and I can double-check API vs. upstream values.

@nmdefries
Contributor Author

Okay, verified that this works locally in a fresh environment, so it is ready for another review @melange396 .

@brookslogan , thanks for your offer for additional checks! I am working on cleaning up our test environment instructions to make this repo easier to run locally.

@nmdefries
Contributor Author

Source data is here.

@nmdefries
Contributor Author

nmdefries commented Oct 24, 2025

  • How long does this take to run for the last ~5 years for patching purposes? We may want to go back farther so we can capture the history for demographic breakdowns
    • how long does that take (22 years)?
    • what happens if we try to insert new values for an existing obs/key? apparently we keep the newer value.
  • alert/warn if new demographic breakdowns are added [did this already?]
  • future TODO: generate SQL, other lists of output columns from constants.py so don't have so much duplication
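
One way the "generate SQL from constants" TODO could look is sketched below. Everything here is an assumption for illustration: the column names, the table name `demo_table`, the helper `build_insert`, and the `%s` placeholder style are hypothetical and do not reflect the actual contents of constants.py or the real schema.

```python
# Hypothetical column lists; in the TODO these would come from constants.py.
KEY_COLUMNS = ["issue", "epiweek", "location"]
RATE_COLUMNS = ["rate_overall", "rate_age_0", "rate_sex_0"]


def build_insert(table, columns):
    """Build a parameterized INSERT statement from a list of column names."""
    col_list = ", ".join(f"`{c}`" for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    return f"INSERT INTO `{table}` ({col_list}) VALUES ({placeholders})"


sql = build_insert("demo_table", KEY_COLUMNS + RATE_COLUMNS)
```

Deriving the statement from a single list means a new stratum only has to be added in one place, which is the duplication the TODO is aiming to remove.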

@nmdefries nmdefries removed the request for review from aysim319 October 24, 2025 21:24
@nmdefries
Contributor Author

alert/warn if new demographic breakdowns are added

We warn about new demographic breakdowns here. Expected breakdowns are defined here.
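
For readers without access to the linked code, the general shape of such a check might look like the sketch below. The set `EXPECTED_GROUPS`, the group names, and the function `check_groups` are hypothetical stand-ins, not the identifiers used in this PR.

```python
import warnings

# Hypothetical set of expected strata; the real list lives in the repo.
EXPECTED_GROUPS = {"age_0", "age_1", "sex_0", "race_white"}


def check_groups(observed):
    """Warn about strata in `observed` that are not in EXPECTED_GROUPS.

    Returns the sorted list of unexpected names (empty if none).
    """
    unexpected = sorted(set(observed) - EXPECTED_GROUPS)
    if unexpected:
        warnings.warn(f"unexpected demographic groups: {unexpected}")
    return unexpected
```

Warning instead of raising lets acquisition continue for the known groups while still surfacing that the upstream source has changed.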

@nmdefries
Contributor Author

How long does this take to run for the last ~22 years for patching purposes? Want to capture the history for demographic breakdowns

On my (old) computer this takes 54 s, so this shouldn't be a problem.

This source doesn't have version history, so when patching all the new data will be added under "today"'s issue date. I believe we have a connection at the source who can share their internal versioned data with us.

what happens if try to insert new values for an existing obs/key?

Based on the code, we apparently keep the newer value. However, the unique key is issue + epiweek + location. We aren't creating old issues, so all of the new patched data won't change the values for old issues.
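
The behavior described above can be illustrated with a small self-contained sketch. It uses an in-memory SQLite table purely as a stand-in for the project's actual database; the table name `flusurv_demo`, the single `rate` column, the helper `upsert`, and all values are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite stands in for the real database (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE flusurv_demo (
        issue INTEGER,
        epiweek INTEGER,
        location TEXT,
        rate REAL,
        UNIQUE (issue, epiweek, location)
    )
    """
)


def upsert(issue, epiweek, location, rate):
    """Insert a row; on a key collision the newer value replaces the old."""
    conn.execute(
        "INSERT OR REPLACE INTO flusurv_demo VALUES (?, ?, ?, ?)",
        (issue, epiweek, location, rate),
    )


upsert(202001, 201950, "CA", 1.0)
upsert(202001, 201950, "CA", 1.5)  # same key: newer value wins
upsert(202543, 201950, "CA", 2.0)  # new issue: stored as a separate row

rows = conn.execute(
    "SELECT issue, rate FROM flusurv_demo ORDER BY issue"
).fetchall()
```

Because patched data arrives under a new issue, it lands in new rows and the rows for older issues keep their original values.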

@nmdefries
Contributor Author

nmdefries commented Oct 27, 2025

I wasn't paying attention and accidentally merged #1287 into this PR. I then reverted #1287 in #1426. This is a reminder that #1287 will eventually need to be re-merged into here (which is now probably best done by reverting the revert in #1426).

#1426 can't be reverted for some reason. I merged the ndefries/flusurv-new-columns branch in directly in f544328.


Development

Successfully merging this pull request may close these issues.

  • flusurv data is stale
  • flusurv acquisition is broken

5 participants