Commit b338a09

Merge pull request #1991 from cmu-delphi/release/indicators_v0.3.55_utils_v0.3.24
Release covidcast-indicators 0.3.55
2 parents 8d44629 + 96d707a commit b338a09

113 files changed: +5086 -685 lines changed

.bumpversion.cfg

+1-1
Original file line number | Diff line number | Diff line change
@@ -1,5 +1,5 @@
11
[bumpversion]
2-
current_version = 0.3.54
2+
current_version = 0.3.55
33
commit = True
44
message = chore: bump covidcast-indicators to {new_version}
55
tag = False

.git-blame-ignore-revs

+6
@@ -0,0 +1,6 @@
1+
# Format geomap.py
2+
d4b056e7a4c11982324e9224c9f9f6fd5d5ec65c
3+
# Format test_geomap.py
4+
79072dcdec3faca9aaeeea65de83f7fa5c00d53f
5+
# Sort setup.py dependencies
6+
6912077acba97e835aff7d0cd3d64309a1a9241d

.github/workflows/backfill-corr-ci.yml

+11-31
@@ -10,57 +10,37 @@ name: R backfill corrections
1010

1111
on:
1212
push:
13-
branches: [ main, prod ]
13+
branches: [main, prod]
1414
pull_request:
15-
types: [ opened, synchronize, reopened, ready_for_review ]
16-
branches: [ main, prod ]
15+
types: [opened, synchronize, reopened, ready_for_review]
16+
branches: [main, prod]
1717

1818
jobs:
1919
build:
20-
runs-on: ubuntu-20.04
20+
runs-on: ubuntu-latest
2121
if: github.event.pull_request.draft == false
22-
strategy:
23-
matrix:
24-
r-version: [4.2.1]
2522
defaults:
2623
run:
2724
working-directory: backfill_corrections/delphiBackfillCorrection
2825

2926
steps:
30-
- uses: actions/checkout@v2
31-
- name: Set up R ${{ matrix.r-version }}
27+
- uses: actions/checkout@v4
28+
29+
- name: Set up R 4.2
3230
uses: r-lib/actions/setup-r@v2
3331
with:
34-
r-version: ${{ matrix.r-version }}
3532
use-public-rspm: true
36-
- name: Install linux dependencies
37-
run: |
38-
sudo apt-get install \
39-
libcurl4-openssl-dev \
40-
libgdal-dev \
41-
libudunits2-dev \
42-
libglpk-dev \
43-
libharfbuzz-dev \
44-
libfribidi-dev
45-
- name: Get date
46-
id: get-date
47-
run: |
48-
echo "::set-output name=date::$(/bin/date -u "+%Y%m%d")"
49-
- name: Cache R packages
50-
uses: actions/cache@v2
51-
with:
52-
path: ${{ env.R_LIBS_USER }}
53-
key: ${{ runner.os }}-r-backfillcorr-${{ steps.get-date.outputs.date }}
54-
restore-keys: |
55-
${{ runner.os }}-r-backfillcorr-
33+
r-version: 4.2
34+
5635
- name: Install and cache dependencies
5736
env:
5837
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
5938
uses: r-lib/actions/setup-r-dependencies@v2
6039
with:
6140
extra-packages: any::rcmdcheck
6241
working-directory: backfill_corrections/delphiBackfillCorrection
63-
upgrade: 'TRUE'
42+
upgrade: "TRUE"
43+
6444
- name: Check package
6545
uses: r-lib/actions/check-r-package@v2
6646
with:

.github/workflows/build-container-images.yml

+3-2
@@ -2,14 +2,15 @@ name: Build indicator container images and upload to registry
22

33
on:
44
push:
5-
branches: [ main, prod ]
5+
branches: [main, prod]
6+
workflow_dispatch:
67

78
jobs:
89
build:
910
runs-on: ubuntu-latest
1011
strategy:
1112
matrix:
12-
packages: [ backfill_corrections ]
13+
packages: [backfill_corrections]
1314
steps:
1415
- name: Checkout code
1516
uses: actions/checkout@v2

.github/workflows/python-ci.yml

+35-16
@@ -16,28 +16,42 @@ jobs:
1616
if: github.event.pull_request.draft == false
1717
strategy:
1818
matrix:
19-
packages:
20-
[
21-
_delphi_utils_python,
22-
changehc,
23-
claims_hosp,
24-
doctor_visits,
25-
google_symptoms,
26-
hhs_hosp,
27-
nchs_mortality,
28-
nwss_wastewater,
29-
quidel_covidtest,
30-
sir_complainsalot,
31-
]
19+
include:
20+
- package: "_delphi_utils_python"
21+
dir: "delphi_utils"
22+
- package: "changehc"
23+
dir: "delphi_changehc"
24+
- package: "claims_hosp"
25+
dir: "delphi_claims_hosp"
26+
- package: "doctor_visits"
27+
dir: "delphi_doctor_visits"
28+
- package: "google_symptoms"
29+
dir: "delphi_google_symptoms"
30+
- package: "hhs_hosp"
31+
dir: "delphi_hhs"
32+
- package: "nchs_mortality"
33+
dir: "delphi_nchs_mortality"
34+
- package: "nssp"
35+
dir: "delphi_nssp"
36+
- package: "nwss_wastewater"
37+
dir: "delphi_nwss"
38+
- package: "quidel_covidtest"
39+
dir: "delphi_quidel_covidtest"
40+
- package: "sir_complainsalot"
41+
dir: "delphi_sir_complainsalot"
3242
defaults:
3343
run:
34-
working-directory: ${{ matrix.packages }}
44+
working-directory: ${{ matrix.package }}
3545
steps:
36-
- uses: actions/checkout@v2
46+
- uses: actions/checkout@v4
47+
with:
48+
fetch-depth: 0
3749
- name: Set up Python 3.8
38-
uses: actions/setup-python@v2
50+
uses: actions/setup-python@v5
3951
with:
4052
python-version: 3.8
53+
cache: "pip"
54+
cache-dependency-path: "setup.py"
4155
- name: Install testing dependencies
4256
run: |
4357
python -m pip install --upgrade pip
@@ -51,3 +65,8 @@ jobs:
5165
- name: Test
5266
run: |
5367
make test
68+
- uses: akaihola/[email protected]
69+
with:
70+
options: "--check --diff --isort --color"
71+
src: "${{ matrix.package }}/${{ matrix.dir }}"
72+
version: "~=2.1.1"

Jenkinsfile

+1-1
@@ -10,7 +10,7 @@
1010
- TODO: #527 Get this list automatically from python-ci.yml at runtime.
1111
*/
1212

13-
def indicator_list = ["backfill_corrections", "changehc", "claims_hosp", "google_symptoms", "hhs_hosp", "nchs_mortality", "quidel_covidtest", "sir_complainsalot", "doctor_visits", "nwss_wastewater"]
13+
def indicator_list = ["backfill_corrections", "changehc", "claims_hosp", "google_symptoms", "hhs_hosp", "nchs_mortality", "quidel_covidtest", "sir_complainsalot", "doctor_visits", "nwss_wastewater", "nssp"]
1414
def build_package_main = [:]
1515
def build_package_prod = [:]
1616
def deploy_staging = [:]

README.md

+32-9
@@ -4,7 +4,7 @@
44

55
In early April 2020, Delphi developed a uniform data schema for [a new Epidata endpoint focused on COVID-19](https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html). Our intent was to provide signals that would track in real-time and in fine geographic granularity all facets of the COVID-19 pandemic, aiding both nowcasting and forecasting. Delphi's long history in tracking and forecasting influenza made us uniquely situated to provide access to data streams not available anywhere else, including medical claims data, electronic medical records, lab test records, massive public surveys, and internet search trends. We also process commonly-used publicly-available data sources, both for user convenience and to provide data versioning for sources that do not track revisions themselves.
66

7-
Each data stream arrives in a different format using a different delivery technique, be it sftp, an access-controlled API, or an email attachment. The purpose of each pipeline in this repository is to fetch the raw source data, extract informative aggregate signals, and output those signals---which we call **COVID-19 indicators**---in a common format for upload to the [COVIDcast API](https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html).
7+
Each data stream arrives in a different format using a different delivery technique, be it sftp, an access-controlled API, or an email attachment. The purpose of each pipeline in this repository is to fetch the raw source data, extract informative aggregate signals, and output those signals---which we call **COVID-19 indicators**---in a common format for upload to the [COVIDcast API](https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html).
88

99
For client access to the API, along with a variety of other utilities, see our [R](https://cmu-delphi.github.io/covidcast/covidcastR/) and [Python](https://cmu-delphi.github.io/covidcast/covidcast-py/html/) packages.
1010

@@ -13,18 +13,19 @@ For interactive visualizations (of a subset of the available indicators), see ou
1313
## Organization
1414

1515
Utilities:
16-
* `_delphi_utils_python` - common behaviors
17-
* `_template_python` & `_template_r` - starting points for new data sources
18-
* `ansible` & `jenkins` - automated testing and deployment
19-
* `sir_complainsalot` - a Slack bot to check for missing data
16+
17+
- `_delphi_utils_python` - common behaviors
18+
- `_template_python` & `_template_r` - starting points for new data sources
19+
- `ansible` & `jenkins` - automated testing and deployment
20+
- `sir_complainsalot` - a Slack bot to check for missing data
2021

2122
Indicator pipelines: all remaining directories.
2223

23-
Each indicator pipeline includes its own documentation.
24+
Each indicator pipeline includes its own documentation.
2425

25-
* Consult README.md for directions to install, lint, test, and run the pipeline for that indicator.
26-
* Consult REVIEW.md for the checklist to use for code reviews.
27-
* Consult DETAILS.md (if present) for implementation details, including handling of corner cases.
26+
- Consult README.md for directions to install, lint, test, and run the pipeline for that indicator.
27+
- Consult REVIEW.md for the checklist to use for code reviews.
28+
- Consult DETAILS.md (if present) for implementation details, including handling of corner cases.
2829

2930
## Development
3031

@@ -35,6 +36,28 @@ Each indicator pipeline includes its own documentation.
3536
3. Add new commits to your branch in response to feedback.
3637
4. When approved, tag an admin to merge the PR. Let them know if this change should be released immediately, at a set future date, or if it can just go along for the ride whenever the next release happens.
3738

39+
### Linting and Formatting
40+
41+
Each indicator has a `make lint` command to check for linting errors and a `make
42+
format` command to incrementally format your code (using
43+
[darker](https://github.com/akaihola/darker)). These are both automated with a
44+
[Github Action](.github/workflows/python-ci.yml).
45+
46+
If you get the error `ERROR:darker.git:fatal: Not a valid commit name <hash>`,
47+
then it's likely because your local main branch is not up to date; either you
48+
need to rebase or merge. Note that `darker` reads from `pyproject.toml` for
49+
default settings.
50+
51+
If the lines you change are in a file that uses 2 space indentation, `darker`
52+
will indent the lines around your changes and not the rest, which will likely
53+
break the code; in that case, you should probably just pass the whole file
54+
through black. You can do that with the following command (using the same
55+
virtual environment as above):
56+
57+
```sh
58+
env/bin/black <file>
59+
```
60+
3861
## Release Process
3962

4063
The release process consists of multiple steps which can all be done via the GitHub website:

_delphi_utils_python/.bumpversion.cfg

+1-1
@@ -1,5 +1,5 @@
11
[bumpversion]
2-
current_version = 0.3.23
2+
current_version = 0.3.24
33
commit = True
44
message = chore: bump delphi_utils to {new_version}
55
tag = False

_delphi_utils_python/.pylintrc

-22
This file was deleted.

_delphi_utils_python/Makefile

+4-1
@@ -14,9 +14,12 @@ install-ci: venv
1414
pip install .
1515

1616
lint:
17-
. env/bin/activate; pylint delphi_utils
17+
. env/bin/activate; pylint delphi_utils --rcfile=../pyproject.toml
1818
. env/bin/activate; pydocstyle delphi_utils
1919

20+
format:
21+
. env/bin/activate; darker delphi_utils
22+
2023
test:
2124
. env/bin/activate ;\
2225
(cd tests && ../env/bin/pytest --cov=delphi_utils --cov-report=term-missing)
Original file line number | Diff line number | Diff line change
@@ -1,4 +1,4 @@
1-
# Geocoding data processing pipeline
1+
# Geocoding Data Processing
22

33
Authors: Jingjing Tang, James Sharpnack, Dmitry Shemetov
44

@@ -7,42 +7,37 @@ Authors: Jingjing Tang, James Sharpnack, Dmitry Shemetov
77
Requires the following source files below.
88

99
Run the following to build the crosswalk tables in `covidcast-indicators/_delphi_utils_python/delphi_utils/data`
10-
```
10+
11+
```sh
1112
$ python geo_data_proc.py
1213
```
1314

14-
You can see consistency checks and diffs with old sources in ./consistency_checks.ipynb
15+
Find data consistency checks in `./source-file-sanity-check.ipynb`.
1516

1617
## Geo Codes
1718

1819
We support the following geocodes.
1920

20-
- The ZIP code and the FIPS code are the most granular geocodes we support.
21-
- The [ZIP code](https://en.wikipedia.org/wiki/ZIP_Code) is a US postal code used by the USPS and the [FIPS code](https://en.wikipedia.org/wiki/FIPS_county_code) is an identifier for US counties and other associated territories. The ZIP code is five digit code (with leading zeros).
22-
- The FIPS code is a five digit code (with leading zeros), where the first two digits are a two-digit state code and the last three are a three-digit county code (see this [US Census Bureau page](https://www.census.gov/library/reference/code-lists/ansi.html) for detailed information).
23-
- The Metropolitan Statistical Area (MSA) code refers to regions around cities (these are sometimes referred to as CBSA codes). More information on these can be found at the [US Census Bureau](https://www.census.gov/programs-surveys/metro-micro/about.html).
24-
- We are reserving 10001-10099 for states codes of the form 100XX where XX is the FIPS code for the state (the current smallest CBSA is 10100). In the case that the CBSA codes change then it should be verified that these are not used.
21+
- The [ZIP code](https://en.wikipedia.org/wiki/ZIP_Code) is a US postal code used by the USPS and the [FIPS code](https://en.wikipedia.org/wiki/FIPS_county_code) is an identifier for US counties and other associated territories. The ZIP code is five digit code (with leading zeros).
22+
- The FIPS code is a five digit code (with leading zeros), where the first two digits are a two-digit state code and the last three are a three-digit county code (see this [US Census Bureau page](https://www.census.gov/library/reference/code-lists/ansi.html) for detailed information).
23+
- The Metropolitan Statistical Area (MSA) code refers to regions around cities (these are sometimes referred to as CBSA codes). More information on these can be found at the [US Census Bureau](https://www.census.gov/programs-surveys/metro-micro/about.html). We reserve 10001-10099 for state codes of the form 100XX, where XX is the FIPS code for the state (the current smallest CBSA is 10100). If the CBSA codes change, verify that these reserved codes remain unused.
2524
- State codes are a series of equivalent identifiers for US state. They include the state name, the state number (state_id), and the state two-letter abbreviation (state_code). The state number is the state FIPS code. See [here](https://en.wikipedia.org/wiki/List_of_U.S._state_and_territory_abbreviations) for more.
2625
- The Hospital Referral Region (HRR) and the Hospital Service Area (HSA). More information [here](https://www.dartmouthatlas.org/covid-19/hrr-mapping/).
27-
FIPS codes depart in some special cases, so we produce manual changes listed below.
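
The code layout described in the list above can be illustrated with a minimal Python sketch. The helper names `split_fips` and `state_as_msa_code` are hypothetical and are not part of `delphi_utils`; the sketch only assumes the five-digit FIPS structure and the reserved 100XX range stated in the README.

```python
from typing import Tuple


def split_fips(fips: str) -> Tuple[str, str]:
    """Split a county FIPS code into (state code, county code).

    Hypothetical helper for illustration; not a delphi_utils function.
    """
    # Restore leading zeros that are lost when codes are parsed as integers.
    code = str(fips).zfill(5)
    if len(code) != 5 or not code.isdigit():
        raise ValueError(f"not a five-digit FIPS code: {fips!r}")
    # First two digits: state code. Last three digits: county code.
    return code[:2], code[2:]


def state_as_msa_code(state_fips: str) -> str:
    """Build the reserved 100XX code for a state (hypothetical helper)."""
    code = str(state_fips).zfill(2)
    if len(code) != 2 or not code.isdigit() or code == "00":
        raise ValueError(f"not a two-digit state FIPS code: {state_fips!r}")
    # The README reserves 10001-10099, below the smallest CBSA code (10100).
    return "100" + code
```

For example, `split_fips("01001")` separates the state code `"01"` from the county code `"001"`, and `state_as_msa_code("42")` yields a code inside the reserved range.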
2826

29-
## Source files
27+
## Source Files
3028

3129
The source files are requested from a government URL when `geo_data_proc.py` is run (see the top of said script for the URLs). Below we describe the locations to find updated versions of the source files, if they are ever needed.
3230

3331
- ZIP -> FIPS (county) population tables available from [US Census](https://www.census.gov/geographies/reference-files/time-series/geo/relationship-files.html#par_textimage_674173622). This file contains the population of the intersections between ZIP and FIPS regions, allowing the creation of a population-weighted transform between the two. As of 4 February 2022, this source did not include population information for 24 ZIPs that appear in our indicators. We have added those values manually using information available from the [zipdatamaps website](www.zipdatamaps.com).
3432
- ZIP -> HRR -> HSA crosswalk file comes from the 2018 version at the [Dartmouth Atlas Project](https://atlasdata.dartmouth.edu/static/supp_research_data).
3533
- FIPS -> MSA crosswalk file comes from the September 2018 version of the delineation files at the [US Census Bureau](https://www.census.gov/geographies/reference-files/time-series/demo/metro-micro/delineation-files.html).
36-
- State Code -> State ID -> State Name comes from the ANSI standard at the [US Census](https://www.census.gov/library/reference/code-lists/ansi.html#par_textimage_3). The first two digits of a FIPS codes should match the state code here.
34+
- State Code -> State ID -> State Name comes from the ANSI standard at the [US Census](https://www.census.gov/library/reference/code-lists/ansi.html#par_textimage_3).
3735

38-
39-
## Derived files
36+
## Derived Files
4037

4138
The rest of the crosswalk tables are derived from the mappings above. We provide crosswalk functions from granular to coarser codes, but not the other way around. This is because there is no information gained when crosswalking from coarse to granular.
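
As a rough illustration of a population-weighted, granular-to-coarse crosswalk such as ZIP -> FIPS, the following Python sketch splits each ZIP-level value across counties in proportion to the population of each ZIP/county intersection. The function name, the input shapes, and the example codes are assumptions made for illustration; this is not the implementation in `geo_data_proc.py` or `delphi_utils`.

```python
from collections import defaultdict
from typing import Dict


def zip_to_fips(
    values: Dict[str, float],
    crosswalk: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Aggregate ZIP-level values to county FIPS using population weights.

    values:    {zip_code: value}
    crosswalk: {zip_code: {fips: population of the ZIP/FIPS intersection}}
    Both shapes are hypothetical; the real tables are CSV crosswalk files.
    """
    out: Dict[str, float] = defaultdict(float)
    for zip_code, value in values.items():
        populations = crosswalk[zip_code]  # KeyError if the ZIP is unmapped
        total = sum(populations.values())
        if total <= 0:
            raise ValueError(f"no population recorded for ZIP {zip_code}")
        for fips, population in populations.items():
            # Each county receives its population share of the ZIP's value.
            out[fips] += value * population / total
    return dict(out)


# Made-up codes and populations, for illustration only.
example_values = {"00001": 10.0, "00002": 4.0}
example_crosswalk = {
    "00001": {"01001": 300.0, "01003": 100.0},
    "00002": {"01003": 50.0},
}
county_totals = zip_to_fips(example_values, example_crosswalk)
```

Because the weights for each ZIP sum to one, the county totals preserve the overall sum of the ZIP-level values. The reverse direction would need information the coarse codes do not carry, which is consistent with the README providing only granular-to-coarse crosswalks.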
4239

43-
44-
45-
## Deprecated source files
40+
## Deprecated Source Files
4641

4742
- ZIP to FIPS to HRR to states: `02_20_uszips.csv` comes from a version of the table [here](https://simplemaps.com/data/us-zips) modified by Jingjing to include population weights.
4843
- The `02_20_uszips.csv` file is based on the newest consensus data including 5-digit zipcode, fips code, county name, state, population, HRR, HSA (I downloaded the original file from [here](https://simplemaps.com/data/us-zips). This file matches best to the most recent (2020) situation in terms of the population. But there still exist some matching problems. I manually checked and corrected those lines (~20) with [zip-codes](https://www.zip-codes.com/zip-code/58439/zip-code-58439.asp). The mapping from 5-digit zipcode to HRR is based on the file in 2017 version downloaded from [here](https://atlasdata.dartmouth.edu/static/supp_research_data).
@@ -51,7 +46,3 @@ The rest of the crosswalk tables are derived from the mappings above. We provide
5146
- CBSA -> FIPS crosswalk from [here](https://data.nber.org/data/cbsa-fips-county-crosswalk.html) (the file is `cbsatocountycrosswalk.csv`).
5247
- MSA tables from March 2020 [here](https://www.census.gov/geographies/reference-files/time-series/demo/metro-micro/delineation-files.html). This file seems to differ in a few fips codes from the source for the 02_20_uszip file which Jingjing constructed. There are at least 10 additional fips in 03_20_msa that are not in the uszip file, and one of the msa codes seems to be incorrect: 49020 (a google search confirms that it is incorrect in uszip and correct in the census data).
5348
- MSA tables from 2019 [here](https://apps.bea.gov/regional/docs/msalist.cfm)
54-
55-
## Notes
56-
57-
- The NAs in the coding currently zero-fills.
