Commit dd5d710

Merge pull request #551 from cmu-delphi/as_of_covid_hosp
Add as_of support to covid_hosp
2 parents: 422b048 + f1d5ccd

11 files changed (+53 / -10 lines)

docs/api/covid_hosp.md

Lines changed: 3 additions & 1 deletion
```diff
@@ -24,7 +24,9 @@ General topics not specific to any particular data source are discussed in the
 ## Metadata
 
 This data source provides various measures of COVID-19 burden on patients and healthcare in the US.
-- Data source: [US Department of Health & Human Services](https://healthdata.gov/dataset/covid-19-reported-patient-impact-and-hospital-capacity-state-timeseries) (HHS)
+- Data source: US Department of Health & Human Services (HHS) [COVID-19 Reported Patient Impact and
+Hospital Capacity by State Timeseries](https://healthdata.gov/Hospital/COVID-19-Reported-Patient-Impact-and-Hospital-Capa/g62h-syeh)
+and [COVID-19 Reported Patient Impact and Hospital Capacity by State](https://healthdata.gov/dataset/COVID-19-Reported-Patient-Impact-and-Hospital-Capa/6xf2-c3ie)
 - Temporal Resolution: Daily, starting 2020-01-01
 - Spatial Resolution: US States plus DC, PR, and VI
 - Open access via [Open Data Commons Open Database License (ODbL)](https://opendatacommons.org/licenses/odbl/1.0/)
```

docs/api/covid_hosp_facility.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -20,7 +20,7 @@ General topics not specific to any particular data source are discussed in the
 ## Metadata
 
 This data source provides various measures of COVID-19 burden on patients and healthcare in the US.
-- Data source: [US Department of Health & Human Services](https://healthdata.gov/dataset/covid-19-reported-patient-impact-and-hospital-capacity-facility) (HHS)
+- Data source: [US Department of Health & Human Services](https://healthdata.gov/Hospital/COVID-19-Reported-Patient-Impact-and-Hospital-Capa/anag-cw7u) (HHS)
 - Geographic resolution: healthcare facility (address, city, zip, fips)
 - Temporal resolution: weekly (Friday -- Thursday)
 - First week: 2020-07-31
```

docs/api/covid_hosp_facility_lookup.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -24,7 +24,7 @@ General topics not specific to any particular data source are discussed in the
 ## Metadata
 
 This data source provides metadata about healthcare facilities in the US.
-- Data source: [US Department of Health & Human Services](https://healthdata.gov/dataset/covid-19-reported-patient-impact-and-hospital-capacity-facility) (HHS)
+- Data source: [US Department of Health & Human Services](https://healthdata.gov/Hospital/COVID-19-Reported-Patient-Impact-and-Hospital-Capa/anag-cw7u) (HHS)
 - Total number of facilities: 4922
 - Open access via [Open Data Commons Open Database License (ODbL)](https://opendatacommons.org/licenses/odbl/1.0/)
 
```
docs/api/covidcast-signals/hospital-admissions.md

Lines changed: 3 additions & 0 deletions
```diff
@@ -22,6 +22,9 @@ hospital admissions, provided to us by health system partners. We use this
 inpatient data to estimate the percentage of new hospital admissions with a
 COVID-associated diagnosis code in a given location, on a given day.
 
+See also our [Health & Human Services](hhs.md) data source for official COVID
+hospitalization reporting from the Department of Health & Human Services.
+
 | Signal | Description |
 | --- | --- |
 | `smoothed_covid19_from_claims` | Estimated percentage of new hospital admissions with COVID-associated diagnoses, based on claims data from health system partners, smoothed in time using a Gaussian linear smoother <br/> **Earliest date available:** 2020-02-01 |
```

docs/api/covidcast-signals/safegraph.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -84,4 +84,4 @@ COVIDcast API.
 SafeGraph's Social Distancing Metrics and Weekly Patterns are based on mobile devices that are members of SafeGraph panels, which is not necessarily the same thing as measuring the general public. These counts do not represent absolute counts, and only count visits by members of the panel in that region. This can result in several biases:
 
 * **Geographic bias.** If some regions have a greater density of SafeGraph panel members as a percentage of the population than other regions, comparisons of metrics between regions may be biased. Regions with more SafeGraph panel members will appear to have more visits counted, even if the rate of visits in the general population is the same.
-* **Demographic bias.** SafeGraph panels may not be representative of the local population as a whole. For example, [some research suggests](https://arxiv.org/abs/2011.07194) that "older and non-white voters are less likely to be captured by mobility data", so this data will not accurately reflect behavior in those populations. Since population demographics vary across the United States, this can also contribute to geographic biases.
+* **Demographic bias.** SafeGraph panels may not be representative of the local population as a whole. For example, [some research suggests](https://doi.org/10.1145/3442188.3445881) that "older and non-white voters are less likely to be captured by mobility data", so this data will not accurately reflect behavior in those populations. Since population demographics vary across the United States, this can also contribute to geographic biases.
```

integrations/acquisition/covid_hosp/state_timeseries/test_scenarios.py

Lines changed: 28 additions & 1 deletion
```diff
@@ -40,7 +40,7 @@ def setUp(self):
 cur.execute('truncate table covid_hosp_state_timeseries')
 cur.execute('truncate table covid_hosp_meta')
 
-@freeze_time("2021-03-16")
+@freeze_time("2021-03-17")
 def test_acquire_dataset(self):
 """Acquire a new dataset."""
 
@@ -89,3 +89,30 @@ def test_acquire_dataset(self):
 response = Epidata.covid_hosp('WY', Epidata.range(20200101, 20210101))
 self.assertEqual(response['result'], 1)
 self.assertEqual(len(response['epidata']), 1)
+
+# acquire new data into local database
+with self.subTest(name='first acquisition'):
+# acquire new data with 3/16 issue date
+mock_network.fetch_metadata.return_value = \
+self.test_utils.load_sample_metadata("metadata2.csv")
+mock_network.fetch_dataset.return_value = \
+self.test_utils.load_sample_dataset("dataset2.csv")
+acquired = Update.run(network=mock_network)
+self.assertTrue(acquired)
+
+with self.subTest(name='as_of checks'):
+
+response = Epidata.covid_hosp('WY', Epidata.range(20200101, 20210101))
+self.assertEqual(len(response['epidata']), 2)
+row = response['epidata'][1]
+self.assertEqual(row['date'], 20200827)
+
+# previous data should have 3/15 issue date
+response = Epidata.covid_hosp('WY', Epidata.range(20200101, 20210101), as_of=20210315)
+self.assertEqual(len(response['epidata']), 1)
+row = response['epidata'][0]
+self.assertEqual(row['date'], 20200826)
+
+# no data before 3/15
+response = Epidata.covid_hosp('WY', Epidata.range(20200101, 20210101), as_of=20210314)
+self.assertEqual(response['result'], -2)
```

src/acquisition/covid_hosp/common/test_utils.py

Lines changed: 2 additions & 2 deletions
```diff
@@ -50,8 +50,8 @@ def __init__(self, abs_path_to_caller):
 path_to_repo / UnitTestUtils.PATH_TO_TESTDATA / dataset_name
 ).resolve()
 
-def load_sample_metadata(self):
-df = pandas.read_csv(self.data_dir / 'metadata.csv', dtype=str)
+def load_sample_metadata(self, metadata_name='metadata.csv'):
+df = pandas.read_csv(self.data_dir / metadata_name, dtype=str)
 df["Update Date"] = pandas.to_datetime(df["Update Date"])
 df.sort_values("Update Date", inplace=True)
 df.set_index("Update Date", inplace=True)
```

src/client/delphi_epidata.py

Lines changed: 3 additions & 1 deletion
```diff
@@ -612,7 +612,7 @@ def covidcast_meta():
 
 # Fetch COVID hospitalization data
 @staticmethod
-def covid_hosp(states, dates, issues=None):
+def covid_hosp(states, dates, issues=None, as_of=None):
 """Fetch COVID hospitalization data."""
 # Check parameters
 if states is None or dates is None:
@@ -625,6 +625,8 @@ def covid_hosp(states, dates, issues=None):
 }
 if issues is not None:
 params['issues'] = Epidata._list(issues)
+if as_of is not None:
+params['as_of'] = as_of
 # Make the API call
 return Epidata._request(params)
 
```
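The client change only adds `as_of` to the request when it is set, so existing calls are unaffected. A minimal stand-alone sketch of that parameter handling follows. It assumes that the elided `params` dict holds `states` and `dates`, and that `Epidata._list` joins single values and `{'from': ..., 'to': ...}` ranges with commas; `list_param` and `build_covid_hosp_params` are hypothetical names, not part of the client.

```python
from typing import Dict, List, Union

# A "range" is assumed to be a dict with 'from' and 'to' keys.
Value = Union[int, str, Dict[str, int]]


def list_param(values: Union[Value, List[Value]]) -> str:
    """Hypothetical stand-in for Epidata._list: join values and ranges with commas."""
    if not isinstance(values, list):
        values = [values]
    parts = []
    for value in values:
        if isinstance(value, dict) and "from" in value and "to" in value:
            parts.append(f"{value['from']}-{value['to']}")
        else:
            parts.append(str(value))
    return ",".join(parts)


def build_covid_hosp_params(states, dates, issues=None, as_of=None) -> dict:
    """Assemble request parameters; issues and as_of are only sent when given."""
    if states is None or dates is None:
        raise ValueError("`states` and `dates` are both required")
    params = {
        "states": list_param(states),
        "dates": list_param(dates),
    }
    if issues is not None:
        params["issues"] = list_param(issues)
    if as_of is not None:
        # as_of is passed through unchanged, as in the diff
        params["as_of"] = as_of
    return params


params = build_covid_hosp_params(
    "WY", {"from": 20200101, "to": 20210101}, as_of=20210315)
print(params)  # {'states': 'WY', 'dates': '20200101-20210101', 'as_of': 20210315}
```

Passing `as_of` through as-is leaves date parsing and validation to the server, which calls `extract_date("as_of")` in the endpoint changed below.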
src/server/endpoints/covid_hosp_state_timeseries.py

Lines changed: 7 additions & 2 deletions
```diff
@@ -1,7 +1,7 @@
 from flask import Blueprint
 
 from .._query import execute_query, QueryBuilder
-from .._validate import extract_integers, extract_strings, require_all
+from .._validate import extract_integers, extract_strings, extract_date, require_all
 
 # first argument is the endpoint name
 bp = Blueprint("covid_hosp_state_timeseries", __name__)
@@ -14,6 +14,7 @@ def handle():
 states = extract_strings("states")
 dates = extract_integers("dates")
 issues = extract_integers("issues")
+as_of = extract_date("as_of")
 
 # build query
 q = QueryBuilder("covid_hosp_state_timeseries", "c")
@@ -94,7 +95,11 @@ def handle():
 if issues is not None:
 q.where_integers("issue", issues)
 # final query using specific issues
-query = f"WITH c as (SELECT {q.fields_clause}, ROW_NUMBER() OVER (PARTITION BY date, state, issue ORDER BY record_type) row FROM {q.table} WHERE {q.conditions_clause}) SELECT {q.fields_clause} FROM {q.alias} where row = 1 ORDER BY {q.order_clause}"
+query = f"WITH c as (SELECT {q.fields_clause}, ROW_NUMBER() OVER (PARTITION BY date, state, issue ORDER BY record_type) row FROM {q.table} WHERE {q.conditions_clause}) SELECT {q.fields_clause} FROM {q.alias} WHERE row = 1 ORDER BY {q.order_clause}"
+elif as_of is not None:
+sub_condition_asof = "(issue <= :as_of)"
+q.params["as_of"] = as_of
+query = f"WITH c as (SELECT {q.fields_clause}, ROW_NUMBER() OVER (PARTITION BY date, state, issue ORDER BY record_type) row FROM {q.table} WHERE {q.conditions_clause} AND {sub_condition_asof}) SELECT {q.fields_clause} FROM {q.alias} WHERE row = 1 ORDER BY {q.order_clause}"
 else:
 # final query using most recent issues
 subquery = f"(SELECT max(`issue`) `max_issue`, `date`, `state` FROM {q.table} WHERE {q.conditions_clause} GROUP BY `date`, `state`) x"
```
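The new `as_of` branch filters on `issue <= :as_of` and then keeps the first row of each `(date, state, issue)` partition ordered by `record_type`. The pure-Python sketch below shows that selection on toy rows modeled on the integration test. `select_as_of` and the row dicts are hypothetical, the `record_type` labels `"A"`/`"B"` are placeholders, and the final sort order is an assumption because `q.order_clause` is not shown.

```python
from typing import Dict, List

Row = Dict[str, object]


def select_as_of(rows: List[Row], as_of: int) -> List[Row]:
    """Keep rows whose issue is on or before as_of, then one row per
    (date, state, issue): the one with the smallest record_type, matching
    ROW_NUMBER() OVER (... ORDER BY record_type) followed by row = 1."""
    kept: Dict[tuple, Row] = {}
    for row in rows:
        if row["issue"] > as_of:  # the (issue <= :as_of) condition
            continue
        key = (row["date"], row["state"], row["issue"])
        if key not in kept or row["record_type"] < kept[key]["record_type"]:
            kept[key] = row
    # Assumed ordering for this sketch; the real ORDER BY clause is not shown.
    return sorted(kept.values(), key=lambda r: (r["date"], r["state"], r["issue"]))


# Toy rows: the 3/15 issue covers 2020-08-26, the 3/16 issue adds 2020-08-27.
rows = [
    {"state": "WY", "date": 20200826, "issue": 20210315, "record_type": "A"},
    {"state": "WY", "date": 20200827, "issue": 20210316, "record_type": "A"},
    {"state": "WY", "date": 20200827, "issue": 20210316, "record_type": "B"},
]

print([r["date"] for r in select_as_of(rows, 20210315)])  # [20200826]
print(select_as_of(rows, 20210314))                        # []
```

Because `issue` is part of the partition, two issues of the same date and state that both fall on or before `as_of` come back as two rows. Returning only the latest such issue would need an extra max-issue step, like the `max(`issue`)` subquery that the `else` branch builds.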

Lines changed: 2 additions & 0 deletions

```diff
@@ -0,0 +1,2 @@
+state,date,critical_staffing_shortage_today_yes,critical_staffing_shortage_today_no,critical_staffing_shortage_today_not_reported,critical_staffing_shortage_anticipated_within_week_yes,critical_staffing_shortage_anticipated_within_week_no,critical_staffing_shortage_anticipated_within_week_not_reported,hospital_onset_covid,hospital_onset_covid_coverage,inpatient_beds,inpatient_beds_coverage,inpatient_beds_used,inpatient_beds_used_coverage,inpatient_beds_used_covid,inpatient_beds_used_covid_coverage,previous_day_admission_adult_covid_confirmed,previous_day_admission_adult_covid_confirmed_coverage,previous_day_admission_adult_covid_suspected,previous_day_admission_adult_covid_suspected_coverage,previous_day_admission_pediatric_covid_confirmed,previous_day_admission_pediatric_covid_confirmed_coverage,previous_day_admission_pediatric_covid_suspected,previous_day_admission_pediatric_covid_suspected_coverage,staffed_adult_icu_bed_occupancy,staffed_adult_icu_bed_occupancy_coverage,staffed_icu_adult_patients_confirmed_and_suspected_covid,staffed_icu_adult_patients_confirmed_and_suspected_covid_coverage,staffed_icu_adult_patients_confirmed_covid,staffed_icu_adult_patients_confirmed_covid_coverage,total_adult_patients_hospitalized_confirmed_and_suspected_covid,total_adult_patients_hospitalized_confirmed_and_suspected_covid_coverage,total_adult_patients_hospitalized_confirmed_covid,total_adult_patients_hospitalized_confirmed_covid_coverage,total_pediatric_patients_hospitalized_confirmed_and_suspected_covid,total_pediatric_patients_hospitalized_confirmed_and_suspected_covid_coverage,total_pediatric_patients_hospitalized_confirmed_covid,total_pediatric_patients_hospitalized_confirmed_covid_coverage,total_staffed_adult_icu_beds,total_staffed_adult_icu_beds_coverage,inpatient_beds_utilization,inpatient_beds_utilization_coverage,inpatient_beds_utilization_numerator,inpatient_beds_utilization_denominator,percent_of_inpatients_with_covid,percent_of_inpatients_with_covid_coverage,percent_of_inpatients_with_covid_numerator,percent_of_inpatients_with_covid_denominator,inpatient_bed_covid_utilization,inpatient_bed_covid_utilization_coverage,inpatient_bed_covid_utilization_numerator,inpatient_bed_covid_utilization_denominator,adult_icu_bed_covid_utilization,adult_icu_bed_covid_utilization_coverage,adult_icu_bed_covid_utilization_numerator,adult_icu_bed_covid_utilization_denominator,adult_icu_bed_utilization,adult_icu_bed_utilization_coverage,adult_icu_bed_utilization_numerator,adult_icu_bed_utilization_denominator
+WY,2020/08/27,2,,4,2,19,7,0,26,1464,28,629,28,17,26,2,28,13,26,0,21,0,22,49,28,10,26,7,28,17,26,14,28,0,26,0,26,114,28,0.4296448087431694,28,629,1464,0.027597402597402596,26,17,616,0.011946591707659873,26,17,1423,0.09345794392523364,26,10,107,0.4298245614035088,28,49,114
```
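In this sample file, each ratio column (for example `inpatient_beds_utilization`) is followed by `_coverage`, `_numerator` and `_denominator` columns. In the WY 2020/08/27 row, every ratio equals numerator / denominator. The stdlib sketch below checks that relationship, using values copied from that row. `mismatched_ratios` is a hypothetical helper, not part of the repository.

```python
import math

# (ratio, numerator, denominator) copied from the WY 2020/08/27 sample row
RATIOS = {
    "inpatient_beds_utilization": (0.4296448087431694, 629, 1464),
    "percent_of_inpatients_with_covid": (0.027597402597402596, 17, 616),
    "inpatient_bed_covid_utilization": (0.011946591707659873, 17, 1423),
    "adult_icu_bed_covid_utilization": (0.09345794392523364, 10, 107),
    "adult_icu_bed_utilization": (0.4298245614035088, 49, 114),
}


def mismatched_ratios(ratios, rel_tol=1e-9):
    """Return the names whose stored value differs from numerator / denominator."""
    return [
        name
        for name, (value, num, den) in ratios.items()
        if not math.isclose(value, num / den, rel_tol=rel_tol)
    ]


print(mismatched_ratios(RATIOS))  # []
```

A relative tolerance is used instead of `==` because the CSV stores floats as decimal text, which may not round-trip to the exact same binary value as the division.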

testdata/acquisition/covid_hosp/state_timeseries/metadata2.csv

Lines changed: 2 additions & 0 deletions
Large diffs are not rendered by default.
