Commit 0d15c83

author
Natalia Lombardi
committed
emphasizing when signals are comparable and when they are not
1 parent 662b2f2 commit 0d15c83

1 file changed

+6
-8
lines changed

docs/api/covidcast-signals/google-symptoms.md

Lines changed: 6 additions & 8 deletions
@@ -20,7 +20,7 @@ grand_parent: COVIDcast Epidata API
 This data source is based on the [COVID-19 Search Trends symptoms
 dataset](http://goo.gle/covid19symptomdataset). Using
 this search data, we estimate the volume of searches mapped to symptom sets related
-to COVID-19. The resulting daily dataset for each region shows the average relative frequency of searches for each symptom set. The signals are measured in arbitrary units that are normalized for overall search users in the region and scaled by the maximum value of the normalized popularity within a geographic region across a specific time range. **Values are comparable across signals but NOT across geographic regions**. Larger numbers represent increased relative popularity of symptom-related searches.
+to COVID-19. The resulting daily dataset for each region shows the average relative frequency of searches for each symptom set. The signals are measured in arbitrary units that are normalized for overall search users in the region and scaled by the maximum value of the normalized popularity within a geographic region across a specific time range. **Values are comparable across signals in the same location but NOT across geographic regions**. For example, within a state, we can compare `s01_smoothed_search` and `s02_smoothed_search`. However, we cannot compare `s01_smoothed_search` between states. Larger numbers represent increased relative popularity of symptom-related searches.
 
 #### Symptom sets
 
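The comparability rule in the hunk above follows from the per-region scaling. As a minimal sketch (not Google's or Delphi's actual pipeline), assume each region's normalized popularity is divided by that region's own maximum, shared across its symptom sets. The function name `scale_region`, the target maximum of 100, and all numbers below are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def scale_region(normalized: Dict[str, List[float]], top: float = 100.0) -> Dict[str, List[float]]:
    """Scale every series in one region by that region's own maximum.

    Assumption: one scale factor per region, shared by all symptom sets.
    """
    region_max = max(max(series) for series in normalized.values())
    return {
        name: [top * value / region_max for value in series]
        for name, series in normalized.items()
    }


# Hypothetical normalized popularity (searches per user) for two days.
# Both regions have the SAME underlying s01 popularity.
region_a = {"s01": [0.25, 0.5], "s02": [0.125, 0.25]}
region_b = {"s01": [0.25, 0.5], "s02": [1.0, 2.0]}

scaled_a = scale_region(region_a)
scaled_b = scale_region(region_b)

# Within a region the ratio between signals survives scaling,
# so s01 and s02 can be compared in the same location.
ratio_before = region_a["s01"][0] / region_a["s02"][0]
ratio_after = scaled_a["s01"][0] / scaled_a["s02"][0]

# Across regions the scale factors differ, so identical underlying
# s01 popularity yields different reported values.
print(scaled_a["s01"], scaled_b["s01"])
```

Under these assumptions, the two regions report different `s01` values despite identical underlying popularity, which is why the revised text permits comparing `s01_smoothed_search` with `s02_smoothed_search` within a state but not `s01_smoothed_search` between states.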
@@ -72,16 +72,14 @@ Each signal is the average of the
 anosmia, ageusia, and dysgeusia related searches divided by 3, because the data volume for each symptom is calculated based on search queries. A single search query can be mapped to more than one symptom. Currently, Google does not provide _intersection/union_
 data. Users should be careful when considering such signals.
 
-For each symptom set: when search trends for all symptoms are missing, the signal is reported as missing. When search trends are available for at least one of the symptoms, we fill the missing trends for other symptoms with 0 and compute the average. We use this approach because the missing observations in the Google Symptoms search trends dataset do not occur randomly; they represent low popularity and are censored for quality and/or privacy reasons. The same approach is used for smoothed signals. A 7 day moving average is used, and missing raw signals are filled with 0 as long as there is at least one day available within the 7 day window.
+For each symptom set: when search trends for all symptoms are missing, the signal is reported as missing. When search trends are available for at least one of the symptoms, we fill the missing trends for other symptoms with 0 and compute the average. We use this approach because the missing observations in the Google Symptoms search trends dataset do not occur randomly; they represent low popularity and are censored for quality and/or privacy reasons. The same approach is used for smoothed signals. A 7 day moving average is used, and missing raw signals are filled with 0 as long as there is at least one day available within the 7 day window.
 
 
 
 ## Geographical Aggregation
-The state-level and county-level `raw_search` signals for specific symptoms such
-as _anosmia_ and _ageusia_ are taken directly from the [COVID-19 Search Trends
+The state-level and county-level `raw_search` signals for each symptoms set are the average of its individual symptoms search trends, taken directly from the [COVID-19 Search Trends
 symptoms
-dataset](https://github.com/google-research/open-covid-19-data/tree/master/data/exports/search_trends_symptoms_dataset)
-without changes.
+dataset](https://github.com/google-research/open-covid-19-data/tree/master/data/exports/search_trends_symptoms_dataset).
 
 We aggregate county and state data to other geographic levels using
 population-weighted averaging.
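The missing-value rule and the aggregation step described in the hunk above can be sketched as follows. This is an illustrative reading of the documentation text, not the Delphi indicator code: the function names are hypothetical, `None` stands in for a censored value, the 7-day window is assumed to be trailing, days before the start of the series are assumed to count as missing, and the weighted mean assumes every contributing region has a value.

```python
from typing import List, Optional, Sequence


def symptom_set_average(trends: Sequence[Optional[float]]) -> Optional[float]:
    """Average one day's per-symptom trends for a symptom set.

    Missing only if every symptom is missing; otherwise missing
    symptoms are filled with 0 before averaging.
    """
    if all(value is None for value in trends):
        return None
    return sum(0.0 if value is None else value for value in trends) / len(trends)


def smooth(raw: Sequence[Optional[float]], window: int = 7) -> List[Optional[float]]:
    """Trailing moving average with the same zero-fill rule.

    Assumption: the divisor is always `window`, so days before the
    start of the series behave like missing days filled with 0.
    """
    smoothed: List[Optional[float]] = []
    for end in range(len(raw)):
        days = raw[max(0, end - window + 1): end + 1]
        if all(value is None for value in days):
            smoothed.append(None)
        else:
            total = sum(0.0 if value is None else value for value in days)
            smoothed.append(total / window)
    return smoothed


def population_weighted_mean(values: Sequence[float], populations: Sequence[float]) -> float:
    """Combine regional values into a larger region, weighting by population."""
    return sum(v * p for v, p in zip(values, populations)) / sum(populations)


# One symptom observed, two censored: the censored ones count as 0.
print(symptom_set_average([3.0, None, None]))
# A single observed day keeps the window from being reported as missing.
print(smooth([None, 14.0, None]))
# Two hypothetical counties with populations 1 and 3.
print(population_weighted_mean([10.0, 20.0], [1, 3]))
```

Filling with 0 rather than dropping missing values reflects the stated reasoning that censored observations represent low popularity, so ignoring them would bias the average upward.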
@@ -115,8 +113,8 @@ quality of results.
 
 Google normalizes and scales time series values to determine the relative
 popularity of symptoms in searches within each geographical region individually.
-This means that the resulting values of symptom popularity are **NOT**
-comparable across geographic regions.
+This means that the resulting values of symptom set popularity are **NOT**
+comparable across geographic regions, while the values of different symptom sets are comparable within the same location.
 
 More details about the limitations of this dataset are available in [Google's Search
 Trends symptoms dataset documentation](https://storage.googleapis.com/gcp-public-data-symptom-search/COVID-19%20Search%20Trends%20symptoms%20dataset%20documentation%20.pdf).