Commit 3b54250

Merge pull request #713 from cmu-delphi/release/delphi-epidata-0.2.14
Release Delphi Epidata 0.2.14
2 parents 395175b + 0d90206 commit 3b54250

20 files changed

+469
-44
lines changed

.bumpversion.cfg

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 [bumpversion]
-current_version = 0.2.13
+current_version = 0.2.14
 commit = False
 tag = False

docs/api/covidcast-signals/fb-survey.md

Lines changed: 10 additions & 0 deletions
@@ -643,6 +643,16 @@ declines, some indicators will become unavailable once they no longer meet the
 This affects some signals, such as those based on a subset of responses, more
 than others, with finer geographic resolutions becoming unavailable first.

+### Target Region
+
+Facebook only invites users to take the survey if they appear, based on
+attributes in their Facebook profiles, to reside in the 50 states or
+Washington, DC. Puerto Rico is sampled separately as part of the
+[international version of the survey](https://covidmap.umd.edu/). If Facebook
+believes a user qualifies for the survey, but the user then replies that they
+live in Puerto Rico or another US territory, we do not include their response
+in the aggregations.
+

 ## Survey Weighting

docs/symptom-survey/codebook.csv

Lines changed: 15 additions & 15 deletions
Large diffs are not rendered by default.

docs/symptom-survey/contingency-tables.md

Lines changed: 8 additions & 0 deletions
@@ -70,6 +70,14 @@ only use a partial week or month of data.

 At the moment, only nation-wide and state groupings are available.

+Facebook only invites users to take the survey if they appear, based on
+attributes in their Facebook profiles, to reside in the 50 states or
+Washington, DC. Puerto Rico is sampled separately as part of the
+[international version of the survey](https://covidmap.umd.edu/). If Facebook
+believes a user qualifies for the survey, but the user then replies that they
+live in Puerto Rico or another US territory, we do not include their response
+in the aggregations.
+
 ### Privacy

 The aggregates are filtered to only include estimates for a particular group

docs/symptom-survey/data-access.md

Lines changed: 2 additions & 0 deletions
@@ -34,6 +34,8 @@ University Institutional Review Board with IRB ID STUDY2020_00000162.

 Some important notes about obtaining access to the individual survey responses:

+* You should be familiar with the [survey's limitations](limitations.md) and
+ensure the survey data is suitable for your research goals.
 * Your research purpose must be consistent with the consent language used in
 [Wave 1 of the survey](coding.md#wave-1), which states the responses may be
 used to create "a better public health understanding of where the coronavirus

docs/symptom-survey/index.md

Lines changed: 22 additions & 13 deletions
@@ -13,10 +13,11 @@ social distancing), mental health, and economic and health impacts they have
 experienced as a result of the pandemic. A high-level overview of the survey is
 posted [on the COVIDcast website](https://delphi.cmu.edu/covidcast/surveys/).

-Geographically aggregated data from this survey is publicly available through
-the [COVIDcast API](../api/covidcast.md) as the [`fb-survey` data source](../api/covidcast-signals/fb-survey.md).
-Demographic breakdowns of survey data are publicly available as
-[downloadable contingency tables](contingency-tables.md).
+The [survey results dashboard](https://delphi.cmu.edu/covidcast/survey-results/)
+provides a high-level summary of survey results. Geographically aggregated data
+from this survey is publicly available through the [COVIDcast API](../api/covidcast.md)
+as the [`fb-survey` data source](../api/covidcast-signals/fb-survey.md). Demographic breakdowns of survey
+data are publicly available as [downloadable contingency tables](contingency-tables.md).

 This documentation describes the survey items, data coding, data distribution,
 and the survey weights computed by Facebook. It also documents the individual
@@ -30,20 +31,28 @@ If you have questions about the survey or getting access to data, contact us at
 ## Credits

 The COVID-19 Trends and Impact Survey (CTIS) is a project of the [Delphi
-Group](https://delphi.cmu.edu/) at Carnegie Mellon University. The Principal
-Investigator is [Alex Reinhart](https://www.refsmmat.com/); Wichada La
-Motte-Kerr is Survey Coordinator. The survey protocol is reviewed by the
-Carnegie Mellon University Institutional Review Board.
+Group](https://delphi.cmu.edu/) at Carnegie Mellon University. Team members
+include:
+
+* [Alex Reinhart](https://www.refsmmat.com/), Principal Investigator
+* Wichada La Motte-Kerr, Survey Coordinator
+* Robin Mejia, survey advisor
+* Nat DeFries, statistical developer and data engineer
+* plus support from many members of the [Delphi
+team](https://delphi.cmu.edu/about/team/)
+
+The survey protocol is reviewed by the Carnegie Mellon University Institutional
+Review Board.

 The support of several institutions makes the survey possible. Facebook supports
 the survey through recruitment (participants are invited via their News Feed),
 survey sampling and weighting procedures, technical assistance in survey design
 and implementation, and coordination with researchers and public health
-officials. The University of Maryland's Joint Program in Survey Methodology
-conducts an [international version of the survey](https://covidmap.umd.edu/),
-and we coordinate closely on survey design and implementation. Delphi collects,
-aggregates, and distributes the US survey data, and retains ultimate
-responsibility for the US survey instrument and data.
+officials. The University of Maryland's Social Data Science Center conducts a
+[global version of the survey](https://covidmap.umd.edu/), and we coordinate
+closely on survey design and implementation. Delphi collects, aggregates, and
+distributes the US survey data, and retains ultimate responsibility for the US
+survey instrument and data.

 We develop the survey collaboratively with data users, public health officials,
 and others. If you are interested in getting involved, see our

docs/symptom-survey/limitations.md

Lines changed: 208 additions & 0 deletions
@@ -0,0 +1,208 @@
+---
+title: Survey Limitations
+parent: COVID-19 Trends and Impact Survey
+nav_order: 9
+---
+
+# Survey Limitations
+{: .no_toc}
+
+The COVID-19 Trends and Impact Survey (CTIS) gathers large amounts of detailed
+data; however, it is not perfect, and its design means it is subject to several
+crucial limitations. Anyone using the data to make policy decisions or answer
+research questions should be aware of these limitations. Given these
+limitations, we recommend using the data to:
+
+- Track changes over time, such as to monitor sudden increases in reported
+symptoms or changes in reported vaccination attitudes.
+- Make comparisons across space, such as to identify regions with much higher or
+lower values.
+- Make comparisons between groups, such as between occupational or age groups,
+keeping in mind any [sample limitations](#the-sample) that might affect these
+comparisons.
+- Augment data collected from other sources, such as more rigorously controlled
+surveys with high response rates.
+
+We do **not** recommend using CTIS data to:
+
+- Make point estimates of population quantities (such as the exact percentage of
+people who meet a certain criterion) without reference to other data sources.
+Because of sampling, weighting, and response biases, such estimates can be
+biased, and standard confidence intervals and hypothesis tests will be
+misleading.
+- Analyze very small or localized demographic subgroups. Due to the [response
+behavior issues](#response-behavior) discussed below, there is measurement
+error in the demographic data. Very small demographic groups may
+disproportionately include respondents who pick their demographics at random
+or attempt to disrupt the survey in other ways, even if those respondents are
+rare overall.
+
+The sections below explain these limitations in more detail.
+
+## Table of contents
+{: .no_toc .text-delta}
+
+1. TOC
+{:toc}
+
+## The Sample
+
+Facebook takes a random sample of active adult users every day and invites them
+to complete the survey. ("Adult" means the user has indicated they are at least 18
+years old in their profile.) Taking the survey is voluntary, and only 1-2% of those
+users who are invited actually take the survey. This leaves opportunities for
+sampling bias, if the sample is construed to represent the US adult population:
+
+1. **Sampling frame.** The sample is random and maintains similar user
+characteristics each day, but it is drawn from adult Facebook active users
+who use one of the languages the survey is translated into: English [American
+and British], Spanish [Spain and Latin American], French, Brazilian
+Portuguese, Vietnamese, and simplified Chinese. This is not the United States
+population as a whole. While [most American adults use
+Facebook](https://www.pewresearch.org/internet/2021/04/07/social-media-use-in-2021/)
+and the available languages are more comprehensive than for many public
+health surveys, "most" is not the same as "all", and some demographic groups
+may be poorly represented in the Facebook sample.
+2. **Non-response bias.** Only a small fraction of invited users choose to take
+the survey when they are invited. If their decision on whether to take the
+survey is random, this is not a problem. However, their decision to take the
+survey may be correlated with other factors---such as their level of concern
+about COVID-19 or their trust of academic researchers. If that is the case,
+the sample will disproportionately contain people with certain attitudes and
+beliefs.
+
+Facebook calculates [survey weights](weights.md) ([see below](#weighting)) that
+are intended to help correct for these issues. The weights adjust the age and
+gender distribution of the respondents to match Census data, and adjust for
+non-response by using a model for the probability of any user to click on the
+survey link. However, if that non-response model is not perfect (for example,
+non-response varies with respondent attributes not included in the model), or if
+the Facebook population differs from the US population on more features than
+just age and gender, the weights will not account for all sampling biases. For
+example, analyses of weighted survey data show demographics relatively similar
+to the US population, with slightly higher levels of education and a smaller
+proportion of non-white respondents; however, comparisons of self-reported
+vaccination rates of survey respondents with CDC US population benchmarks
+indicate that CTIS respondents are more likely to be vaccinated than the general
+population.
+
+We do, however, expect that any sampling biases will remain relatively
+consistent over time, allowing us to make reliable comparisons over time (such
+as noting an increase or decrease in vaccination rates or vaccine intent) even
+if the point estimates are consistently biased. This is a common issue with
+self-reported data; for example, surveys on illegal drug use expect
+under-reporting (as they ask about an illegal activity) but are commonly used to
+make comparisons between groups or over time.
+
+Also, Facebook's sampling process allows users to be invited to the survey
+repeatedly. A user can be reinvited no sooner than thirty days after their
+previous invitation. Because respondents are anonymous and we do not receive any
+unique identifiers, responses from the same user are not linked in any way.
+Analysts must be aware that when working with responses submitted more than a
+month apart, some responses may be from the same users.
+
+## Weighting
+
+It is important to **read the [weights documentation](weights.md)** to
+understand how Facebook calculates survey weights and what they account for.
+There are some key limitations:
+
+1. Because we do not receive Facebook profile data and Facebook does not receive
+survey response data, the weights are based only on attributes in Facebook
+profiles, *not* on demographics reported in response to survey questions. For
+example, if a respondent's Facebook profile says they are 35 years old and
+live in Delaware, but on the survey they respond that they are 45 years old
+and live in Maryland, the weight will be calculated based on the profile
+information and reflect the Delaware location. This causes measurement error
+in the weights.
+2. Similarly, the non-response model used by Facebook only uses information
+available to Facebook, such as profile information. As discussed above, if
+this model is not perfect, for example if factors not included in the model
+affect non-response, the weights will not fully account for this
+non-response bias.
+3. Facebook only invites users who it believes reside in the 50 states or
+Washington, DC. (Puerto Rico is sampled separately as part of the
+[international version of the survey](https://covidmap.umd.edu/).) If
+Facebook believes a user qualifies, but the user then replies that they live
+in Puerto Rico or another US territory, their weight will be incorrect.
+Starting in September 2021, these responses are not included in any
+microdata.
+
+## Response Behavior
+
+Survey scientists have long known that humans do not always provide complete and
+truthful responses to questions about their attributes, beliefs, and behaviors.
+There are two primary reasons CTIS responses may be suspect.
+
+First is **social desirability bias.** As with all self-report measurements,
+survey respondents may give responses consistent with what they believe is
+socially desirable, because they feel pressured to fit social norms. For
+example, if someone lives in an area where masks are widely used and seen as
+essential, they may report that they wear their mask most or all of the time
+when in public, even if they don't. While this effect is likely smaller on an
+anonymous online survey than in an in-person interview, it could still be
+present.
+
+The second problem is deliberate trolling. While intentional mis-reporting is
+always a possibility when users provide self-report data, it is a particular
+concern for a large, online survey on a controversial topic offered through a
+large social media platform. It appears that the vast majority of CTIS
+respondents complete the survey in good faith; however, we occasionally receive
+emails from survey respondents gloating that they have deliberately provided
+false responses to the survey, usually because they believe the COVID-19
+pandemic is a conspiracy or that scientists are suppressing key information.
+
+We have also observed problematic behavior in a specific subset of respondents.
+While less than 1% of respondents opt to self-describe their own gender, a large
+percentage of respondents who do choose that option provide a description that
+is actually a protest against the question or the survey; for example, making
+trans-phobic comments or [reporting their gender identification as “Attack
+Helicopter”](https://knowyourmeme.com/memes/i-sexually-identify-as-an-attack-helicopter).
+Additionally, these respondents disproportionately select specific demographic
+groups, such as having a PhD, being over age 75, and being Hispanic, all at
+rates far exceeding their overall presence in the US population, suggesting that
+people who want to disrupt the survey also pick on specific groups to troll.
+
+(Note that if a respondent is invited once but completes the survey multiple
+times, or shares their unique link with friends to take it, only the first
+response is counted; this limits the impact of deliberate trolling. If the
+respondent is sampled and invited again later, they receive a new unique link.)
+
+For overall estimates, trolling is not expected to impact results in a
+meaningful way. However, given the concentration of trolls in small demographic
+groups, users interested in comparisons of small demographic groups should
+examine a sample of the raw data. For example, if you are interested in
+responses from Hispanic adults over age 65, examine the other demographic
+variables for this group of respondents to ensure they appear to match what you
+would expect and do not appear influenced by respondents who give deliberately
+strange answers.
+
+Importantly, weights cannot correct for trolling behavior. Users can either note
+any concerns they have when reporting for small groups, or they may choose to
+analyze the data without a suspect group. We are continuing to evaluate trolling
+and will provide updates if new patterns appear.
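
One way to carry out the subgroup check described in this section is to tabulate another demographic variable within the group of interest and look for implausible concentrations. The JavaScript sketch below is illustrative only: the function `tabulateWithinGroup` and the field names `hispanic`, `ageGroup`, and `education` are hypothetical and do not correspond to actual CTIS column names or response codes.

```javascript
// Hypothetical helper: count the values of one variable among the
// responses that satisfy a subgroup predicate.
function tabulateWithinGroup(responses, inGroup, variable) {
  const counts = new Map();
  let total = 0;
  for (const r of responses) {
    if (!inGroup(r)) continue;
    const raw = r[variable];
    const key =
      raw === undefined || raw === null || raw === "" ? "(missing)" : String(raw);
    counts.set(key, (counts.get(key) || 0) + 1);
    total += 1;
  }
  // Convert counts to shares of the subgroup.
  const table = {};
  for (const [key, n] of counts) {
    table[key] = { n, share: n / total };
  }
  return { total, table };
}

// Toy records with made-up field names and values (not real CTIS data).
const toySubgroupData = [
  { hispanic: true, ageGroup: "65-74", education: "High school" },
  { hispanic: true, ageGroup: "75+", education: "PhD" },
  { hispanic: true, ageGroup: "75+", education: "PhD" },
  { hispanic: false, ageGroup: "75+", education: "PhD" },
  { hispanic: true, ageGroup: "25-34", education: "Bachelor's" },
];

// Education breakdown among Hispanic respondents aged 65 or older.
const subgroupCheck = tabulateWithinGroup(
  toySubgroupData,
  (r) => r.hispanic && (r.ageGroup === "65-74" || r.ageGroup === "75+"),
  "education"
);
```

A share that looks far out of line with external benchmarks for the same group (for instance, an unusually large fraction reporting a PhD) would be a reason to inspect those responses more closely before reporting results for the group.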
+
+## Missing Data
+
+Some survey respondents do not complete the entire survey. This could be because
+they get impatient with it, because they do not want to respond to questions
+about specific topics, or simply because they are responding to the survey
+during a quick break or while waiting in line at Starbucks. (Remember, Facebook
+users see the invitation when they're browsing the Facebook news feed, which
+could be any time someone might pull out their phone and check Facebook.)
+
+As a result, questions that appear later in the survey, including demographics,
+can be blank in 10-20% of survey responses. Similar to overall non-response,
+this is an issue when such behavior does not occur at random relative to the
+questions you are analyzing; for example, if individuals who are particularly
+concerned about COVID-19 are more likely to take the time to finish the survey.
+
+Also, the CTIS survey instrument is deliberately designed so that most items are
+optional---Qualtrics will not attempt to force respondents to answer questions
+that they leave blank. This allows respondents to leave an item blank if they
+prefer not to answer it, rather than entering a nonsense answer. This can lead
+to missingness in the middle of the survey, even among respondents who answer
+later questions. As noted above, this missingness is almost certainly not at
+random. Data users should examine and report the missingness in the questions
+they use. Imputation methods are an option; users should consider whether the
+assumptions of imputation models appear to be met for the data.
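
As a minimal sketch of how the missingness in a set of questions might be examined and reported, the JavaScript below computes the fraction of responses in which each item is blank. It assumes responses have already been parsed into plain objects; the function `missingnessRates` and the item names `itemA`, `itemB`, and `itemC` are hypothetical placeholders, not actual CTIS item codes.

```javascript
// Hypothetical helper: fraction of responses in which each item is blank.
// Treats undefined, null, and empty or whitespace-only strings as missing.
function missingnessRates(responses, items) {
  const rates = {};
  for (const item of items) {
    let missing = 0;
    for (const r of responses) {
      const v = r[item];
      if (v === undefined || v === null || String(v).trim() === "") {
        missing += 1;
      }
    }
    rates[item] = responses.length > 0 ? missing / responses.length : NaN;
  }
  return rates;
}

// Toy records with made-up item names and values (not real CTIS data).
const toyResponses = [
  { itemA: "1", itemB: "3", itemC: "" },
  { itemA: "2", itemB: "", itemC: "" },
  { itemA: "1", itemB: "2", itemC: "4" },
  { itemA: "", itemB: "1", itemC: "2" },
];

const toyRates = missingnessRates(toyResponses, ["itemA", "itemB", "itemC"]);
```

Reporting these rates alongside any estimate lets readers judge how much the result could depend on which respondents skipped the item. The sketch says nothing about whether the missingness is at random; that still has to be assessed separately.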

docs/symptom-survey/modules.md

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ parent: COVID-19 Trends and Impact Survey
 nav_order: 7
 ---

-# Questions and Coding
+# Survey Modules & Randomization
 {: .no_toc}

 To reduce the overall length of the instrument and minimize response burden,

docs/symptom-survey/problems.md

Lines changed: 2 additions & 2 deletions
@@ -1,10 +1,10 @@
 ---
-title: Problems and Data Errors
+title: Data and Sampling Errors
 parent: COVID-19 Trends and Impact Survey
 nav_order: 8
 ---

-# Problems and Data Errors
+# Data and Sampling Errors
 {: .no_toc}

 Given the scale of the COVID-19 Trends and Impact Survey (CTIS), we occasionally

docs/symptom-survey/survey-files.md

Lines changed: 10 additions & 0 deletions
@@ -104,10 +104,20 @@ Responses qualify for inclusion in these files if they meet the following condit
 open-ended question (A2, A2b, B2b, Q40, C10_1_1, C10_2_1, C10_3_1, C10_4_1,
 D3, D4, D5) means to provide any number (floats okay) and to “answer” a radio
 button question is to provide a selection
+* starting September 2021, indicated that they live in the 50 states or
+Washington, DC

 We do not require the user to have completed the survey, or to have seen all
 pages of the survey.

+Facebook only invites users to take the survey if they appear, based on
+attributes in their Facebook profiles, to reside in the 50 states or
+Washington, DC. Puerto Rico is sampled separately as part of the
+[international version of the survey](https://covidmap.umd.edu/). Starting in
+September 2021, if Facebook believes a user qualifies for the survey, but the
+user then replies that they live in Puerto Rico or another US territory, we do
+not include their response in the individual response data.
+
 ## Collisions

 One thing we haven't been able to fully fix is the problem of people forwarding

docs/symptom-survey/weights.md

Lines changed: 3 additions & 1 deletion
@@ -15,4 +15,6 @@ by Facebook. These weights are also used to produce our
 Facebook has provided documentation to describe the calculation and usage of
 these weights, [available here](symptom-survey-weights.pdf). This documentation
 explains the weight methodology, gives examples of how to use the weights when
-calculating estimates, and states the known limitations of the weights.
+calculating estimates, and states the known limitations of the weights. We also
+have separate information about the [survey's limitations](limitations.md) that
+affect what conclusions can be drawn from the survey data.
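
As a rough illustration of what using the weights in an estimate can look like, the JavaScript sketch below computes a weighted proportion as a normalized weighted mean. This is a generic calculation and is not taken from Facebook's weights documentation, which remains the authoritative reference. The function name `weightedProportion` and the fields `weight` and `vaccinated` are hypothetical.

```javascript
// Hypothetical helper: weighted share of responses for which isYes(r) is true.
// Responses with a missing or non-positive weight, or for which isYes returns
// null/undefined (item left blank), are excluded from both sums.
function weightedProportion(responses, isYes, weightField = "weight") {
  let numerator = 0;
  let denominator = 0;
  for (const r of responses) {
    const w = Number(r[weightField]);
    if (!Number.isFinite(w) || w <= 0) continue;
    const y = isYes(r);
    if (y === null || y === undefined) continue;
    denominator += w;
    if (y) numerator += w;
  }
  return denominator > 0 ? numerator / denominator : NaN;
}

// Toy records with made-up field names and values (not real CTIS data).
const toyWeighted = [
  { vaccinated: true, weight: 1 },
  { vaccinated: false, weight: 3 },
  { vaccinated: true, weight: 2 },
  { vaccinated: null, weight: 5 }, // item left blank: skipped
  { vaccinated: true }, // no weight available: skipped
];

// (1 + 2) / (1 + 3 + 2) = 0.5, versus an unweighted 2 of 3 among the same rows.
const toyEstimate = weightedProportion(toyWeighted, (r) => r.vaccinated);
```

The sketch produces only a point estimate. It does not compute standard errors, and weighting does not remove the biases described in the survey limitations.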

src/client/delphi_epidata.R

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ Epidata <- (function() {
 # API base url
 BASE_URL <- 'https://delphi.cmu.edu/epidata/api.php'

-client_version <- '0.2.13'
+client_version <- '0.2.14'

 # Helper function to cast values and/or ranges to strings
 .listitem <- function(value) {

src/client/delphi_epidata.js

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@
 }
 })(this, function (exports, fetchImpl, jQuery) {
 const BASE_URL = "https://delphi.cmu.edu/epidata/";
-const client_version = "0.2.13";
+const client_version = "0.2.14";

 // Helper function to cast values and/or ranges to strings
 function _listitem(value) {
