Commit 1dd9c14

Merge pull request #436 from cmu-delphi/docs/fb-survey-criteria
Improve fb-survey documentation
2 parents f7fc080 + 5d3cc05 commit 1dd9c14

1 file changed

+62
-9
lines changed

docs/api/covidcast-signals/fb-survey.md

Lines changed: 62 additions & 9 deletions
Original file line number | Diff line number | Diff line change
@@ -19,10 +19,11 @@ grand_parent: COVIDcast Epidata API
1919

2020
This data source is based on symptom surveys run by the Delphi group at Carnegie
2121
Mellon. Facebook directs a random sample of its users to these surveys, which
22-
are voluntary. Individual survey responses are held by CMU and are sharable with
23-
other health researchers under a data use agreement. No individual survey
24-
responses are shared back to Facebook. See our [surveys
25-
page](https://covidcast.cmu.edu/surveys.html) for more detail about how the
22+
are voluntary. Users age 18 or older are eligible to complete the surveys, and
23+
their survey responses are held by CMU and are sharable with other health
24+
researchers under a data use agreement. No individual survey responses are
25+
shared back to Facebook. See our [surveys
26+
page](https://delphi.cmu.edu/covidcast/surveys/) for more detail about how the
2627
surveys work and how they are used outside the COVIDcast API.
2728

2829
We produce several sets of signals based on the survey data, listed and
@@ -39,6 +40,13 @@ described in the sections below:
3940
4. [Mental health indicators](#mental-health-indicators), based on self-reports
4041
of anxiety, depression, isolation, and worry about COVID
4142

43+
Many of these signals can also be browsed on our [survey
44+
dashboard](https://delphi.cmu.edu/covidcast/survey-results/) at any selected
45+
location.
46+
47+
Additionally, contingency tables containing demographic breakdowns of survey
48+
data are [available for download](../../symptom-survey/contingency-tables.md).
49+
4250
## Table of Contents
4351
{: .no_toc .text-delta}
4452

@@ -74,17 +82,18 @@ Researchers can [request
7482
access](https://dataforgood.fb.com/docs/covid-19-symptom-survey-request-for-data-access/)
7583
to (fully de-identified) individual survey responses for research purposes.
7684

77-
As of mid-August 2020, the average number of Facebook survey responses we
78-
receive each day is about 74,000, and the total number of survey responses we
79-
have received is over 9 million.
85+
As of early March 2021, the average number of Facebook survey responses we
86+
receive each day is about 40,000, and the total number of survey responses we
87+
have received is over 17 million.
8088

8189
## ILI and CLI Indicators
8290

8391
Of primary interest for the API are the symptoms defining a COVID-like illness
8492
(fever, along with cough, or shortness of breath, or difficulty breathing) or
8593
influenza-like illness (fever, along with cough or sore throat). Using this
86-
survey data, we estimate the percentage of people who have a COVID-like illness,
87-
or influenza-like illness, in a given location, on a given day.
94+
survey data, we estimate the percentage of people (age 18 or older) who have a
95+
COVID-like illness, or influenza-like illness, in a given location, on a given
96+
day.
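
The symptom definitions above can be sketched in code. The following is a minimal, unweighted illustration only: the `Response` class, its field names, and the `percent` helper are hypothetical and are not part of the survey pipeline or the API, and the published signals may involve weighting and smoothing steps not shown here.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """Hypothetical record of one respondent's self-reported symptoms."""
    fever: bool = False
    cough: bool = False
    shortness_of_breath: bool = False
    difficulty_breathing: bool = False
    sore_throat: bool = False


def is_cli(r: Response) -> bool:
    # COVID-like illness: fever, along with cough, or shortness of breath,
    # or difficulty breathing.
    return r.fever and (r.cough or r.shortness_of_breath or r.difficulty_breathing)


def is_ili(r: Response) -> bool:
    # Influenza-like illness: fever, along with cough or sore throat.
    return r.fever and (r.cough or r.sore_throat)


def percent(responses, predicate) -> float:
    """Unweighted percentage of responses that satisfy the predicate."""
    if not responses:
        return float("nan")
    return 100.0 * sum(1 for r in responses if predicate(r)) / len(responses)


# Made-up responses for illustration only (not real survey data).
sample = [
    Response(fever=True, cough=True),                # CLI and ILI
    Response(fever=True, sore_throat=True),          # ILI only
    Response(fever=True, shortness_of_breath=True),  # CLI only
    Response(cough=True),                            # neither (no fever)
]
pct_cli = percent(sample, is_cli)
pct_ili = percent(sample, is_ili)
```

Note that a respondent with fever and cough satisfies both definitions, so the two percentages are not mutually exclusive.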
8897

8998
| Signals | Description |
9099
| --- | --- |
@@ -396,6 +405,50 @@ below](#survey-weighting) to be more representative of state demographics, are
396405
also available. These have names beginning `smoothed_w`, such as
397406
`smoothed_wdepressed_14d`.
398407

408+
## Limitations
409+
410+
When interpreting the signals above, it is important to keep in mind several
411+
limitations of this survey data.
412+
413+
* **Survey population.** People are eligible to participate in the survey if
414+
they are age 18 or older, they are currently located in the USA, and they are an active user of Facebook. The survey
415+
data does not report on children under age 18, and the Facebook adult user
416+
population may differ from the United States population generally in important
417+
ways. We use our [survey weighting](#survey-weighting) to adjust the estimates
418+
to match age and gender demographics by state, but this process doesn't adjust
419+
for other demographic biases we may not be aware of.
420+
* **Non-response bias.** The survey is voluntary, and people who accept the
421+
invitation when it is presented to them on Facebook may be different from
422+
those who do not. The [survey weights provided by Facebook](#survey-weighting)
423+
attempt to model the probability of response for each user and hence adjust
424+
for this, but it is difficult to tell if these weights account for all
425+
possible non-response bias.
426+
* **Social desirability.** Previous survey research has shown that people's
427+
responses to surveys are often biased by what responses they believe are
428+
socially desirable or acceptable. For example, if there is widespread
429+
pressure to wear masks, respondents who do *not* wear masks may feel pressured
430+
to answer that they *do*. This survey is anonymous and online, meaning we
431+
expect the social desirability effect to be smaller, but it may still be
432+
present.
433+
* **False responses.** As with anything on the Internet, a small percentage of
434+
users give deliberately incorrect responses. We discard a small number of
435+
responses that are obviously false, but do not perform extensive filtering.
436+
However, the large size of the study, and our procedure for ensuring that each
437+
respondent can only be counted once when they are invited to take the survey,
438+
prevent individual respondents from having a large effect on results.
439+
* **Repeat invitations.** Individual respondents can be invited by Facebook to
440+
take the survey several times. Usually Facebook only re-invites a respondent
441+
after one month. Hence estimates of values on a single day are calculated
442+
using independent survey responses from unique respondents (or, at least,
443+
unique Facebook accounts), whereas estimates from different months may involve
444+
the same respondents.
445+
446+
Whenever possible, you should compare this data to other independent sources. We
447+
believe that while these biases may affect point estimates -- that is, they may
448+
bias estimates on a specific day up or down -- the biases should not change
449+
strongly over time. This means that *changes* in signals, such as increases or
450+
decreases, are likely to represent true changes in the underlying population,
451+
even if point estimates are biased.
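
To illustrate why changes can be more trustworthy than point estimates, here is a small sketch. It rests on a simplifying assumption that the bias is additive and constant over time, which is stronger than the claim above that biases "should not change strongly". All numbers are made up for illustration and are not survey results; the function name is hypothetical.

```python
def day_over_day_changes(values):
    """Return the differences between consecutive daily values."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Hypothetical "true" daily percentages (illustrative only).
true_values = [1.0, 1.5, 2.5, 2.0]

# Assumed constant additive bias, e.g. from an unrepresentative sample.
bias = 0.5
observed = [v + bias for v in true_values]

# Every point estimate is off by the bias...
point_errors = [o - t for o, t in zip(observed, true_values)]

# ...but the bias cancels when consecutive days are differenced.
true_changes = day_over_day_changes(true_values)
observed_changes = day_over_day_changes(observed)
```

Under this assumption, `observed_changes` matches `true_changes` even though each value in `observed` is shifted. If the bias drifts over time, the cancellation is only approximate, which is one reason to compare against independent sources.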
399452

400453
## Survey Weighting
401454
