Commit 08db7e9

Monthly files are compressed as well
1 parent 726c5c8 commit 08db7e9

1 file changed (+10, -5 lines)

docs/symptom-survey/survey-files.md

Lines changed: 10 additions & 5 deletions
@@ -42,7 +42,8 @@ Each day, we write CSV files with names following this pattern:
 Dates in incremental filenames are of the form `YYYY_mm_dd`. `for` refers to the
 day the survey response was started, in the Pacific time zone (UTC -
 7). `recorded` refers to the day survey data was retrieved; see the [lag
-policy](#lag-policy) for more details.
+policy](#lag-policy) for more details. Each file is compressed with gzip, and
+the standard `gunzip` command on Linux or Mac can decompress it.
 
 Every day, we write response files for all recent days of data, with today's
 `recorded` date. For each `for` date, you need only load the most recent
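
The selection rule this hunk describes (for each `for` date, load only the file with the most recent `recorded` date) could be sketched in Python roughly as follows. The function name `latest_recorded` and the `(for_date, recorded_date, path)` tuple layout are hypothetical; extracting the two dates from a real filename is left to the caller, since the filename pattern is not shown in this diff. The sketch assumes both dates are zero-padded `YYYY_mm_dd` strings, which compare chronologically as plain text.

```python
from typing import Dict, Iterable, Tuple


def latest_recorded(files: Iterable[Tuple[str, str, str]]) -> Dict[str, str]:
    """Map each `for` date to the path carrying the newest `recorded` date.

    `files` yields hypothetical (for_date, recorded_date, path) tuples.
    """
    best: Dict[str, Tuple[str, str]] = {}
    for for_date, recorded, path in files:
        # Zero-padded YYYY_mm_dd strings sort chronologically as plain text,
        # so a string comparison is enough to find the newer file.
        if for_date not in best or recorded > best[for_date][0]:
            best[for_date] = (recorded, path)
    return {for_date: path for for_date, (_, path) in best.items()}
```

Loading only the paths this returns avoids counting the same responses twice when several `recorded` versions exist for one `for` date.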
@@ -62,13 +63,17 @@ all survey responses from that month. These are in two forms.
 
 First, the monthly CSV files have filenames in the form
 
-{YYYY}-{mm}.csv
+{YYYY}-{mm}.csv.gz
 
 and contain all valid responses for that month. These are produced from the
 daily files, by taking the data with the most recent `recordedby` date for each
-day of the month. Users doing historical analyses of the survey data should
-start with these files, since they provide the easiest way to get all the
-necessary data, without accidentally including duplicate results.
+day of the month. They are compressed with gzip; the standard `gunzip` command
+on macOS or Linux can decompress them. (macOS can also decompress these files
+through Finder automatically; on Windows, free programs like
+[7-zip](https://www.7-zip.org/) can decompress gzip files.) Users doing
+historical analyses of the survey data should start with these files, since they
+provide the easiest way to get all the necessary data, without accidentally
+including duplicate results.
 
 Second, we produce monthly tarballs containing all the daily `.csv.gz` files for
 that month, with names in the form
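
Since the monthly files are gzip-compressed after this change, they can also be read directly, without a separate `gunzip` step. The following is a minimal sketch using only Python's standard library; the function name `read_gzipped_csv` is hypothetical, and the file is assumed to be UTF-8 encoded with a header row.

```python
import csv
import gzip
from typing import Dict, List


def read_gzipped_csv(path: str) -> List[Dict[str, str]]:
    """Read a gzip-compressed CSV file into a list of row dictionaries."""
    # Text mode ("rt") decompresses and decodes on the fly; newline="" lets
    # the csv module handle line endings inside quoted fields itself.
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

For example, `read_gzipped_csv("2020-05.csv.gz")` would load a file named after the `{YYYY}-{mm}.csv.gz` pattern; that particular filename is only an illustration.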
