Monthly files are compressed as well

capnrefsmmat · capnrefsmmat · commit 08db7e9b24d2 · 2020-12-16T17:01:19.000-05:00
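The compressed files described in the diff below can also be read directly from code, without running `gunzip` first. The following is a minimal Python sketch using only the standard library. It assumes the files are UTF-8 encoded CSV with a header row. The helper names `monthly_filename` and `read_survey_csv` are hypothetical and are not part of any published tooling.

```python
import csv
import gzip
from pathlib import Path


def monthly_filename(year: int, month: int) -> str:
    """Build a monthly filename following the {YYYY}-{mm}.csv.gz pattern (hypothetical helper)."""
    return f"{year:04d}-{month:02d}.csv.gz"


def read_survey_csv(path: Path) -> list[dict[str, str]]:
    """Read a gzip-compressed CSV file into a list of row dictionaries (hypothetical helper).

    gzip.open in text mode ("rt") decompresses on the fly, so no separate
    gunzip step is needed. UTF-8 encoding is an assumption.
    """
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        # DictReader uses the first row as column names; every value stays a string.
        return list(csv.DictReader(handle))
```

For example, `read_survey_csv(Path(monthly_filename(2020, 11)))` would load `2020-11.csv.gz` from the current directory. If pandas is available, `pandas.read_csv` also infers gzip compression from the `.gz` extension.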
diff --git a/docs/symptom-survey/survey-files.md b/docs/symptom-survey/survey-files.md
@@ -42,7 +42,8 @@ Each day, we write CSV files with names following this pattern:
 Dates in incremental filenames are of the form `YYYY_mm_dd`. `for` refers to the
 day the survey response was started, in the Pacific time zone (UTC -
 7). `recorded` refers to the day survey data was retrieved; see the [lag
-policy](#lag-policy) for more details.
+policy](#lag-policy) for more details. Each file is compressed with gzip and
+can be decompressed with the standard `gunzip` command on Linux or macOS.
 
 Every day, we write response files for all recent days of data, with today's
 `recorded` date. For each `for` date, you need only load the most recent
@@ -62,13 +63,17 @@ all survey responses from that month. These are in two forms.
 
 First, the monthly CSV files have filenames in the form
 
-    {YYYY}-{mm}.csv
+    {YYYY}-{mm}.csv.gz
 
 and contain all valid responses for that month. These are produced from the
 daily files, by taking the data with the most recent `recordedby` date for each
-day of the month. Users doing historical analyses of the survey data should
-start with these files, since they provide the easiest way to get all the
-necessary data, without accidentally including duplicate results.
+day of the month. They are compressed with gzip; the standard `gunzip` command
+on macOS or Linux can decompress them. (macOS can also decompress these files
+automatically through Finder; on Windows, free programs such as
+[7-Zip](https://www.7-zip.org/) can decompress gzip files.) Users doing
+historical analyses of the survey data should start with these files, since they
+provide the easiest way to get all the necessary data without accidentally
+including duplicate results.
 
 Second, we produce monthly tarballs containing all the daily `.csv.gz` files for
 that month, with names in the form