@@ -42,7 +42,8 @@ Each day, we write CSV files with names following this pattern:
 Dates in incremental filenames are of the form `YYYY_mm_dd`. `for` refers to the
 day the survey response was started, in the Pacific time zone (UTC -
 7). `recorded` refers to the day survey data was retrieved; see the [lag
-policy](#lag-policy) for more details.
+policy](#lag-policy) for more details. Each file is compressed with gzip, and
+the standard `gunzip` command on Linux or Mac can decompress it.
 
 Every day, we write response files for all recent days of data, with today's
 `recorded` date. For each `for` date, you need only load the most recent
@@ -62,13 +63,17 @@ all survey responses from that month. These are in two forms.
 
 First, the monthly CSV files have filenames in the form
 
-{YYYY}-{mm}.csv
+{YYYY}-{mm}.csv.gz
 
 and contain all valid responses for that month. These are produced from the
 daily files, by taking the data with the most recent `recordedby` date for each
-day of the month. Users doing historical analyses of the survey data should
-start with these files, since they provide the easiest way to get all the
-necessary data, without accidentally including duplicate results.
+day of the month. They are compressed with gzip; the standard `gunzip` command
+on macOS or Linux can decompress them. (macOS can also decompress these files
+through Finder automatically; on Windows, free programs like
+[7-zip](https://www.7-zip.org/) can decompress gzip files.) Users doing
+historical analyses of the survey data should start with these files, since they
+provide the easiest way to get all the necessary data, without accidentally
+including duplicate results.
 
 Second, we produce monthly tarballs containing all the daily `.csv.gz` files for
 that month, with names in the form
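
Note on the gzip change: the new text points readers to `gunzip`, Finder, and 7-zip, but a monthly `{YYYY}-{mm}.csv.gz` file can also be read without decompressing it to disk first. The following is a minimal sketch using only the Python standard library. The file name `2020-09.csv.gz` and the columns `response_id` and `answer` are hypothetical placeholders invented for the demo; they are not taken from the survey data.

```python
import csv
import gzip
import os
import tempfile


def read_gzipped_csv(path):
    """Read a gzip-compressed CSV file and return its rows as dicts."""
    # "rt" opens the compressed stream as text; newline="" lets the csv
    # module handle line endings itself.
    with gzip.open(path, "rt", newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def write_gzipped_csv(path, fieldnames, rows):
    """Write rows to a gzip-compressed CSV file (used only to build the demo)."""
    with gzip.open(path, "wt", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file name following the {YYYY}-{mm}.csv.gz pattern.
        demo_path = os.path.join(tmp, "2020-09.csv.gz")
        # Hypothetical columns; the real survey files define their own.
        write_gzipped_csv(
            demo_path,
            ["response_id", "answer"],
            [{"response_id": "1", "answer": "yes"}],
        )
        for row in read_gzipped_csv(demo_path):
            print(row)
```

Readers who use pandas can usually pass the `.csv.gz` path straight to `pandas.read_csv`, which infers gzip compression from the file extension by default.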