Add simple tool for processing status availability files #41

travisbrown · 2021-10-14T12:38:34Z

The twcc check-existence tool reads a list of tweet IDs from standard input and outputs CSV like this, where 0 indicates that the tweet is unavailable (either deleted, or from a suspended or locked account) and 1 indicates that it's live:

9142291,1
10643701,1
18984851,1
34661512,1
48826972,1
72706082,1
75507232,1
102425362,0
119141862,1
120506682,0
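
As a rough illustration of the format above, each line can be read as an ID plus a flag. This is a sketch and not the code from this PR: `parse_line` is a hypothetical helper name, and storing the ID as a `u64` and rejecting anything other than `0` or `1` are assumptions.

```rust
/// Parse one `id,status` line from the availability CSV.
/// Returns `None` for anything that is not `<unsigned integer>,0` or `<unsigned integer>,1`.
fn parse_line(line: &str) -> Option<(u64, bool)> {
    // Split on the first comma only; trailing newlines and spaces are trimmed first.
    let (id, status) = line.trim().split_once(',')?;
    let id = id.parse::<u64>().ok()?;
    match status {
        "0" => Some((id, false)), // unavailable
        "1" => Some((id, true)),  // live
        _ => None,
    }
}
```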

We've published this output for large batches of tweet IDs at s3://twitter-metadata/status-availability/. The output there includes rechecks, where we check the same IDs multiple times (for example, to get a fresh view after several months).

It can be useful to combine these results into a single list that indicates the most recently known status for each tweet ID, which is what this tool does. You point it at a directory of CSV files like the example above, whose filenames correspond to chronological order, and it outputs a sorted list of the unique tweet IDs, each with its most recent status.
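
The merge described above can be sketched roughly as follows. This is not the implementation in src/bin/availability.rs, only a minimal version of the idea under stated assumptions: `latest_statuses` and `merge_dir` are hypothetical names, sorting the file paths is assumed to give chronological order, and malformed lines are skipped silently.

```rust
use std::collections::BTreeMap;
use std::fs;
use std::io::{self, BufRead, BufReader};
use std::path::{Path, PathBuf};

/// Fold `(id, available)` records, given oldest first, into a map holding the
/// last status seen for each ID. A `BTreeMap` iterates in ascending ID order.
fn latest_statuses<I>(records: I) -> BTreeMap<u64, bool>
where
    I: IntoIterator<Item = (u64, bool)>,
{
    let mut latest = BTreeMap::new();
    for (id, available) in records {
        latest.insert(id, available); // later records overwrite earlier ones
    }
    latest
}

/// Parse `id,0` or `id,1`; anything else yields `None`.
fn parse_line(line: &str) -> Option<(u64, bool)> {
    let (id, status) = line.trim().split_once(',')?;
    let id = id.parse::<u64>().ok()?;
    match status {
        "0" => Some((id, false)),
        "1" => Some((id, true)),
        _ => None,
    }
}

/// Read every file in `dir` in sorted path order (assumed to be chronological)
/// and keep the most recent status for each tweet ID.
fn merge_dir(dir: &Path) -> io::Result<BTreeMap<u64, bool>> {
    let mut paths = fs::read_dir(dir)?
        .map(|entry| entry.map(|e| e.path()))
        .collect::<io::Result<Vec<PathBuf>>>()?;
    paths.sort();

    let mut records = Vec::new();
    for path in paths {
        for line in BufReader::new(fs::File::open(&path)?).lines() {
            // Assumption: lines that do not parse are skipped rather than reported.
            if let Some(record) = parse_line(&line?) {
                records.push(record);
            }
        }
    }
    Ok(latest_statuses(records))
}
```

Collecting into a `BTreeMap` gives both deduplication and sorted output in one structure. For inputs at the scale mentioned in this PR, holding every record in memory before merging may be too costly, so a real tool would likely insert into the map while reading.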

Note that it takes about four minutes to run on our current availability files (representing a hundred-million-ish tweets and retweets, most of which have been checked at least twice).

codecov-commenter · 2021-10-14T12:41:39Z

Codecov Report

Merging #41 (7012473) into main (5afe23f) will decrease coverage by 0.19%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##             main      #41      +/-   ##
==========================================
- Coverage   26.63%   26.44%   -0.20%     
==========================================
  Files          44       45       +1     
  Lines        2775     2795      +20     
==========================================
  Hits          739      739              
- Misses       2036     2056      +20

Impacted Files	Coverage Δ
src/bin/availability.rs	`0.00% <0.00%> (ø)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Add simple tool for processing status availability files

7012473

travisbrown merged commit e3d6a9c into main Oct 14, 2021

travisbrown deleted the topic/availability-tool branch October 14, 2021 13:21
