It appears that warcio recompress will add WARC-Block-Digest fields to records that do not already have that field.
In the ZIP there are 2 warcs.
example-warcs.zip
In orig.warc, the warcinfo record at the start does not have a WARC-Block-Digest field at all. However, if you run:
warcio recompress orig.warc warcio-recompress.warc.gz
gunzip warcio-recompress.warc.gz
and then look at warcio-recompress.warc, you will see that the warcinfo record now has a WARC-Block-Digest field with a SHA1 hash. (I included a copy of the recompressed file in the ZIP.)
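To make the difference easy to confirm, here is a minimal sketch that lists, for each record in an uncompressed WARC, whether it carries a WARC-Block-Digest header. It uses only the Python standard library rather than warcio itself, so the check is independent of the tool under discussion. The function names `iter_warc_headers` and `block_digest_report` are my own, and the parser is deliberately simplified: it assumes well-formed records and does not handle folded (multi-line) header values.

```python
import io


def iter_warc_headers(stream):
    """Yield a dict of header fields for each record in an uncompressed WARC stream.

    Simplified sketch: assumes well-formed records and no folded header lines.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.strip():
            continue  # skip the blank lines that separate records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # a blank line ends the header block
            name, _, value = line.partition(b":")
            # Header names are case-insensitive, so normalise to lower case.
            headers[name.strip().decode("utf-8").lower()] = value.strip().decode("utf-8")
        # Consume the record block so the next read starts at the following record.
        stream.read(int(headers.get("content-length", 0)))
        yield headers


def block_digest_report(stream):
    """Return a list of (WARC-Type, has WARC-Block-Digest) pairs, one per record."""
    return [
        (h.get("warc-type"), "warc-block-digest" in h)
        for h in iter_warc_headers(stream)
    ]
```

Calling `block_digest_report(open("orig.warc", "rb"))` and then doing the same for the gunzipped output should show the warcinfo entry changing from `False` to `True` if the behavior described here is reproduced.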
While I suppose more digests aren't a bad thing, I see a few problems:
- I would not expect a recompression operation to alter the records in the WARC.
- This behavior isn't documented.
- It (very slightly) increases the size of the WARC.
My suggestion would be that warcio recompress should not alter the records of the WARC it is operating on.