# Changelog

## 0.32.0

### Added

- HeaderValidator with WARC/1.1 standard ruleset
- ExtractTool: can now extract sequential concurrent records (`--concurrent` option)
- DedupeTool
  - In-memory cache for cross-URL digest-based deduplication (`--cache-size` option)
  - Now prints deduplication statistics (`--dry-run` and `--quiet` options)
  - Multi-threaded deduplication (`--threads` option)
- ValidateTool
  - Multi-threaded validation (`--threads` option)
- ParsingException message is now annotated with the source filename and record offset when available
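
To illustrate the idea behind a size-bounded, digest-keyed deduplication cache, here is a minimal sketch. It is not the DedupeTool implementation: the class `DigestCache`, its method names, and the least-recently-used eviction policy are all assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration only; not the actual DedupeTool code.
class DigestCache {
    // Maps a payload digest to the URL where that payload was first seen.
    private final Map<String, String> firstSeen;

    DigestCache(final int maxEntries) {
        // accessOrder=true keeps entries ordered from least to most recently
        // used, so removeEldestEntry evicts the LRU entry once the cap is hit.
        this.firstSeen = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /**
     * Returns the URL of an earlier record with the same digest, if cached.
     * Otherwise remembers this digest and returns an empty Optional.
     */
    Optional<String> checkAndRemember(String payloadDigest, String targetUri) {
        String previous = firstSeen.get(payloadDigest);
        if (previous != null) {
            return Optional.of(previous);
        }
        firstSeen.put(payloadDigest, targetUri);
        return Optional.empty();
    }

    int size() {
        return firstSeen.size();
    }
}
```

Because the key is the digest rather than the URL, identical payloads fetched from different URLs map to the same entry. This sketch is not thread-safe; combining it with multiple worker threads would need external synchronization.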

### Fixed

- RFC 5952 canonical form is now used for IPv6 addresses in WARC-IP-Address
- HttpParser in lenient mode now:
  - accepts responses missing version number
  - ignores header lines missing `:`
  - ignores folded status lines
- WarcParser: treats `alexa/dat` ARC records as not HTTP type
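
For readers unfamiliar with the RFC 5952 canonical form mentioned above, the following is a minimal sketch of its main rules: lowercase hex, no leading zeros, and `::` replacing the longest run of two or more zero groups (the first run on a tie). The class `Rfc5952` and its methods are hypothetical names for this example and are not taken from the library. The special dotted notation for IPv4-mapped addresses is not covered.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical illustration only; not the library's implementation.
class Rfc5952 {
    /** Formats a 16-byte IPv6 address using the main RFC 5952 rules. */
    static String format(byte[] addr) {
        if (addr.length != 16) {
            throw new IllegalArgumentException("expected 16 bytes");
        }
        int[] groups = new int[8];
        for (int i = 0; i < 8; i++) {
            groups[i] = ((addr[2 * i] & 0xff) << 8) | (addr[2 * i + 1] & 0xff);
        }
        // Find the longest run of zero groups; the first run wins ties.
        int bestStart = -1;
        int bestLen = 0;
        for (int i = 0; i < 8; ) {
            if (groups[i] != 0) {
                i++;
                continue;
            }
            int j = i;
            while (j < 8 && groups[j] == 0) {
                j++;
            }
            if (j - i > bestLen) {
                bestStart = i;
                bestLen = j - i;
            }
            i = j;
        }
        if (bestLen < 2) {
            bestStart = -1; // a single zero group must not be shortened
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 8; i++) {
            if (i == bestStart) {
                sb.append("::");
                i += bestLen - 1; // skip the rest of the compressed run
                continue;
            }
            if (sb.length() > 0 && sb.charAt(sb.length() - 1) != ':') {
                sb.append(':');
            }
            sb.append(Integer.toHexString(groups[i])); // lowercase, no leading zeros
        }
        return sb.toString();
    }

    /** Convenience wrapper; intended for IP literals only (a hostname would trigger a DNS lookup). */
    static String canonicalize(String literal) {
        try {
            InetAddress parsed = InetAddress.getByName(literal);
            byte[] bytes = parsed.getAddress();
            // IPv4 and IPv4-mapped literals come back as 4 bytes; leave them as-is.
            return bytes.length == 16 ? format(bytes) : parsed.getHostAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not an IP literal: " + literal, e);
        }
    }
}
```

Under these rules, `2001:0DB8:0:0:0:0:0:1` and `2001:db8::1` both format to the same string, which is what makes the canonical form useful for comparing header values.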