Description
Describe the bug
When the zstd CLI is given a file via -D that does not contain a valid dictionary, the option is either silently ignored or leads to a potentially confusing 'Dictionary mismatch' error.
To Reproduce
Steps to reproduce the behavior:
1. echo test | zstd -o test.zst
2. echo notadict >falsedict
3. zstdcat -D falsedict test.zst
4. echo test | zstd -D falsedict -o test2.zst
5. wget https://github.com/facebook/zstd/raw/c2c6a4ab40fcc327e79d5364f9c2ab1e41e6a7f8/tests/dict-files/zero-weight-dict (or any other valid non-empty dictionary)
6. echo test | zstd -D zero-weight-dict -o test3.zst
7. zstdcat -D falsedict test3.zst
Expected behavior
For steps 3, 4, and 7, I would expect an error stating that the contents of falsedict are not a valid zstd dictionary. I would prefer this to be a fatal error, but there should at least be a warning on stderr. Ideally, the message would also include the file size.
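Until zstd reports this itself, a wrapper script can check a dictionary file before passing it to -D. Here is a minimal sketch in POSIX shell. The helper check_zstd_dict is hypothetical and is not part of zstd. It fails with a message on stderr that includes the file size, unless the file starts with the magic number 0xEC30A437. The Zstandard format (RFC 8878) defines this magic number for formatted dictionaries and stores it little-endian. The check is stricter than zstd itself, because zstd can also use raw-content dictionaries that lack this header. It therefore assumes you only ever pass trained or otherwise formatted dictionaries.

```sh
#!/bin/sh
# check_zstd_dict FILE
# Hypothetical helper (not part of zstd). It returns 0 only if FILE looks like a
# formatted zstd dictionary. Otherwise it prints an error, including the
# file size, to stderr and returns 1.
# Assumption: raw-content dictionaries (no magic number) are treated as invalid.
check_zstd_dict() {
    dict=$1
    if [ ! -r "$dict" ]; then
        echo "error: cannot read dictionary file '$dict'" >&2
        return 1
    fi
    # wc may pad the count with spaces on some systems, so strip them.
    size=$(wc -c < "$dict" | tr -d ' ')
    # A formatted dictionary holds at least the 4-byte magic number and the
    # 4-byte Dictionary_ID.
    if [ "$size" -lt 8 ]; then
        echo "error: '$dict' is too small to be a zstd dictionary ($size bytes)" >&2
        return 1
    fi
    # Read the first 4 bytes as hex. 0xEC30A437 stored little-endian is 37 a4 30 ec.
    magic=$(od -An -tx1 -N4 "$dict" | tr -d ' \n')
    if [ "$magic" != "37a430ec" ]; then
        echo "error: '$dict' ($size bytes) does not start with the zstd dictionary magic number" >&2
        return 1
    fi
    return 0
}
```

Consider the reproduction steps above. Running check_zstd_dict falsedict && zstdcat -D falsedict test3.zst stops before zstd runs and reports that falsedict (9 bytes) lacks the magic number. A 0-byte temporary file would be reported with its size of 0.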
Instead, step 3 decompresses the file without any indication of a problem, and step 4 silently compresses the input without a dictionary. Step 7 fails with 'Dictionary mismatch'. That is probably the least bad outcome here, but it may lead the user to think that something is wrong with the compressed file or its metadata. This can happen, for example, when the dictionary is identified out-of-band, or when it is included in a skippable frame like #2349.
It gets even more confusing at verbosity level 4 or higher. There, zstdcat -vvvv -D falsedict test3.zst even prints the line 'Loading falsedict as dictionary', yet still gives no indication that loading failed.
Desktop (please complete the following information):
- OS: Debian oldstable and sid
- Version: 1.3.8+dfsg-3, 1.4.8+dfsg-3 on amd64 (binary packages from Debian, not compiled from source by myself)
Additional context
I came across this issue because I had written a wrapper script to handle dictionaries stored in a skippable frame in .warc.zst files (cf. #2349). The script extracts the dictionary, writes it to a temporary file, and then passes that file to zstd for decompression. On some particular files I got the 'Dictionary mismatch' error, and for a long time I couldn't figure out why. As it turned out, the temporary file sometimes had not been flushed to disk yet. This happened when the dictionary was very small, so it stayed buffered somewhere without an explicit flush. zstd then saw a 0-byte file and silently did not load the dictionary. If zstd had told me about that load error, ideally including the file size it saw, it would have saved me a lot of debugging time.