CLI should fail if the dictionary file provided in -D does not contain a valid dictionary #2873

@JustAnotherArchivist

Describe the bug
When the zstd CLI is called with -D and that file does not contain a valid dictionary, this is either ignored or results in potentially confusing 'Dictionary mismatch' errors.

To Reproduce
Steps to reproduce the behavior:

  1. echo test | zstd -o test.zst
  2. echo notadict >falsedict
  3. zstdcat -D falsedict test.zst
  4. echo test | zstd -D falsedict -o test2.zst
  5. wget https://github.com/facebook/zstd/raw/c2c6a4ab40fcc327e79d5364f9c2ab1e41e6a7f8/tests/dict-files/zero-weight-dict (or any other valid non-empty dictionary)
  6. echo test | zstd -D zero-weight-dict -o test3.zst
  7. zstdcat -D falsedict test3.zst

Expected behavior
On steps 3, 4, and 7, I would expect an error that the contents of falsedict are not a valid zstd dictionary. Personally, I would prefer this to be a fatal error, but at least there should be a warning on stderr. Ideally, the error would include the file size.

Step 3 instead decompresses the file without any indication of an issue. Step 4 compresses the input without a dictionary. Step 7 produces an error 'Dictionary mismatch', which is probably the least bad case here but may lead the user to think that something's wrong with the compressed file or its metadata (e.g. when identifying the used dictionary out-of-band or when it is included in a skippable frame like #2349).

It gets even more confusing when increasing verbosity to level 4 or beyond: 'zstdcat -vvvv -D falsedict test3.zst' prints the line 'Loading falsedict as dictionary' but still gives no indication that loading failed.
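
Until the CLI reports this itself, a wrapper can run a rough sanity check before calling zstd. The sketch below is not part of zstd; has_zstd_dict_magic is a hypothetical helper name. It tests whether a file starts with the zstd dictionary magic number 0xEC30A437, which is stored little-endian on disk. As far as I understand, libzstd may also accept a file without this magic as a raw-content dictionary, so this check is stricter than the library and is only meant for cases where a trained dictionary is expected.

```shell
#!/bin/sh
# Hypothetical helper (not provided by zstd): succeed only if the file begins
# with the zstd dictionary magic 0xEC30A437, i.e. bytes 37 a4 30 ec on disk.
has_zstd_dict_magic() {
    # Read the first four bytes and render them as lowercase hex without spaces.
    magic=$(head -c 4 -- "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "37a430ec" ]
}
```

For the reproduction above, 'has_zstd_dict_magic falsedict' returns non-zero, so a wrapper could print its own warning to stderr before steps 3, 4, and 7. An empty or unreadable file also fails the check.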

Desktop (please complete the following information):

  • OS: Debian oldstable and sid
  • Version: 1.3.8+dfsg-3, 1.4.8+dfsg-3 on amd64 (binary packages from Debian, not compiled from source by myself)

Additional context
I came across this issue as I had written a wrapper script to handle dictionaries in a skippable frame on .warc.zst files (cf. #2349). The script extracts the dictionary, puts it in a temporary file, then passes that to zstd for decompression. I was getting the 'Dictionary mismatch' error on some particular files and couldn't figure out for a long time why. As it turned out, the temporary file didn't get flushed to disk in some cases (namely when the dict was very small, so it was buffered somewhere without an explicit flush), leading zstd to see a 0-byte file and silently not load the dict. If zstd had told me about that load error, ideally including the file size it saw, this would've saved me a lot of debugging time.
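
For anyone hitting the same flushing problem in a wrapper script, a guard along these lines would have surfaced it earlier. This is only a sketch of a wrapper-side workaround, not a zstd feature; require_nonempty_dict is a hypothetical name. It reports the size of the dictionary file it sees and refuses to continue if the file is missing or empty.

```shell
#!/bin/sh
# Hypothetical wrapper-side guard (not a zstd feature): fail when the extracted
# dictionary file is missing or empty, and report the size that was seen.
require_nonempty_dict() {
    dict=$1
    if [ ! -f "$dict" ]; then
        printf 'error: dictionary file %s does not exist\n' "$dict" >&2
        return 1
    fi
    # Arithmetic expansion strips the padding some wc implementations emit.
    size=$(( $(wc -c < "$dict") ))
    if [ "$size" -eq 0 ]; then
        printf 'error: dictionary file %s is empty (0 bytes)\n' "$dict" >&2
        return 1
    fi
    printf 'dictionary file %s: %s bytes\n' "$dict" "$size" >&2
}
```

A wrapper could then chain it with the decompression call, for example 'require_nonempty_dict "$tmpdict" && zstdcat -D "$tmpdict" "$input"', where tmpdict and input are placeholder variable names. The guard only catches the 0-byte case; it does not validate the dictionary contents.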
