
Dictionary support #16

Open
hoelzro opened this issue Oct 22, 2018 · 7 comments

Comments

@hoelzro

hoelzro commented Oct 22, 2018

Hello there!

I'm interested in using this module, but I'd really like support for zstd dictionaries. I'm willing to add this myself, but I thought I'd open an issue here first to see if a) it's a feature you'd be interested in merging, and b) what you think the interface should look like. Let me know what you think!

-Rob

@spiritloose
Owner

@hoelzro

Hi Rob,

I agree with you. I'd like to support streaming compression/decompression.

I tried to implement something like a compress_using_dict() function, but it requires the advanced streaming functions.

The advanced streaming functions are experimental, and their documentation says to "Use them only in association with static linking".

Compress::Zstd uses static linking now, but I'm planning to switch to dynamic linking because libzstd is already a popular library and is included in major package systems such as Homebrew, Ubuntu, and so on.

I have never used dictionary compression in production, so I have not decided which way to go.

Do you use dictionary compression?
Please tell me about your use case.

Thanks,

@hoelzro
Author

hoelzro commented Oct 25, 2018

@spiritloose I've only used dictionary compression in experiments to see how much space it would save me; I haven't used it in production yet because this module doesn't support it.

@plambert

Dictionary support would be very useful to me as well. Specifically, I'd like to be able to feed data into an object or function, and then extract the dictionary. Later, I'd like to provide the dictionary and some data to the compress function, and get a result which can be decompressed with the same dictionary.

This would allow me to compress a lot of small pieces of data while still being able to address them individually, and without the huge overhead of having a new dictionary for each.

Thanks,

@plambert

plambert commented Apr 5, 2019

Thanks for the update!

I'd like to test this, but it's not clear how I'd go about accomplishing my use case:

  1. Create a dictionary from 1,000-10,000 in-memory strings (typically around 80-1000 bytes each).
  2. Write the dictionary to a database.
  3. Compress a series of small strings (also about 80-1000 bytes each) and write them to the database.

Then, to decompress:

  1. Read the dictionary from the database.
  2. Read each compressed string from the database and decompress with the dictionary.

I apologize if this should be obvious to me; I've looked at the source and while I see where it's now possible to pass a dictionary to compression and decompression routines, it's not clear how to train a dictionary.

Thanks for your help!
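
The workflow described above (build a dictionary from samples, store it, compress small strings against it, and later decompress with the same dictionary) can be sketched as follows. This is a minimal illustration in Python, not Compress::Zstd code: it uses zlib's preset-dictionary support (`zdict`) from the standard library as a stand-in for zstd dictionaries, and an in-memory SQLite database as the store. The helper names (`build_dictionary`, `compress_with_dict`, `decompress_with_dict`, `store`, `load_all`), the sample data, and the table layout are all hypothetical. zlib has no training step, so `build_dictionary` naively concatenates samples; with zstd the dictionary would instead come from `ZDICT_trainFromBuffer()` or `zstd --train`.

```python
import sqlite3
import zlib


def build_dictionary(samples, max_size=32768):
    """Naive stand-in for dictionary training (assumption, not zstd's method).

    zlib treats the dictionary as preset history, so we simply concatenate
    the samples and keep the last max_size bytes (zlib's window is 32 KiB).
    """
    return b"".join(samples)[-max_size:]


def compress_with_dict(data, zdict):
    """Compress one small record against the shared dictionary."""
    c = zlib.compressobj(level=9, zdict=zdict)
    return c.compress(data) + c.flush()


def decompress_with_dict(blob, zdict):
    """Decompress one record; must be given the same dictionary."""
    d = zlib.decompressobj(zdict=zdict)
    return d.decompress(blob) + d.flush()


def store(db, zdict, strings):
    """Write the dictionary once, then each compressed string as its own row."""
    with db:
        db.execute("CREATE TABLE dict (id INTEGER PRIMARY KEY, data BLOB)")
        db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, data BLOB)")
        db.execute("INSERT INTO dict (id, data) VALUES (1, ?)", (zdict,))
        db.executemany(
            "INSERT INTO items (data) VALUES (?)",
            [(compress_with_dict(s, zdict),) for s in strings],
        )


def load_all(db):
    """Read the dictionary back, then decompress every stored row with it."""
    (zdict,) = db.execute("SELECT data FROM dict WHERE id = 1").fetchone()
    rows = db.execute("SELECT data FROM items ORDER BY id")
    return [decompress_with_dict(blob, zdict) for (blob,) in rows]


# Hypothetical sample data: 1,000 short, similar records.
samples = [
    f'{{"user_id": {i}, "status": "active", "region": "eu-west-{i % 3}", "plan": "standard"}}'.encode()
    for i in range(1000)
]

zdict = build_dictionary(samples[:200])  # "train" on a subset of the samples
db = sqlite3.connect(":memory:")
store(db, zdict, samples)
restored = load_all(db)
```

Keeping one shared dictionary in the database leaves each record individually addressable while avoiding the overhead of a separate dictionary per record.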

@epa

epa commented Jan 8, 2024

May I suggest that decompression using an existing dictionary might be easier to implement than compression or dictionary-building, and so that might be the thing to add first?

@plambert

I don't have an existing dictionary and compressed strings to use. I suppose I could dump the training strings into a lot of temp files and use the zstd command-line tool to create the dictionary, then compress the strings with the zstd command-line tool, and finally decompress them with Compress::Zstd. Would that count as "easier"? Maybe using zstd to generate the dictionary would be. Maybe I could use Compress::Zstd to compress the strings, then decompress them, to prove it works.

Obviously this hasn't been a high priority. I'm still interested in it though.

@epa

epa commented Jan 23, 2024

Hi @plambert, sorry I wasn't clear. Adding decompression support only, without implementing support for compression using a dictionary, wouldn't help your use case. I only meant it might be easier to implement. And it would help my use case, where I have a fixed dictionary I prepared as a one-off; so apologies for squatting on your feature request.
