About multithreading compress/decompress #11

Open
synodriver opened this issue Jul 8, 2023 · 0 comments
Labels
documentation Improvements or additions to documentation

Comments

@synodriver
Owner

In the current implementation, BZ3OmpCompressor does not compress anything until its internal buffer holds enough data for every thread to compress a block. It therefore has to wait until num_threads blocks have been received through calls to the compress method; until then it just returns b"". This is called lazy compression, and it guarantees maximum performance.

That is fine for the compressor, but things are different for the decompressor. The decompressor has no idea when the stream ends. If it kept buffering the input, the caller might think that the input was not enough to decompress a block and drop it. So the decompressor has to assume that the input could end at any time, which means it cannot buffer the input in order to decompress with multiple threads. It decompresses as soon as its buffer is long enough for one thread to decompress a block (although it does better if more blocks arrive at the same time), which makes it degenerate into a single-threaded decompressor. An effective way to avoid this is to feed it at least num_threads blocks at a time. For BZIP3File usage, you can

from bz3 import compression
compression.BUFFER_SIZE = 300*10**6

to increase the buffer size, so that more blocks are received at the same time.
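
To make the difference concrete, here is a toy model of the two buffering policies. It is only a sketch and not the actual bz3 code: the `BlockBatcher` class, its `min_blocks` parameter, and the fixed block size are all assumptions made for illustration (real compressed blocks vary in size). Each returned batch stands for a group of blocks that could be processed in parallel.

```python
from typing import List


class BlockBatcher:
    """Hypothetical model: groups buffered input into batches of blocks.

    min_blocks == num_threads models the lazy compressor.
    min_blocks == 1 models the decompressor, which must act on any full block.
    """

    def __init__(self, num_threads: int, block_size: int, min_blocks: int) -> None:
        if min_blocks < 1:
            raise ValueError("min_blocks must be at least 1")
        self.num_threads = num_threads
        self.block_size = block_size
        self.min_blocks = min_blocks
        self._buf = bytearray()

    def feed(self, data: bytes) -> List[List[bytes]]:
        """Buffer data and return the batches that are ready to be processed."""
        self._buf += data
        batches = []
        while len(self._buf) // self.block_size >= self.min_blocks:
            # Hand out as many whole blocks as are available, up to one per thread.
            ready = min(len(self._buf) // self.block_size, self.num_threads)
            n = ready * self.block_size
            chunk = bytes(self._buf[:n])
            del self._buf[:n]
            batches.append(
                [chunk[i:i + self.block_size] for i in range(0, n, self.block_size)]
            )
        return batches


def batch_sizes(batcher: BlockBatcher, chunks: List[bytes]) -> List[int]:
    """Feed each chunk in turn and record how many blocks each batch contained."""
    return [len(batch) for chunk in chunks for batch in batcher.feed(chunk)]


one_block = b"x" * 8

# Lazy policy: nothing comes out until 4 blocks have been buffered.
lazy = batch_sizes(BlockBatcher(4, 8, min_blocks=4), [one_block] * 4)  # [4]

# Eager policy, fed one block per call: every batch uses a single thread.
eager_small = batch_sizes(BlockBatcher(4, 8, min_blocks=1), [one_block] * 4)  # [1, 1, 1, 1]

# Eager policy, fed 4 blocks in one call: all 4 threads get work.
eager_large = batch_sizes(BlockBatcher(4, 8, min_blocks=1), [one_block * 4])  # [4]
```

The last two cases correspond to the BUFFER_SIZE advice above: the policy is the same, but larger reads give the eager side enough blocks per call to keep every thread busy.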

@synodriver added the documentation label on Jul 8, 2023