Hi, on macOS (Apple Silicon, SSD), I tried extracting a 50 GB zip file that I got from Google Takeout:
```
$ cargo install ripunzip
$ ripunzip unzip-file takeout-12345.zip
```
Activity Monitor shows ripunzip at only about 33% CPU usage, and the extraction speed is a little lower than I'd expect.
I'd expect either the disk or the CPU to be saturated, but neither seems to be.
In Activity Monitor I clicked "Sample", and the stack traces show most threads spending over 80% of their time blocked on the mutex in `CloneableSeekableReader.inner`:
https://github.com/GoogleChrome/ripunzip/blame/08fade5ea3882f84f2bee27552884bccfe74d9d6/src/unzip/cloneable_seekable_reader.rs#L64C19-L66
That would explain the lock contention. I suppose it makes sense if we must share a single seekable file descriptor, but I can imagine alternatives:
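To make the contention concrete, here is a minimal, hypothetical sketch of the shared-handle pattern in Rust. It is not ripunzip's actual code, and `SharedReader` and `set_position` are made-up names. Each clone keeps its own logical position, but all clones share one `File` behind a mutex, so every read has to lock, seek, and read while holding the lock:

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::sync::{Arc, Mutex};

/// Hypothetical, simplified cloneable reader: every clone shares ONE file
/// handle behind a mutex but tracks its own logical position.
#[derive(Clone)] // cloning copies `pos` and shares the same Arc'd handle and lock
struct SharedReader {
    file: Arc<Mutex<File>>,
    pos: u64,
}

impl SharedReader {
    fn new(file: File) -> Self {
        SharedReader { file: Arc::new(Mutex::new(file)), pos: 0 }
    }

    /// Moves this clone's logical position without touching the shared handle.
    fn set_position(&mut self, pos: u64) {
        self.pos = pos;
    }
}

impl Read for SharedReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // The seek and the read must happen atomically on the shared handle,
        // so the lock is held for the whole I/O call. Every other clone
        // (on every other thread) waits here until this read finishes.
        let mut file = self.file.lock().expect("lock poisoned");
        file.seek(SeekFrom::Start(self.pos))?;
        let n = file.read(buf)?;
        self.pos += n as u64;
        Ok(n)
    }
}
```

Because the lock is held across the `read` call itself, threads working on different entries end up waiting on each other's I/O, which would be consistent with the sampled stack traces.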
- Open one file descriptor per thread, so each fd can seek independently without lock contention?
- mmap the zip file?
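A minimal sketch of the first alternative, using only the standard library: each worker thread opens its own `File`, so it gets its own file offset and never touches a shared lock. `read_ranges_parallel` is a hypothetical helper, and the sketch assumes the input is a file path that can be reopened:

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;
use std::thread;

/// Hypothetical helper: reads each `(offset, len)` range of the file at
/// `path` on its own thread, with a separate file handle per thread, so
/// seeks never have to be serialized behind a shared mutex.
fn read_ranges_parallel(path: &Path, ranges: &[(u64, usize)]) -> io::Result<Vec<Vec<u8>>> {
    thread::scope(|s| {
        let handles: Vec<_> = ranges
            .iter()
            .map(|&(offset, len)| {
                s.spawn(move || -> io::Result<Vec<u8>> {
                    // Each thread opens its own descriptor with its own offset.
                    let mut file = File::open(path)?;
                    file.seek(SeekFrom::Start(offset))?;
                    let mut buf = vec![0u8; len];
                    file.read_exact(&mut buf)?; // errors if the range runs past EOF
                    Ok(buf)
                })
            })
            .collect();
        // Join in input order; the first I/O error (if any) is returned.
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .collect::<io::Result<Vec<Vec<u8>>>>()
    })
}
```

The trade-off is that a path-based design can't accept an arbitrary `Read + Seek` source the way a cloneable wrapper can. On Unix, `std::os::unix::fs::FileExt::read_at` (a positional `pread`) might be another option worth comparing, since it reads at an offset without moving a shared cursor.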