ripunzip unzip-file is mostly serializing behind CloneableSeekableReader.inner Mutex #116

@mhansen

Hi, on macOS on Apple Silicon with an SSD, I tried extracting a 50GB zip file (I got it from Google Takeout).

$ cargo install ripunzip
$ ripunzip unzip-file takeout-12345.zip

Activity Monitor shows about 33% CPU usage from ripunzip, and extraction is a little slower than I'd expect.

I'd expect either disk or CPU to be saturated, but that doesn't seem to be happening.

In Activity Monitor, I click "Sample" and in the stack traces I can see most threads are spending over 80% of their time blocking on the mutex in CloneableSeekableReader.inner:

https://github.com/GoogleChrome/ripunzip/blame/08fade5ea3882f84f2bee27552884bccfe74d9d6/src/unzip/cloneable_seekable_reader.rs#L64C19-L66

OK, that would explain the lock contention. I suppose it makes sense if we must have a single seekable file descriptor. But I can imagine alternatives:

  • Open one file descriptor per thread, then each fd could be independently seekable without lock contention?
  • mmap the zip file?
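
To make the first alternative concrete, here is a minimal sketch of the one-file-descriptor-per-thread idea, using only the Rust standard library. It is not ripunzip's code: `read_range` and `read_ranges_parallel` are hypothetical names, and the fixed list of `(offset, len)` ranges stands in for the zip entries that a real extractor would take from the central directory. The point is only that each thread opens its own `File`, so each has its own seek position and no mutex is needed around seek + read.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;
use std::thread;

/// Hypothetical helper: read `len` bytes starting at `offset`,
/// using a file handle opened just for this call.
fn read_range(path: &Path, offset: u64, len: usize) -> io::Result<Vec<u8>> {
    // A private descriptor means a private seek position,
    // so no lock is shared with other threads.
    let mut file = File::open(path)?;
    file.seek(SeekFrom::Start(offset))?;
    let mut buf = vec![0u8; len];
    file.read_exact(&mut buf)?;
    Ok(buf)
}

/// Hypothetical helper: read several `(offset, len)` ranges in parallel,
/// one thread and one descriptor per range. Results are returned in the
/// same order as `ranges`.
fn read_ranges_parallel(path: &Path, ranges: &[(u64, usize)]) -> io::Result<Vec<Vec<u8>>> {
    thread::scope(|scope| {
        // Spawn every reader first so that they run concurrently.
        let handles: Vec<_> = ranges
            .iter()
            .map(|&(offset, len)| scope.spawn(move || read_range(path, offset, len)))
            .collect();
        // Then join in order; the first I/O error (if any) is returned.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("reader thread panicked"))
            .collect()
    })
}
```

A real implementation would presumably use a fixed thread pool and open one handle per worker rather than per range; opening per range just keeps the sketch short. On Unix, positional reads via `std::os::unix::fs::FileExt::read_at` might be another way to avoid a shared seek position while keeping a single descriptor.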
