Description
I am trying to process all files inside a 7z archive one by one. The archive is solid, so it is best decompressed in a single pass, and its total size is much larger than RAM. I am using the Py7zIO API with a WriterFactory, but I cannot tell when an individual file is complete. Is there a way to know when a file inside the archive has finished decompressing?
There does not seem to be a close() defined in Py7zIO.
The documentation mentions that I should make my class "thread safe", but it does not explain how to do that. Because of this, I do not think it is safe to assume that a call to the factory's create() method means the previous file has finished decompressing.
When I check which functions in Py7zBytesIO are actually called during decompression, I find __init__(), write() and seek() are called, but flush() is never called.
For reasons unknown to me it seems seek(0, 0) is called at the end of decompressing an individual file.
At the moment my workaround is to treat a call to seek(0, 0) as the signal that the file is complete: I process the decompressed data at that point and then free the memory to make room for the next file.
Is there a better solution for this, or is this missing functionality?
import io

import py7zr
import py7zr.io


class Py7zBytesIO(py7zr.io.Py7zIO):
    def __init__(self, filename: str):
        self.filename = filename
        self._buffer = io.BytesIO()

    def write(self, s: bytes | bytearray) -> int:
        return self._buffer.write(s)

    def read(self, size: int | None = None) -> bytes:
        # Honor the requested size; None reads to the end of the buffer
        return self._buffer.read(size)

    def seek(self, offset: int, whence: int = 0) -> int:
        result = self._buffer.seek(offset, whence)
        # Workaround for missing close event: only seek(0, 0) is treated as "done"
        if offset == 0 and whence == 0:
            self.close()
        return result

    def flush(self) -> None:
        # Seems to never be called during decompression
        return self._buffer.flush()

    def size(self) -> int:
        # size() is part of the Py7zIO interface in recent py7zr releases
        return self._buffer.getbuffer().nbytes

    def close(self) -> None:
        # Should this be added to Py7zIO and called when decompression is complete?
        # ... process the decompressed file and cleanup memory.
        pass


class BytesIOFactory(py7zr.io.WriterFactory):
    def __init__(self) -> None:
        self.products: dict[str, Py7zBytesIO] = {}

    def create(self, filename: str) -> py7zr.io.Py7zIO:
        product = Py7zBytesIO(filename)
        self.products[filename] = product
        return product
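
To make the workaround easier to discuss, here is a minimal sketch of the same idea without py7zr, so it runs on its own. It assumes the call pattern I observed (create(), one or more write() calls, then seek(0, 0) at the end of each file); that pattern is not documented behavior. The names CompletionBuffer, CompletionFactory, and simulate_extraction are hypothetical and only stand in for a Py7zIO subclass, a WriterFactory, and the library's extraction loop. The lock is my guess at what "thread safe" might require for shared state.

```python
import io
import threading
from typing import Callable, Dict, List


class CompletionBuffer:
    """Stand-in for a Py7zIO subclass; does not import py7zr."""

    def __init__(self, filename: str, on_complete: Callable[[str, bytes], None]):
        self.filename = filename
        self._buffer = io.BytesIO()
        self._on_complete = on_complete
        self._done = False

    def write(self, s: bytes) -> int:
        return self._buffer.write(s)

    def seek(self, offset: int, whence: int = 0) -> int:
        result = self._buffer.seek(offset, whence)
        # Assumption: the first seek(0, 0) marks the end of this file.
        if offset == 0 and whence == 0 and not self._done:
            self._done = True
            data = self._buffer.getvalue()
            self._buffer = io.BytesIO()  # drop the data to free memory
            self._on_complete(self.filename, data)
        return result


class CompletionFactory:
    """Stand-in for a WriterFactory that forwards completed files."""

    def __init__(self, on_complete: Callable[[str, bytes], None]):
        self._on_complete = on_complete
        self._lock = threading.Lock()  # guards the callback if calls overlap

    def create(self, filename: str) -> CompletionBuffer:
        return CompletionBuffer(filename, self._locked_complete)

    def _locked_complete(self, filename: str, data: bytes) -> None:
        with self._lock:
            self._on_complete(filename, data)


def simulate_extraction(factory: CompletionFactory, members: Dict[str, List[bytes]]) -> None:
    # Mimics the observed sequence: create, write chunks, seek(0, 0).
    for name, chunks in members.items():
        out = factory.create(name)
        for chunk in chunks:
            out.write(chunk)
        out.seek(0, 0)


results: Dict[str, int] = {}


def record(name: str, data: bytes) -> None:
    # "Processing" here is just recording the size of each completed file.
    results[name] = len(data)


factory = CompletionFactory(record)
simulate_extraction(factory, {"a.txt": [b"hello ", b"world"], "b.bin": [b"\x00" * 4]})
print(results)
```

With this layout only one file's data is held at a time, which is what I need for an archive larger than RAM. It still depends on seek(0, 0) being a reliable end-of-file signal, which is exactly what I am unsure about.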