-
Notifications
You must be signed in to change notification settings - Fork 2
Introduce PAX header support #5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR looks very interesting for me. Thanks! 👍 I never heard about PAX extension.
I want to merge it, but i want to fully understand all properties of that and knows all consequecies to code quality and stability.
Please be patient.
9de2244 and 27fcf44 were to fix a PHPStan error:
By introducing checks for 0 length tar archives, it breaks legitimate tar archives that have 0 length, such as directories. I'm fine rolling those commits back, and instead instructing PHPStan to ignore that specific error on that line with |
I went ahead and applied that PHPStan ignore comment. I also fixed another bug I found, related to null byte padding. |
Thanks. I am still very busy, and I expect a delay of a few months before I can review and merge. Sorry about that. |
In my use case, I'm pulling very large files from a docker volume, using the Docker HTTP API endpoint. Because the files I'm pulling are larger than 8 GiB, the Docker engine prefixes the data stream with a PAX header docs. They do this to indicate the size of the file is larger than 8 GiB (
tar
format has a limitation the makes it so it can only record the size of the file as 8 GiB).This PR should introduce support for the
size
parameter in a PAX header. There are other headers that can be implemented, but I wanted to get this feature in, because it's the core of how the header can be processed. As other use cases come up, they can be added.The test added takes quite a long time to run. It tests a tar archive that contains 1 x 10 GiB file, 2 x 10 GiB files, and 3 x 10 GiB files, to ensure it would work on a directory of very large files. Currently the test is configured to run if the
CI
env var is set, which GitHub Actions has set on their workflows (docs). For local development, they will be marked as skipped, but a local env var ofTEST_MASSIVE_FILE=true
would allow it to run as well. This way we know the tests pass before merging a PR, but development of features unrelated to these large files is faster to iterate on. If this should be changed so that CI runs are faster, I am happy to do so.