Skip to content

Introduce PAX header support #5

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

piurafunk
Copy link

In my use case, I'm pulling very large files from a docker volume, using the Docker HTTP API endpoint. Because the files I'm pulling are larger than 8 GiB, the Docker engine prefixes the data stream with a PAX header docs. They do this to indicate the size of the file is larger than 8 GiB (tar format has a limitation the makes it so it can only record the size of the file as 8 GiB).

This PR should introduce support for the size parameter in a PAX header. There are other headers that can be implemented, but I wanted to get this feature in, because it's the core of how the header can be processed. As other use cases come up, they can be added.

The test added takes quite a long time to run. It tests a tar archive that contains 1 x 10 GiB file, 2 x 10 GiB files, and 3 x 10 GiB files, to ensure it would work on a directory of very large files. Currently the test is configured to run if the CI env var is set, which GitHub Actions has set on their workflows (docs). For local development, they will be marked as skipped, but a local env var of TEST_MASSIVE_FILE=true would allow it to run as well. This way we know the tests pass before merging a PR, but development of features unrelated to these large files is faster to iterate on. If this should be changed so that CI runs are faster, I am happy to do so.

Copy link
Owner

@jakubboucek jakubboucek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR looks very interesting for me. Thanks! 👍 I never heard about PAX extension.

I want to merge it, but i want to fully understand all properties of that and knows all consequecies to code quality and stability.

Please be patient.

@piurafunk piurafunk requested a review from jakubboucek January 2, 2025 17:40
@piurafunk
Copy link
Author

piurafunk commented Jan 2, 2025

9de2244 and 27fcf44 were to fix a PHPStan error:

 ------ ------------------------------------------------------------------------
  Line   StreamReader.php
 ------ ------------------------------------------------------------------------
  175    Parameter #2 $length of function fread expects int<1, max>, int given.
 ------ ------------------------------------------------------------------------

By introducing checks for 0 length tar archives, it breaks legitimate tar archives that have 0 length, such as directories. I'm fine rolling those commits back, and instead instructing PHPStan to ignore that specific error on that line with // @phpstan-ignore argument.type. Does that work for you?

@piurafunk
Copy link
Author

I went ahead and applied that PHPStan ignore comment. I also fixed another bug I found, related to null byte padding.

@jakubboucek
Copy link
Owner

Thanks. I am still very busy, and I expect a delay of a few months before I can review and merge. Sorry about that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants