Drop "cpio" libraries and write something semi-custom, because RPM doesn't use vanilla CPIO #108
Comments
A real concern: Do we want to support rpms with cpio-like archives larger than 4GB? It feels like we pull in a lot of pain for supporting an antipattern? Are there use-cases that are idiomatic that require rpms larger than 4GB?
@drahnr The example that typically comes up is games, which often include many large assets, or ML models, or their training data. In practice those are rarely distributed as system packages but it is possible and has been done.
My question: Are we anticipating this crate being used for games, using
It's not just a matter of writing but also reading. I'm not sure I want to assume that nobody will ever want to use this crate to process the contents of existing such RPMs. I don't know that it's such a drain on resources.
Tbh, I'd prefer we create a separate
I also think that a separate crate would be a better approach. Maybe you should create a repository for it?
I'm a bit lukewarm on having a separate crate, because I can't think of anything apart from an RPM parser which would want to parse RPM payloads. So it would be a separate crate that we would be the only users of, probably ever.
I am mostly thinking operationally: applying upstream changes would be as easy as a git rebase or merge. I couldn't care less if we stay the only user if it simplifies the maintenance.
I don't think there will be any maintenance, the library is "finished" and hasn't seen any commits in a year. CPIO is very simple so there are unlikely to be any bugs.
We haven't reached a conclusion here, my preference is still on forking to
I still have the opposite preference, tbh 🤷♂️. It's very difficult for me to imagine the supposed maintenance benefit repaying itself against having a separate crate which nobody but this particular library will ever use. Since the new payload format removes nearly all of the metadata from the archive (because it's duplicated in the RPM header), you can do very little with the payload without also reading the RPM header. So the obvious thing to do is for us to just provide an API for that directly from this crate, since it would be pretty much the only useful way to use that code. There is another development since we last had the discussion, which is that RPMv6 plans to use only the "new" payload scheme, so it won't be relegated to just packages with files >4GB anymore, it will eventually be all packages. That is mentioned under the "Payload" section here: rpm-software-management/rpm#2374
Note to self: make sure we handle the edge cases discussed here: https://blog.colindou.ch/posts/lets-make-an-os-cpio-weirdness/ That is: be careful with directories, and be careful with absolute paths. Extracting a cpio shouldn't automatically overwrite /usr/bin/blah.
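A rough sketch of what the absolute-path handling could look like, in case it helps the discussion. `sanitize_entry_path` is a hypothetical helper, not something that exists in this crate or in cpio-rs. It assumes the policy we want is "re-root absolute names under the extraction directory and reject anything containing `..`"; a different policy (e.g. rejecting absolute names outright) would be just as easy. It doesn't address symlinks or the directory ordering issues from the blog post.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper: map an archive entry name onto a path inside `root`.
/// Returns `None` if the entry name tries to escape `root`.
fn sanitize_entry_path(root: &Path, entry: &str) -> Option<PathBuf> {
    let mut out = root.to_path_buf();
    for comp in Path::new(entry).components() {
        match comp {
            // Ordinary path segments are appended below `root`.
            Component::Normal(part) => out.push(part),
            // A leading "/" or "./" is dropped, so "/usr/bin/blah"
            // ends up at "<root>/usr/bin/blah" instead of the real /usr/bin.
            Component::CurDir | Component::RootDir => {}
            // ".." (and Windows path prefixes) are rejected outright.
            Component::ParentDir | Component::Prefix(_) => return None,
        }
    }
    Some(out)
}
```

With this, an entry named `/usr/bin/blah` extracted into `/tmp/x` would land at `/tmp/x/usr/bin/blah`, and `../etc/passwd` would be refused.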
See the "Payload" section of the website: https://rpm-software-management.github.io/rpm/manual/format.html
So, we should fork `cpio-rs` (providing the appropriate credits of course), strip it down to the subset we need, and change the magic bytes constant. Luckily the CPIO format is pretty simple and the library only a few hundred lines, so it's not a big deal.
Subsequently we need to change the `PAYLOADFORMAT` tag, but upstream RPM still uses `cpio` as the name, so we'll have to wait until they pick something.
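To make the "change the magic bytes constant" part concrete, here is a minimal sketch of telling the two header variants apart. It assumes, based on my reading of the format page linked above, that the stripped header is the magic `07070X` followed by an 8-digit hex file index, whereas a regular "newc" header starts with `070701`. `HeaderKind` and `classify_header` are made-up names for illustration and are not the cpio-rs API.

```rust
/// Payload header variants, distinguished by their first six bytes.
#[derive(Debug, PartialEq, Eq)]
enum HeaderKind {
    /// Regular "newc" cpio header (magic `070701`); the rest is not parsed here.
    Newc,
    /// RPM's stripped header (assumed layout: magic `07070X` + 8 hex digits).
    Stripped { file_index: u32 },
}

const NEWC_MAGIC: &[u8] = b"070701";
const STRIPPED_MAGIC: &[u8] = b"07070X";

/// Hypothetical helper: inspect the start of `buf` and classify the header.
/// Returns `None` for unknown magic or a truncated/malformed stripped header.
fn classify_header(buf: &[u8]) -> Option<HeaderKind> {
    let magic = buf.get(..6)?;
    if magic == NEWC_MAGIC {
        Some(HeaderKind::Newc)
    } else if magic == STRIPPED_MAGIC {
        let digits = buf.get(6..14)?;
        // Only plain hex digits are accepted (no sign, no whitespace).
        if !digits.iter().all(|b| b.is_ascii_hexdigit()) {
            return None;
        }
        let hex = std::str::from_utf8(digits).ok()?;
        let file_index = u32::from_str_radix(hex, 16).ok()?;
        Some(HeaderKind::Stripped { file_index })
    } else {
        None
    }
}
```

The file index would then be used to look up name, size, mode and so on in the RPM header, which is why this code is not much use outside an RPM parser.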