[DISCUSS] Consider Vendoring Certain Dependencies #15360
Comments
If we can write a script that roughly estimates "how much" of an external crate we are using, we can create a starter list of candidates for this.
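One possible approximation of "how much" of a crate we use is to count the distinct `crate_name::...` paths that appear in our source tree. Below is a rough sketch under stated assumptions, not an agreed-upon tool. It only sees fully qualified paths and `use` lines. Grouped imports such as `use regex::{A, B}` are recorded as just `regex`. Re-exports and macro-generated code are ignored. The functions `paths_in_source` and `paths_in_dir` are hypothetical names.

```rust
use std::collections::BTreeSet;
use std::fs;
use std::io;
use std::path::Path;

/// Collects the distinct `krate::...` paths mentioned in `source` (heuristic).
/// Grouped imports like `use krate::{a, b}` are recorded as just `krate`.
fn paths_in_source(source: &str, krate: &str, out: &mut BTreeSet<String>) {
    // Cargo package names may use `-`, but Rust paths use `_`.
    let needle = format!("{}::", krate.replace('-', "_"));
    let bytes = source.as_bytes();
    let mut start = 0;
    while let Some(pos) = source[start..].find(&needle) {
        let at = start + pos;
        // Skip matches that are the tail of a longer identifier, e.g. `my_regex::`.
        let preceded_by_ident = at > 0 && {
            let c = bytes[at - 1];
            c.is_ascii_alphanumeric() || c == b'_'
        };
        // Extend the match over identifier characters and `::` separators.
        let mut end = at + needle.len();
        while end < bytes.len()
            && (bytes[end].is_ascii_alphanumeric() || bytes[end] == b'_' || bytes[end] == b':')
        {
            end += 1;
        }
        if !preceded_by_ident {
            out.insert(source[at..end].trim_end_matches(':').to_string());
        }
        start = at + needle.len();
    }
}

/// Recursively scans every `.rs` file under `dir`.
fn paths_in_dir(dir: &Path, krate: &str, out: &mut BTreeSet<String>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            paths_in_dir(&path, krate, out)?;
        } else if path.extension().and_then(|e| e.to_str()) == Some("rs") {
            paths_in_source(&fs::read_to_string(&path)?, krate, out);
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Usage: <binary> <src-dir> <crate-name>
    let args: Vec<String> = std::env::args().collect();
    if args.len() != 3 {
        eprintln!("usage: {} <src-dir> <crate-name>", args[0]);
        std::process::exit(2);
    }
    let mut used = BTreeSet::new();
    paths_in_dir(Path::new(&args[1]), &args[2], &mut used)?;
    println!("{} distinct paths from `{}`", used.len(), args[2]);
    for p in &used {
        println!("  {p}");
    }
    Ok(())
}
```

A crate that yields only a handful of distinct paths would be a natural entry for the starter list. The count is only a lower bound on real usage, since methods called on imported types (e.g. `re.is_match(...)`) are not attributed to the crate.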
FWIW, I think copying parts of third-party crates is often called "vendoring the dependencies". Using that term might help communicate what this issue is proposing.
Pretty sure the linker/LLVM filters out unused code anyway, at least if LTO is on, which it is for release builds. It might decrease build time marginally, but I really wouldn't expect the executable size to be significantly different.
Right, but we spend a lot of time on CI and debugging without LTO. Also, dependency creep increases the number of moving parts and makes maintenance harder. Therefore, I think this is a worthwhile exercise.
I can't say I agree with you there, but I'm only one voice. I'd rather see the effort put into other areas, tbh.
I think that since Rust effectively statically links all binaries, vendoring dependencies will likely not make any difference in binary size. The only benefit of vendoring would be addressing dependency creep.
IMO, dependency creep is the most important issue here. For the others, we can run experiments to see what happens to compilation times and debug-mode binary sizes. I don't think there is much point in engaging in theoretical discussions about this.
Is your feature request related to a problem or challenge?
DataFusion has many dependencies. They save us the trouble of maintaining those implementations in our own codebase, but they increase binary size and compilation time.
Describe the solution you'd like
This issue is to identify and collect suggestions for crates/dependencies that are shallow and stable enough to justify an in-house implementation.
'shallow': We use only a small part of the original crate, and that part can be implemented in-house without porting the crate's large dependency tree. The shallower the dependency, the more beneficial an in-house implementation would be.
'stable': The code is mostly stable and will not need many changes once we have an in-house implementation. (All code eventually needs some changes; the question is how often and how disruptively it will give us a headache.)
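The two criteria above could be turned into a first-pass filter over candidate crates. This is a sketch built on assumptions the issue does not state. 'shallow' is approximated by the number of distinct items we use from the crate. 'stable' is approximated by the number of upstream releases in the last year. Ties are broken by how many transitive dependencies would be dropped. The `Candidate` fields, the thresholds, and the crate names are hypothetical placeholders, not measured data.

```rust
/// Hypothetical per-dependency measurements; how they are gathered is out of scope.
#[derive(Debug, Clone)]
struct Candidate {
    name: String,
    items_used: usize,         // 'shallow' proxy: distinct items we use from the crate
    transitive_deps: usize,    // crates pulled in only because of this one (assumed metric)
    releases_last_year: usize, // 'stable' proxy: how often upstream changes
}

/// Illustrative thresholds, not agreed-upon project values.
struct Criteria {
    max_items_used: usize,
    max_releases_last_year: usize,
}

/// Keeps shallow and stable candidates, shallowest first; among equally
/// shallow crates, those that would drop more transitive deps rank higher.
fn shortlist(mut cands: Vec<Candidate>, c: &Criteria) -> Vec<Candidate> {
    cands.retain(|d| {
        d.items_used <= c.max_items_used && d.releases_last_year <= c.max_releases_last_year
    });
    cands.sort_by(|a, b| {
        a.items_used
            .cmp(&b.items_used)
            .then(b.transitive_deps.cmp(&a.transitive_deps))
    });
    cands
}

fn main() {
    // Made-up numbers for hypothetical crates.
    let cands = vec![
        Candidate { name: "crate_a".into(), items_used: 3, transitive_deps: 12, releases_last_year: 1 },
        Candidate { name: "crate_b".into(), items_used: 40, transitive_deps: 2, releases_last_year: 0 },
        Candidate { name: "crate_c".into(), items_used: 3, transitive_deps: 20, releases_last_year: 2 },
        Candidate { name: "crate_d".into(), items_used: 2, transitive_deps: 5, releases_last_year: 9 },
    ];
    let criteria = Criteria { max_items_used: 10, max_releases_last_year: 4 };
    for d in shortlist(cands, &criteria) {
        println!("{} (items used: {}, transitive deps: {})", d.name, d.items_used, d.transitive_deps);
    }
}
```

With these made-up numbers, `crate_b` is filtered out as not shallow and `crate_d` as not stable. `crate_c` then ranks ahead of `crate_a` because it would remove more transitive dependencies.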
Describe alternatives you've considered
Do not write any in-house implementations, and accept the cost in binary size and compile time.
Additional context
Originally suggested by @ozankabak while discussing my GSoC 2025 proposal for "Optimizing compile time and binary size" on Discord, and before that in #14478.