[DISCUSS] Consider Vendoring Certain Dependencies #15360

Open
logan-keede opened this issue Mar 22, 2025 · 7 comments
Labels
enhancement New feature or request

Comments

@logan-keede
Contributor

logan-keede commented Mar 22, 2025

Is your feature request related to a problem or challenge?

DataFusion has many dependencies, which lead to increased binary size and compilation time, while saving us the trouble of maintaining those implementations in our own codebase.

Describe the solution you'd like

This issue is to identify and collect suggestions for crates/dependencies that are shallow and stable enough to justify an in-house implementation.

'shallow': We do not use much of the original crate, i.e. we use only a small part that can be implemented in-house without having to implement or port the massive dependency tree that the crate has. The shallower the dependency, the more beneficial an in-house implementation would be.

'stable': The code is mostly stable and will not require many active changes once we have an in-house implementation. (While all code eventually requires some changes, the question is how frequently and disruptively it will give us a headache.)

Describe alternatives you've considered

Do not have any in-house implementation. Pay the price of increased binary size and compile time.

Additional context

Originally suggested by @ozankabak while discussing my GSoC 2025 proposal for "Optimizing compile time and binary size" on Discord, and before that in #14478.

@logan-keede logan-keede added the enhancement New feature or request label Mar 22, 2025
@logan-keede logan-keede changed the title [DISCUSS] In-house implementation of certain dependency package. [DISCUSS] In-house implementation of certain dependencies. Mar 22, 2025
@ozankabak
Contributor

If we can write a script to approximately estimate "how much" of an external crate we are using, we can create a starter list of candidates for this.
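
As a rough illustration of what such a script might look like, here is a minimal Python sketch. It is only a text-based heuristic, not something proposed in this thread: the function name `crate_usage`, its arguments, and the regular expression are all hypothetical. It counts textual references of the form `crate_name::path` in Rust source text and ignores grouped imports (`use foo::{a, b}`), re-exports, and macro-generated code, so the numbers are at best an approximation.

```python
import re
from collections import Counter


def crate_usage(sources, crate_name):
    """Count textual references to `crate_name::<path>` in Rust source text.

    `sources` maps a file name to its contents (hypothetical input shape).
    Heuristic only: grouped imports, re-exports, and macros are not handled.
    """
    # Cargo package names may contain hyphens; Rust paths use underscores.
    ident = crate_name.replace("-", "_")
    # The lookbehind avoids matching longer identifiers such as `my_itertools`.
    pattern = re.compile(
        r"(?<![A-Za-z0-9_])"
        + re.escape(ident)
        + r"::([A-Za-z_][A-Za-z0-9_]*(?:::[A-Za-z_][A-Za-z0-9_]*)*)"
    )
    counts = Counter()
    for text in sources.values():
        for match in pattern.finditer(text):
            # Key on the path after the crate name, e.g. `Itertools`.
            counts[match.group(1)] += 1
    return counts
```

In practice the `sources` mapping could be built by reading every `.rs` file in the workspace. A small number of distinct paths, relative to the size of the crate's public API, would hint at a "shallow" dependency worth a closer look.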

@alamb
Contributor

alamb commented Mar 24, 2025

FWIW I think copying parts of third-party crates is often called "vendoring the dependencies". Using that term might help communicate what this is proposing.

@logan-keede logan-keede changed the title [DISCUSS] In-house implementation of certain dependencies. [DISCUSS] Consider Vendoring Certain Dependencies Mar 24, 2025
@Omega359
Contributor

Pretty sure the linker/LLVM filters out unused code anyway, at least if LTO is on, which it is for release builds. It might help decrease build time marginally, but I really wouldn't expect the actual executable size to be significantly different.

@ozankabak
Contributor

Right, but we spend a lot of time on CI and debugging without LTO. Also, dependency creep increases the number of "moving" parts and makes maintenance harder. Therefore I think this is a worthwhile exercise.

@Omega359
Contributor

I can't say I agree with you there but I'm only one voice. I'd rather see the effort put into other areas tbh.

@alamb
Contributor

alamb commented Mar 27, 2025

I think that since Rust effectively statically links all binaries, vendoring dependencies is unlikely to make any difference in binary size.

The only benefit of vendoring stuff would be the dependency creep issue.

@ozankabak
Contributor

IMO dependency creep is the most important issue here. For the others, we can run experiments to see what happens to compilation times and debug mode binary sizes. I don't think there is much point in engaging in theoretical discussions on this.

4 participants