Description
Hi,
thanks once again for creating pixi. The headaches I used to have around hard-to-install/replicate environments have vanished to a very large extent.
I have a feature request that I could post either here or in https://github.com/Quantco/pixi-pack. I was not sure which, because the request is basically asking for an alternative to pixi-pack.
Problem Description
The short version
From some compute locations (for instance CI or a somewhat isolated server) it is hard to install from git(hub) or a private package index. It would therefore be nice to selectively vendor those dependencies, without also copying packages that are available from other sources (for instance conda-forge or PyPI).
The long version
In our machine learning projects we often have a few different libraries under active development, each containing parts that come together in an experiment. To bring these pieces together, putting direct installs from GitHub in our pixi.toml files has emerged as a really practical pattern:
[pypi-dependencies]
pydantic = "*"
pydantic-settings = "*"
...
foo = "*"
our_model_architectures = { git = "https://github.com/big-corporate/our-model-architectures", rev="feature/add-weird-architecture"}
time_travel_mathematics = { git = "https://github.com/big-corporate/time-travel-mathematics", rev="feature/wormhole" }
[feature.dataloading.pypi-dependencies]
our_shared_data_loaders = { git = "https://github.com/big-corporate/our-shared-data-loaders", rev="main"}
Installing from GitHub has two advantages:
- compared to a local path: it guarantees the code was checked in to GitHub, so no dirty state from a local working copy can leak in.
- compared to a package index: it keeps the interactions between team members (or even just yourself) fast and easy.
In larger corporate IT environments there are often a lot of restrictions to work around. What led me to want this feature was not being able to access (other) private GitHub repositories from GitHub Actions. A lightweight artifact bundle containing just the dependencies that can't be fetched at build time makes it much easier to save such a snapshot as an artifact every time a machine learning experiment is started.
Because in that case:
- just a pixi.toml + pixi.lock is not enough, because those repos cannot be reached
- a pixi-pack archive can get really large, which is wasteful and takes more time to upload as an artifact each time you run an experiment.
It would therefore be nice to have something that combines the advantages of both: the fine-grained spec of pixi.toml and pixi.lock with the vendoring of pixi-pack, applied selectively.
I hacked together a Python version of what I mean here. I am not exceptionally proud of the result (it was also an experiment to see whether I could "vibe" code this, which was not the most vibey coding experience of my life):
https://github.com/jorenretel/pixi-vendor-proof-of-concept
Of course my example (no access to specific repos on GitHub) is just one of many imaginable use cases where it would be handy to ship a slim, reproducible environment that only contains copies of the packages you know will not be available in the target compute environment.
Extra Note:
While developing I almost always have something like this in my pixi.toml as well (which completely gives up the first advantage I listed above):
[pypi-dependencies]
library_under_heavy_construction = { path = ".", editable = true }
It would be nice to be allowed to add that to the same snapshot, but I also understand that that opens another can of worms.
I'd love to hear your thoughts on this feature request and whether this selective vendoring approach would be valuable, or whether you think it would break more reproducibility than it would create.