Skip to content

Experimental portable lockdirs #11489

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

gridbugs
Copy link
Collaborator

Adds a feature flag for enabling portable lockdirs. This is a proof of concept implementation of portable lockdirs where the entire solver runs for each of a set of platforms (combinations of architecture, OS, and in some cases the OS distribution) which most people are expected to use. This can easily be extended in the future to add more platforms or to allow projects to specify more platforms.

To make lockdirs portable, the build/install commands and dependencies of each package are transformed into match statements, where the appropriate value for each platform is enumerated. At solve-time, the solver runs once for each platform, populating these fields. At build-time, the command/dependencies appropriate for the current platform are used.

When the feature flag is not enabled dune's behaviour is unchanged.

Related to #11476

@gridbugs
Copy link
Collaborator Author

See https://github.com/ocaml/dune/pull/11489/files#diff-2d6698a9af755d43a10093ee8248c2f6c9606358252ad453483447bf0d302fc5 for an example of how lockfiles look with this feature enabled.

@gridbugs gridbugs force-pushed the portable-lockdir-feature branch 4 times, most recently from 59d5fe2 to beebd21 Compare February 24, 2025 04:25
@gridbugs
Copy link
Collaborator Author

The failing lock-directory-selection test on macos is due to the issue fixed in #11497.

@gridbugs gridbugs force-pushed the portable-lockdir-feature branch from beebd21 to 5319f22 Compare February 24, 2025 07:17
@gridbugs gridbugs force-pushed the portable-lockdir-feature branch from 5319f22 to fb43b79 Compare March 3, 2025 06:09
@gridbugs
Copy link
Collaborator Author

gridbugs commented Mar 3, 2025

@rgrinberg can you take a look at this please

@rgrinberg
Copy link
Member

I'll try and get around to this in the next few days. Meanwhile can @art-w or @Leonidas-from-XIV give this a review?

@ManasJayanth
Copy link
Contributor

Is Dune_pkg.Solver_env.popular_platform_envs configurable? I recently implemented this feature for esy at esy/esy#1640 - one difference being the list of environments is configurable so we dont spend time solving for platforms user doesn't want us to.

@gridbugs
Copy link
Collaborator Author

Is Dune_pkg.Solver_env.popular_platform_envs configurable?

Not yet, but that's something I'm thinking of adding in the future. The goal for this PR is to show a proof of concept for portable lockdirs but there's several extensions I want to make to improve usability of this feature once the current PR is merged.

@gridbugs
Copy link
Collaborator Author

gridbugs commented Apr 3, 2025

@rgrinberg just a reminder to take a look at this when you get a chance.

@gridbugs gridbugs force-pushed the portable-lockdir-feature branch 2 times, most recently from 2f8475d to ca960dd Compare April 22, 2025 01:51
@rgrinberg rgrinberg requested a review from Alizter April 29, 2025 18:27
@rgrinberg
Copy link
Member

I've had a cursory glance, and a few things come to mind:

  1. This format is rather verbose. Can we deduplicate the various fields so that we don't need to write them more than once if they are unchanged for a subset of the platforms?

  2. How can users customize the set of platforms to solve for all at once?

  3. How do we handle a package that is only selected for a particular platform?

@gridbugs gridbugs force-pushed the portable-lockdir-feature branch 2 times, most recently from 5d5e366 to 191c271 Compare April 30, 2025 07:04
@gridbugs
Copy link
Collaborator Author

Thanks for the review! My intention with this PR is to share a conservative version of the approach I'd like to follow for implementing portable lockdirs. Ideally we would merge this PR so it doesn't bitrot, and then when I get some more time extend it to have more concise lockdirs, configurable platforms to solve, and a few other usability improvements. Currently all behaviour changes are behind a feature flag.

I've had a cursory glance, and a few things come to mind:

1. This format is rather verbose. Can we deduplicate the various fields so that we don't need to write them more than once if they are unchanged for a subset of the platforms?

Yep that's the plan. I intended to get a conservative version of this feature merged or at least make sure people are on board with my general approach before investing too much time. But yep I'll definitely be deduping the lockdirs before considering this feature to be complete.

2. How can users customize the set of platforms to solve for all at once?

They can't yet but I'm going to add a stanza in either dune-project or dune-workspace for specifying which platforms dune should solve for.

3. How do we handle a package that is only selected for a particular platform?

Currently lockfiles include one list of dependencies for each platform, so if a package is only depended on on a single platform then it will only appear in the list of dependencies corresponding to that platform.

If a project itself is only intended to be build/solved for a single platform, currently there's no way of expressing this fact. Dune will still attempt to solve for all the platforms it would normally try, and if no solution can be found for a given platform then that platform is not included in the solution in the lockdir. Currently (in this PR) dune only prints a solve error if solving fails for all the platforms it tried, though I'm considering adding a warning that solving failed on certain platforms in a future change.

Copy link
Collaborator

@Leonidas-from-XIV Leonidas-from-XIV left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a lot of the complexity at least with regards to encoding and decoding is due to retaining compatibility with the old format, but if we agree that portable lock directories are the way to go, then maybe we should not make it opt in but rather change the format, update the tests and abandon the old format?

We don't promise compatibility between lockfile formats at the moment, so maybe to prevent legacy code we should just directly change to the new format?

I think the goal with portable lock files is good in general, however I remember people complaining that our lock files are already too verbose (but also, there's no sensible way to make them less verbose since the information stored in it needs to live somewhere, after all).

This change makes them significantly more verbose. I wonder if that might pose a hurdle to adoption. I personally don't mind that much, but some deduplication where feasible might still be nice.

~constraints
in
let portable_solver_env =
(* TODO: make sure nothing system-specific sneaks into the environment here *)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that TODO done? By the code it seems like.

match results with
| [] -> Error errors
| x :: xs ->
Ok (List.fold_left xs ~init:x ~f:Dune_pkg.Opam_solver.Solver_result.merge, errors)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From an API perspective I find it somewhat weird that I can have an Ok case with errors it there is at least one success and Error otherwise.

Maybe better something like `Ok | `Partial | `Error if we want to treat "all suceeded" and "at least one failure" differently.

| Ok ({ lock_dir; files; pinned_packages; num_expanded_packages }, _errors) ->
(* TODO: Users might want to know if no solution was found on certain
platforms. Give the option to print the solver errors, even if a
solution was found on some platforms. *)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think if we were to print at least a warning like No solution for <configuration> found that would be very nice instead of succeeding silently.

I don't have a good solution to how to deal with debugging them. On one hand outputting the solver error for every failed configuration is extremely verbose, on the other hand just displaying "couldn't solve" does not help people to find a solution. Maybe a flag to display the verbose lock errors?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should log it and tell the user to inspect the failure in _build/log.

Ok (Lock_dir.Write_disk.prepare ~lock_dir_path ~files lock_dir, summary_message)
Ok
( Lock_dir.Write_disk.prepare
~portable:portable_lock_dir
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe make the argument portable_lock_dir to be clear what portable:bool is and then invoking it is also shorter.

; ( "--portable-lock-dir"
, toggle "portable_lock_dir"
, "Generate portable lock dirs. If this feature is disabled then lock dirs will be \
specialized to the machine where they are generated." )
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to enable this for the developer preview by default? Probably yes?


$ [ $(cat _build/_private/default/.pkg/foo/target/share/kernel) = $(uname -s) ]

$ [ $(cat _build/_private/default/.pkg/foo/target/share/machine) = $(uname -m) ]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this pretty fragile depending on where the test suite runs? If it runs on a platform that's not in the popular list it will fail.

; depexts = []
{ Lock_dir.Pkg.build_command = Lock_dir.Conditional_choice.empty
; install_command = Lock_dir.Conditional_choice.empty
; depends = Lock_dir.Conditional_choice.singleton_all_platforms []
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit confusing why this is singleton_all_platforms [] and the depexts is empty. What is the semantic difference here?

; depends = []
{ build_command = []
; install_command = []
; depends = [ { condition = map {}; value = [] } ]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it confusing that there is a 1 dependency that has no conditions and no content. If feels like a complicated way to express [].

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The meaning of [ { condition = map {}; value = [] } ] isn't that there is 1 dependency but instead that there is one condition that matches all systems, and under that condition there are no dependencies. I agree it's confusing though. I'll come up with a way of representing this that's easier to understand.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the build_command above can also have conditions and is empty as there are no build commands. So I wonder why depends is different.

; install_command =
[ { condition = map {}; value = [ "system"; "echo 'world'" ] }
]
; depends = [ { condition = map {}; value = [] } ]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise here. Keeping it so complicated makes it much harder to determine whether there are dependencies or not, because it requires the condition to be evaluated, but the code that creates this value knows that there is no dependency.

Adds a feature flag for enabling portable lockdirs. This is a proof of
concept implementation of portable lockdirs where the entire solver runs
for each of a set of platforms (combinations of architecture, OS, and in
some cases the OS distribution) which most people are expected to use.
This can easily be extended in the future to add more platforms or to
allow projects to specify more platforms.

To make lockdirs portable, the build/install commands and dependencies
of each package are transformed into match statements, where the
appropriate value for each platform is enumerated. At solve-time, the
solver runs once for each platform, populating these fields. At
build-time, the command/dependencies appropriate for the current
platform are used.

When the feature flag is not enabled dune's behaviour is unchanged.

Signed-off-by: Stephen Sherratt <[email protected]>
@gridbugs gridbugs force-pushed the portable-lockdir-feature branch from 191c271 to fa1fb25 Compare May 1, 2025 01:47
@gridbugs
Copy link
Collaborator Author

gridbugs commented May 1, 2025

Thanks for the detailed feedback @Leonidas-from-XIV! I'll update this to have more concise lockfiles and allow the user to choose which platforms to solve for in dune-workspace.

I also realized an oversight in the current design: I don't think it's ever correct to run the solver with platform variables unset. The current design doesn't always set os-distribution and os-version because there are way too many different possible combinations of these variables to solve for all of them. This leads to problems with packages which define an available condition that mentions either variable, which is quite common in opam-repository. E.g. available: os-distribution != "opensuse-leap" | os-version >= "16" is an example of a common way to constrain the version of a particular OS distro. The problem with this is if os-distribution or os-version is not defined, the entire predicate evaluates to undefined due to how undefined values propagate through boolean operations. This leads to dune thinking the package is unavailable.

The best solution I can think of is to assign sentinel values to platform variables that would otherwise be undefined while solving. I might experiment with this idea.

@gridbugs
Copy link
Collaborator Author

gridbugs commented May 2, 2025

I've introduced a couple of mechanism for deduplicating lockfiles:

  • if there are multiple different platforms on which a value (build commands, list of deps, etc) are the same, conditions are combined and the values are only included a single time.
  • If there is a value that is the same on all platforms then the condition printed in the lockfile is simply "all_platforms" rather than the list of platforms.

I've also added a field for explicitly tracking the platforms on which a package should be included in the build, and the field is only present in lockfiles of packages that are not always present. This was previously done using the conditions in the depends field, but that was a hack and this way it's much clearer what's going on (plus now the depends field can be omitted if it's always empty which wasn't possible before).

I also started adding sentinel values of platform variables when doing the solve to avoid the issue I described in my previous comment.

I still need to allow users to choose which platforms dune will solve for.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants