Skip to content

Experimental new Interval API with continuous space for versions #99

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 23 commits into from

Conversation

mpizenberg
Copy link
Member

Tests are not passing yet. It is quite a big change so some bugs were inserted (not on purpose ^^)

@Eh2406
Copy link
Member

Eh2406 commented Jun 23, 2021

It is a big change, I could use some pointers on where to start looking this over. What are the core new ideas? What advantages do they bring?

@mpizenberg
Copy link
Member Author

mpizenberg commented Jun 23, 2021

The core idea of this API design attempt is to relax the Version trait enough to enable more features but still avoid making Range a user-implemented trait. Instead I went for a middle-ground of adding an Interval trait that has to be provided by the user. (PS for the time being I kept the Range name but would like to rename it to Ranges plural).

So the traits to implement by the user are the following.

pub trait Version: Clone + Ord + Debug + Display {
    /// Returns the minimum version.
    fn minimum() -> Bound<Self>;
    /// Returns the maximum version.
    fn maximum() -> Bound<Self>;
}

/// An interval is a bounded domain containing all values
/// between its starting and ending bounds.
pub trait Interval<V>: RangeBounds<V> + Debug + Clone + Eq + PartialEq {
    /// Create an interval from its starting and ending bounds.
    /// It's the caller responsability to order them correctly.
    fn new(start_bound: Bound<V>, end_bound: Bound<V>) -> Self;
}

The Interval type extends std::ops::RangeBounds so the user has to implement that trait as well.

pub trait RangeBounds<T> {
    pub fn start_bound(&self) -> Bound<&T>;
    pub fn end_bound(&self) -> Bound<&T>;

    pub fn contains<U>(&self, item: &U) -> bool { ... }

This means giving the start and the end of the interval as bounds (potentially infinite). And a contains method is provided with a default implementation (check if within bounds) but can be overridden. For example, for semantic versions, one could check if a bound contains a pre-release marker to alter the contains implementation.

But the cool change is that the Range type is still handled by ourselves. It is defined as follows.

pub struct Range<I, V> {
    segments: SmallVec<I>,
    phantom: PhantomData<V>,
}

So basically the same as before with the exception that it now has two type variables, one for Interval, one for Version. I could not figure out if I could just have the I and not the V because it was giving me compiler issues in the impl block. (ooohh I now realize that V could be an associate type of Interval instead of a type variable ... I changed all types everywhere ...)

impl<I: Interval<V>, V: Version> Range<I, V> {
   ...
}

Since Version is much less strict than before, it enables things like continuous types since there is no more bump (which pre-releases kind-of implies) and infinite double bounds, not only on the right side. And in our Range implementation, the contains method is implemented as before, except that instead of returning true when a version is inside one of the intervals, it delegates a call to Interval::contains and the user might decide that it does not contains that version (e.g. pre-releases if bounds do not contains pre-releases).

The downside is that the user cannot override the behavior for versions outside of a given interval, but I don't see that being a desirable property. The user also do not see the full Range when answering for contains, only one of its Interval so it would not be possible to introduce faithfully things like "no pre-release except if that's the only version satisfying a given constraint" (in PEP 440), because it would not know if potentially another segment in the range might contain a non-pre-release version. I think that would be an extremely rare case anyway.

But the huge advantage is that we keep the control of Range and its quite hard to implement functions like intersection and complement. Proof is I introduced a bug when rewriting it XD and now need to chase it ^^.

@mpizenberg
Copy link
Member Author

To have a look at what's new, you can just look at the range_trait (misnamed) and version_trait modules. The rest of the changes are just type variables manipulations.

@Eh2406
Copy link
Member

Eh2406 commented Jun 23, 2021

I think this will be cleaner with a associated type for V. I will do a closer review when tests are passing. Then try to see how to support the Semver Crate with this approach.

@Eh2406
Copy link
Member

Eh2406 commented Jun 24, 2021

Thanks for the explanation! I had a few minutes to look this over. It is definitely an intriguing approach!

I am not seeing how to construct a Range that represents something like Semvers >=2.0.0-alfa, witch is "(non-pre versions >= 2.0.0) union (pre versions >= 2.0.0-alfa and < 2.0.1)". I can see how to make an Interval that stores whether its contains matches a pre, but Range only constructs I using I::new so I don't see how to get a Range that has some Intervals that do and some that don't.

@mpizenberg
Copy link
Member Author

mpizenberg commented Jun 24, 2021

I am not seeing how to construct a Range that represents something like Semvers >=2.0.0-alfa

Well if your semver type is something like

impl SemVer {
    fn new(major: u32, minor: u32, patch: u32, pre: Option<String>) -> Self { ... }
}

Then you'd just call

SemInterval::new(Bound::Included(SemVer::new(2, 0, 0, Some("alpha"))), Bound::Unbounded)

And within the SemInterval implementation, you can override the contains method to check for a pre-release in bound. Something like

impl RangeBounds<SemVer> for SemInterval {
    fn start_bound(&self) -> Bound<&SemVer> { ... }
    fn end_bound(&self) -> Bound<&SemVer> { ... }
    // this one we override the default implementation provided by `RangeBounds`
    fn contains(&self, item: &SemVer) -> bool {
        // 2.0.0-beta
        if item.is_prerelease() {
            // 2.0.0-alpha                                    2.0.0-alpha, 2.0.0
            self.start.is_prerelease() && item.within_bounds(self.start, self.start.without_prerelease())
                           // 4.5.3-beta,      4.5.3-alpha,  4.5.3-beta,         4.5.3           4.5.3-beta
                || self.end.is_prerelease() && item.lower_than(self.end) && item.without_prerelease().higher_than(self.end)
        } else {
            item.within_bounds(self.start, self.end)
        }
    }

}

It's more or less pseudo code so take that with a grain of salt as I've not thought through it all, just writing stuff as it comes to mind right now.

@mpizenberg
Copy link
Member Author

PS, I've edited the above example code for contains() at least a couple times so don't trust what the email you received says XD

@mpizenberg
Copy link
Member Author

The above code example might be horribly wrong, the important notion is that "if we can explain in english with simple rules if a version (pre-release, or not) is included in an interval by just looking at the interval bounds, then we can do it in code too"

@mpizenberg
Copy link
Member Author

mpizenberg commented Jun 24, 2021

but Range only constructs I using I::new so I don't see how to get a Range that has some Intervals that do and some that don't.

Oh maybe I misinterpret the question. So the thing is that Range knows each interval bound. So the Range can check bounds of its stored intervals (which are non-intersecting) and once it reaches one that should contain the given version, instead of returning true it returns that_interval.contains(that_version).

See for example here:

Ordering::Less => return seg.contains(version),

I should definitely add some ascii segments drawing for each branch of that Range.contains() method.

@Eh2406
Copy link
Member

Eh2406 commented Jun 24, 2021

That all makes sense except for one small part. When I call SemInterval::new(Bound::Included(SemVer::new(2, 0, 0, Some("alpha"))), Bound::Unbounded) I get a SemInterval but I need a Range to put in my DependencyConstraints.

@mpizenberg
Copy link
Member Author

mpizenberg commented Jun 24, 2021

That all makes sense except for one small part. When I call SemInterval::new(Bound::Included(SemVer::new(2, 0, 0, Some("alpha"))), Bound::Unbounded) I get a SemInterval but I need a Range to put in my DependencyConstraints.

Well we expose the same API as before for Range which will create those intervals internally. So for example

    /// Set of all versions comprised between two given versions.
    /// The lower bound is included and the higher bound excluded.
    /// `v1 <= v < v2`.
    pub fn between(v1: impl Into<V>, v2: impl Into<V>) -> Self {
        let start_bound = Bound::Included(v1.into());
        let end_bound = Bound::Excluded(v2.into());
        Self {
            segments: SmallVec::one(I::new(start_bound, end_bound)),
            phantom: PhantomData,
        }
    }

That's why I needed to add the new() function the the Interval trait. I didn't yet found an alternative but maybe there is a smarter way to do that.

That's what you'll find here:

pub fn empty() -> Self {

@Eh2406
Copy link
Member

Eh2406 commented Jun 24, 2021

The above code example might be horribly wrong, the important notion is that "if we can explain in english with simple rules if a version (pre-release, or not) is included in an interval by just looking at the interval bounds, then we can do it in code too"

It took a few readings before this sank in. But now that I get it, it sounds worth a try!

@mpizenberg
Copy link
Member Author

mpizenberg commented Jun 24, 2021

It took a few readings before this sank in. But now that I get it, it sounds worth a try!

If the VersionReq type in the semver crate is what I believe it is, contains (Interval::contains not Range::contains) could literally be implemented by re-using their matches method. So something in the like of

self.into_version_req().matches(version)

Of course we could just rewrite the content of that function, that would be more efficient and simpler than doing the into_version_req conversion.

@Eh2406
Copy link
Member

Eh2406 commented Jun 25, 2021

So one thing we are going to need to be careful of is that range.contains(v) == false does not mean that range.complement().contains(v) == true. I am not sure all the places this may bight us. One that I have thought of is:

  1. We have a dependency foo="<5.0.0"
  2. We discover that there are no matching versions; unbenounced to us there is a 2.0.0-alfa but it is not compatible with this Req
  3. We backtrack
  4. We find a new dependency foo="<2.0.0-beta". If we are not careful we will decided that because 2.0.0-beta < 5.0.0 and there are no versions in "<5.0.0" then there are no version in "<2.0.0-beta"

@mpizenberg
Copy link
Member Author

Hum, very true ... Breaking the strict meaning of contains in an asymmetric manner might cause issues with backtracking and saved NoVersion incompatibilities.

@mpizenberg
Copy link
Member Author

So one thing we are going to need to be careful of is that range.contains(v) == false does not mean that range.complement().contains(v) == true. I am not sure all the places this may bight us. One that I have thought of is:

1. We have a dependency `foo="<5.0.0"`

2. We discover that there are no matching versions; unbenounced to us there is a `2.0.0-alfa` but it is not compatible with this Req

3. We backtrack

4. We find a new dependency `foo="<2.0.0-beta"`. If we are not careful we will decided that because `2.0.0-beta` < `5.0.0` and there are no versions in `"<5.0.0"` then there are no version in `"<2.0.0-beta"`

Do we know how dart pub handles such a situation?

@mpizenberg
Copy link
Member Author

Do we know how dart pub handles such a situation?

So I cloned pub and added the following test:

  test('backtracking to pre-release after NoVersions', () async {
    await servePackages((builder) {
      builder.serve('a', '1.1.0', deps: {'b': '^1.0.0'});
      builder.serve('b', '1.1.0-alpha');
      builder.serve('a', '1.0.0', deps: {'b': '^1.1.0-alpha'});
    });

    await d.appDir({
      'a': '^1.0.0',
    }).create();
    await expectResolves(result: {
      'a': '1.1.0',
      'b': '1.1.0-alpha',
    });
  });

And that test is passing. Meaning they accept b 1.1.0-alpha even though a 1.1.0 asked for b ^1.0.0 so without pre-release. I suppose they do that to never generate a NoVersion incompat if only viable versions are pre-releases? Or maybe it's for another reason. Might be worth asking them?

@mpizenberg
Copy link
Member Author

Hum, however, if I go one dependency deeper to force a backtracking to happen, it seems to make pub solver fail!

  test('backtracking to pre-release after NoVersions', () async {
    await servePackages((builder) {
      builder.serve('a', '1.1.0', deps: {'b': '^1.0.0'});
      builder.serve('b', '1.0.0', deps: {'c': '^1.0.0'});
      builder.serve('c', '0.9.0');
      builder.serve('b', '1.1.0-alpha');
      builder.serve('a', '1.0.0', deps: {'b': '^1.1.0-alpha'});
    });

    await d.appDir({
      'a': '^1.0.0',
    }).create();
    await expectResolves();
    // await expectResolves(result: {
    //   'a': '1.1.0',
    //   'b': '1.1.0-alpha',
    // });
  });

This test fails even though I'm just asking it expectResolves() which should pass if there is any solution. Which we know there are. Because a 1.0.0, b 1.1.0-alpha is a valid solution, or even a 1.1.0, b 1.1.0-alpha if it behaved like the other example above with one less dependency deep.

@mpizenberg
Copy link
Member Author

I've created a PR on pub repo with the failing test asking for their thoughts: dart-lang/pub#3038

@Eh2406
Copy link
Member

Eh2406 commented Jun 27, 2021

Do we know how dart pub handles such a situation?

I was going to do more research but I thought the answer was "by having more Range like rules around when pre releases match", looks like you did the research and found the real answer is "with bugs".

@mpizenberg mpizenberg changed the title Very experimental new range and version API Experimental new Interval API with continuous space for versions Jul 21, 2021
@mpizenberg
Copy link
Member Author

mpizenberg commented Aug 6, 2021

Don't bother looking at those new commits. I'm trying stuff to figure out what is wrong with the intervals. For now, I've added logging, which lead me to discover that one assignment intersection was wrong at some point. Leading me to the fact that there is a problem with intersection since if I set

  • r1 to < 1 or > 1 (basically the complement of the singleton with just 1)
  • r2 to >= 1, < 2 (the range 1..2)

Then we have r1.intersection(r2) that is correct, returning > 1, < 2 but r2.intersection(r1) that is incorrect, returning >= 1, < 2.

Typically, this should have been spotted by the intersection_is_symmetric property test, but it's not, even if I increase very much the number of proptest cases. So I've been trying to improve the strategy for generation of ranges with bounds that may take all possible shapes, but without success for now.

@mpizenberg
Copy link
Member Author

I've temporarily deactivated the serde feature and all tests (except with serde) are passing :)

I didn't found a way to trigger the symmetric intersection tests though, so I just fixed the bug in intersection :(. I didn't know it could be possible to be unhappy about fixing a long standing bug but that's what I feel after failing to trigger it with property tests ahah.

Before evaluating how much of a performance drain is the usage of bounds, I thus need to fix all the code related to the serde feature, to be able to run the benchmarks. I'll probably also have to re-encode the ron files. I'm not sure how I'll do that but will see.

@mpizenberg
Copy link
Member Author

The code in this branch will also need a decent amount of cleaning. And while evaluating perf, verifying that the log:: usage does not impact performances.

@Eh2406
Copy link
Member

Eh2406 commented Aug 8, 2021

If there is a bug in intersection, let's make the fix a separate PR so we can get it out there ASAP.

@mpizenberg
Copy link
Member Author

mpizenberg commented Aug 8, 2021 via email

@mpizenberg
Copy link
Member Author

This has been superseded by #112

@mpizenberg mpizenberg closed this Jun 26, 2022
@Eh2406 Eh2406 deleted the ranges branch November 1, 2023 20:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants