feat: add inclusive and exclusive bounds to Range #111

baszalmstra · 2022-04-22T19:18:45Z

This PR adds BoundedRange<T> (which is a terrible name). It is similar to Range<T> but allows inclusive or exclusive bounds. This enables T to be any type that implements Ord + Clone, without requiring the Version trait.

This PR is to get a discussion going. I don't expect it to be merged in its current state. The type passes the same property tests as applied to Range. Personally, I would replace Range with this new type because it's much more flexible.

I'm curious about your thoughts!

Update 25/04/2022
I replaced the Range struct with a new implementation that provides inclusive or exclusive bounds. This enables expressing more complex ranges than previously possible.
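The core idea can be sketched as a range stored as sorted, disjoint segments whose two ends are standard-library `Bound` values. This is a minimal illustration, not the PR's actual code: the struct name `Range`, the `segments` field and the `contains` method are assumptions. Only `std::ops::Bound` comes from the standard library.

```rust
use std::ops::Bound::{self, Excluded, Included, Unbounded};

/// Hypothetical, simplified range: sorted, disjoint segments, each end being
/// inclusive, exclusive or unbounded.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Range<V> {
    segments: Vec<(Bound<V>, Bound<V>)>,
}

impl<V: Ord> Range<V> {
    /// Returns true if `v` lies inside any segment.
    fn contains(&self, v: &V) -> bool {
        self.segments.iter().any(|(lo, hi)| {
            let above_lo = match lo {
                Included(l) => l <= v,
                Excluded(l) => l < v,
                Unbounded => true,
            };
            let below_hi = match hi {
                Included(h) => v <= h,
                Excluded(h) => v < h,
                Unbounded => true,
            };
            above_lo && below_hi
        })
    }
}
```

For example, the range (1, 3] ∪ [5, ∞) contains 3 and 100 but not 1 or 4. Only `Ord` is needed for membership in this sketch; `Clone` matters once set operations start copying bounds.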

mpizenberg · 2022-04-22T20:12:58Z

Hi @baszalmstra at a glance, this looks similar to another attempt we discussed last year #99 (comment)
I haven't read your PR in detail yet, but let me know if you find similarities with the one I linked.

PS: I've been away for a long time because of many things, but I'm happy that most of what prevented me from getting time back for pubgrub is done :) I'm finalizing my website because I'm looking for a job now (I'll apply to a few things next week), but meanwhile I'll be able to do some pubgrub code and docs!

baszalmstra · 2022-04-23T07:32:24Z

It is similar, but this work builds upon the version-set branch instead of dev. I think my PR is simpler because users don't need to implement anything extra to be able to use their type in a Range.

Good to have you back! I've been trying to use pubgrub with Conda packages (more info in the version-set pull request). Would love to chat with you about it!

Eh2406 · 2022-04-23T23:42:34Z

If we are imagining this being released as part of version-set, then the more the merrier. Any reusable ranges we want to implement should be made available, either as part of this crate or even as third-party crates. We were able to determine in #99 that the straightforward extensions to this do not cover the needs of pre-release versions (for Cargo), so even if this is an implementation offered by this crate, we will still need a higher-level extension point for more complicated cases.

baszalmstra · 2022-04-24T10:45:09Z

@Eh2406 but do you think this could replace the existing Range<T> struct as well? It covers the same use cases and more. I'm not sure about the performance, but I think that could easily be fixed.

Eh2406 · 2022-04-25T00:31:50Z

Yes. If a power user can pick the exact implementation they care about, then the default should probably be the more flexible option. Keeping both probably doesn't pull its weight.

baszalmstra · 2022-04-25T07:05:20Z

> Yes. If a power user can pick the exact implementation they care about, then the default should probably be the more flexible option. Keeping both probably doesn't pull its weight.

Cool, I'll update the PR to do so.

baszalmstra · 2022-04-25T19:31:43Z

I updated the code to replace the Range struct. All tests pass without changes to the rest of the code. The diff looks scary, but it basically replaces the entire file with new contents (except for the tests).

baszalmstra · 2022-05-09T14:08:42Z

@Eh2406 @mpizenberg do you guys think you have time to take a look at this? :)

mpizenberg · 2022-05-09T14:12:35Z

@baszalmstra yes, definitely. Searching for a job has just been more time-consuming than I hoped ^^. I don't like always saying "soon"; it makes me uncomfortable, especially since I've been saying it to Jacob for weeks, but I can't say anything else :(

Eh2406 · 2022-05-18T15:31:34Z

I've only had time for one small thought: I think this extension will work very nicely with the added API surface from #105, if that gets merged.

baszalmstra · 2022-05-18T16:25:04Z

Indeed!

On a side note: could you enable the CI for this PR?

baszalmstra · 2022-05-21T08:18:07Z

I brought this branch up to date with the latest changes from version-set (which should include #105).

One test, large_case, is still failing. This is because the serde format of Range changed (it now includes Inclusive or Exclusive). Does anyone still remember how the test set (large_case_u16_NumberVersion.ron) was generated? Can I regenerate it somehow?

mpizenberg · 2022-05-21T10:51:44Z

@baszalmstra I'm working on an update with a summary of discussions about what's coming in the version-set branch. The gist is that it's good, but I have a lot to write for the summary, so I'll do it after lunch. If you can bear with me a little longer, I'll be able to discuss your case (tags + bounds) in more detail.

Regarding the large case, this is how I'd go about it. Write a small program that has both the old and the new version of the code under different names (this can be done in Cargo.toml). Then write a small function that converts the range types from old to new, and re-serialize the new data structure with serde.
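The conversion step could be sketched as follows, assuming (as in the deserializer later in this thread) that the old discrete format stores each segment as `(V, Option<V>)`, meaning `[start, end)` with `None` for "no upper bound". The type aliases and function names are hypothetical.

```rust
use std::ops::Bound::{self, Excluded, Included, Unbounded};

/// Old discrete segment: `[start, end)`, where `None` means no upper bound.
type OldSegment<V> = (V, Option<V>);
/// New segment: an explicit bound on each end.
type NewSegment<V> = (Bound<V>, Bound<V>);

/// Maps `[start, end)` to `(Included(start), Excluded(end))`,
/// and a missing end to `Unbounded`.
fn convert_segment<V>((start, end): OldSegment<V>) -> NewSegment<V> {
    let upper = match end {
        Some(e) => Excluded(e),
        None => Unbounded,
    };
    (Included(start), upper)
}

/// Converts a whole old-style range. Order and disjointness are preserved
/// because each segment is mapped one-to-one.
fn convert_segments<V>(old: Vec<OldSegment<V>>) -> Vec<NewSegment<V>> {
    old.into_iter().map(convert_segment).collect()
}
```

Note that this only ever produces `Included` lower bounds and `Excluded` upper bounds, so converted data never exercises the other bound combinations.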

baszalmstra · 2022-05-21T11:12:17Z

Yeah of course, no worries at all.

The thing is that the new Range implementation supports more variants than the original Range type. I can simply port the data from one type to the other, but then you would miss the new possibilities. Therefore I was wondering how you generated them so I can do the same but with all cases in there.

But take your time! There is no rush on my end. 😸

mpizenberg · 2022-05-21T11:33:02Z

It was @Eh2406 who found and generated those cases (there are a couple more that are not in the repo). They were basically generated randomly and kept for performance analysis, since they took more time than most others. The RON files are just the serialization (--features serde) of an OfflineDependencyProvider containing all the package versions for that particular resolution. Since they were randomly generated, I'd go the route of using the old and new code side by side, converting the old offline dependency provider to the new one, and re-serializing.

> The thing is that the new Range implementation supports more variants than the original Range type

Agreed! There is a performance tradeoff though, so in my opinion, the way forward will probably be to provide both a "discrete" helper impl (the old Range) and a new "bounds + continuous" helper impl.

Eh2406 · 2022-05-21T19:48:34Z

We've had concerns in the past about having large benchmarking examples in the git history. What would be very nice is if the new code knew how to deserialize from both formats. That would make it very easy to do comparative benchmarking and see what the performance cost actually is, while also not adding another representation of the file in the history.

I'm not particularly worried about adding test files that use the new behavior. These test files are generated from our property based tests, which are still running to generate tests for the new behavior.

baszalmstra · 2022-05-21T20:30:17Z

Mm, it isn't completely clear to me from the discussion in #108 and your last comment what should happen with this Range implementation. Can you clarify this @mpizenberg? Do you want to have both the "old" and the new ranges? Or should this implementation replace the previous one (that's what the PR currently does)?

I feel like since the new implementation can do everything the old one can and more with fewer requirements imposed on the type, replacing the old implementation makes sense to me. The code for both implementations is fairly similar. The performance overhead should be fairly small when optimized. There are some fast code paths in the old implementation that I didn't copy.

Let me know what you think!

Eh2406 · 2022-05-21T20:36:45Z

#108 (comment)

> My feeling is that it will have a non-negligible performance cost, so this needs to be evaluated first.

> The performance overhead should be fairly small when optimized.

Instead of arguing about what we think the performance will be, let's use the benchmarks and collect real data!

baszalmstra · 2022-05-21T21:09:35Z

I modified the code to have both versions:

- DiscreteRange: the original Range
- BoundedRange: the new implementation

I also created a copy of the large_case file and manually edited it to work with BoundedRange. Both are run as part of the large_case test and benchmark. The proptest test file contains aliases for SemVS and NumVS that make it easy to toggle between the two versions.

Here is the result from the benchmark:

```text
Benchmarking large_cases/large_case_u16_bounded_NumberVersion.ron: Collecting 100 samples in estimated 20.467 s (1700 iterati
large_cases/large_case_u16_bounded_NumberVersion.ron
                        time:   [11.789 ms 11.992 ms 12.269 ms]
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe
Benchmarking large_cases/large_case_u16_discrete_NumberVersion.ron: Collecting 100 samples in estimated 20.919 s (2100 iterations)
large_cases/large_case_u16_discrete_NumberVersion.ron
                        time:   [9.9442 ms 9.9811 ms 10.020 ms]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
```

The BoundedRange (new implementation) seems to be about 20% slower than the original Range (11.99 ms vs 9.98 ms).

Eh2406 · 2022-05-24T03:40:59Z

It's a little bit of a mess but I created this deserializing code that can read either one:

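```rust
use std::ops::Bound::{self, Excluded, Included, Unbounded};
// Assumed path of pubgrub's internal small-vector type on this branch.
use crate::internal::small_vec::SmallVec;

// Accepts both the old discrete serde format and the new bounded one.
#[cfg(feature = "serde")]
impl<'de, V: serde::Deserialize<'de>> serde::Deserialize<'de> for BoundedRange<V> {
    fn deserialize<D: serde::Deserializer<'de>>(deserializer: D) -> Result<Self, D::Error> {
        // With `untagged`, serde tries each variant in order and keeps the
        // first one that parses.
        #[derive(serde::Deserialize)]
        #[serde(untagged)]
        enum EitherInterval<V> {
            // New format: an explicit (lower, upper) pair of bounds.
            B(Bound<V>, Bound<V>),
            // Old format: `[start, end)`, with `None` meaning no upper bound.
            D(V, Option<V>),
        }
        // The old format serialized as SmallVec<(V, Option<V>)>.
        let intervals: Vec<EitherInterval<V>> = serde::Deserialize::deserialize(deserializer)?;

        let mut segments = SmallVec::Empty;

        for interval in intervals {
            match interval {
                EitherInterval::B(l, r) => segments.push((l, r)),
                EitherInterval::D(l, Some(r)) => segments.push((Included(l), Excluded(r))),
                EitherInterval::D(l, None) => segments.push((Included(l), Unbounded)),
            }
        }

        Ok(BoundedRange { segments })
    }
}
```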

Then I used it in large_cases:

```rust
use std::time::Duration;

use criterion::{criterion_group, criterion_main, Bencher, Criterion};
use serde::Deserialize;
// Module paths below assume the pubgrub crate layout of the version-set branch;
// DiscreteRange and BoundedRange must also be imported from this PR's branch.
use pubgrub::package::Package;
use pubgrub::solver::{resolve, OfflineDependencyProvider};
use pubgrub::version::{NumberVersion, SemanticVersion};
use pubgrub::version_set::VersionSet;

fn bench<'a, P: Package + Deserialize<'a>, VS: VersionSet + Deserialize<'a>>(
    b: &mut Bencher,
    case: &'a str,
) where
    VS::V: Deserialize<'a>,
{
    let dependency_provider: OfflineDependencyProvider<P, VS> = ron::de::from_str(case).unwrap();

    // Each iteration resolves every (package, version) pair in the file.
    b.iter(|| {
        for p in dependency_provider.packages() {
            for n in dependency_provider.versions(p).unwrap() {
                let _ = resolve(&dependency_provider, p.clone(), n.clone());
            }
        }
    });
}

fn bench_nested(c: &mut Criterion) {
    let mut group = c.benchmark_group("large_cases");
    group.measurement_time(Duration::from_secs(20));

    // Benchmark every example file with both range implementations.
    for case in std::fs::read_dir("test-examples").unwrap() {
        let case = case.unwrap().path();
        let name = case.file_name().unwrap().to_string_lossy();
        let data = std::fs::read_to_string(&case).unwrap();
        if name.ends_with("u16_NumberVersion.ron") {
            group.bench_function("Discrete_".to_string() + &name, |b| {
                bench::<u16, DiscreteRange<NumberVersion>>(b, &data);
            });
            group.bench_function("Bounded_".to_string() + &name, |b| {
                bench::<u16, BoundedRange<NumberVersion>>(b, &data);
            });
        } else if name.ends_with("str_SemanticVersion.ron") {
            group.bench_function("Discrete_".to_string() + &name, |b| {
                bench::<&str, DiscreteRange<SemanticVersion>>(b, &data);
            });
            group.bench_function("Bounded_".to_string() + &name, |b| {
                bench::<&str, BoundedRange<SemanticVersion>>(b, &data);
            });
        }
    }

    group.finish();
}

criterion_group!(benches, bench_nested);
criterion_main!(benches);
```

and ran it on the zuse and elm files (output cleaned and sorted):

```text
large_cases/Discrete_elm_str_SemanticVersion.ron
                        time:   [199.71 ms 199.81 ms 199.90 ms]
large_cases/Bounded_elm_str_SemanticVersion.ron
                        time:   [203.44 ms 203.77 ms 204.16 ms]
large_cases/Discrete_large_case_u16_NumberVersion.ron
                        time:   [7.7832 ms 7.7863 ms 7.7892 ms]
large_cases/Bounded_large_case_u16_NumberVersion.ron
                        time:   [9.1590 ms 9.1654 ms 9.1726 ms]
large_cases/Discrete_zuse_str_SemanticVersion.ron
                        time:   [274.55 ms 274.70 ms 274.85 ms]
large_cases/Bounded_zuse_str_SemanticVersion.ron
                        time:   [275.56 ms 275.69 ms 275.85 ms]
```
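As a quick check on these numbers, the sketch below computes the relative overhead of the bounded implementation from the middle (point) estimates above. The helper names are hypothetical. Note that the `Bounded_`/`Discrete_` SemanticVersion pairs only compare the two implementations if the `Bounded_` branch actually instantiates `BoundedRange<SemanticVersion>`, so those rows are worth double-checking.

```rust
/// Relative overhead of the bounded implementation, in percent.
fn overhead_percent(discrete_ms: f64, bounded_ms: f64) -> f64 {
    (bounded_ms / discrete_ms - 1.0) * 100.0
}

/// (case, discrete estimate in ms, bounded estimate in ms), copied from the output above.
const ESTIMATES: [(&str, f64, f64); 3] = [
    ("elm_str_SemanticVersion", 199.81, 203.77),
    ("large_case_u16_NumberVersion", 7.7863, 9.1654),
    ("zuse_str_SemanticVersion", 274.70, 275.69),
];

/// Overhead for each case, in the same order as `ESTIMATES`.
fn report() -> Vec<(&'static str, f64)> {
    ESTIMATES
        .iter()
        .map(|&(name, discrete, bounded)| (name, overhead_percent(discrete, bounded)))
        .collect()
}
```

This gives roughly 2% for elm, 18% for large_case and 0.4% for zuse.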

mpizenberg · 2022-05-24T09:18:25Z

Awesome, thanks @Eh2406 for always getting valuable data on benchmarks! My reading of these results is that (at least on these benchmarks) when the version type gets more complex (like semantic versions), the difference between the two set representations becomes negligible.

So performance should not be the deciding factor. The remaining question is whether we want to introduce this change, which would mean bigger breaking changes for people upgrading.

Advantages:

- A single, more generic default implementation
- A more convenient API using std range bounds becomes possible
- Continuous space support in the default impl makes it much easier for people to implement correct pre-release handling (otherwise getting bump() right is a bit annoying) and makes the guide section on pre-releases simpler and more focused.

Inconveniences:

- A more complex upgrade for users of v0.2. This can be mitigated by a good upgrade section in the guide.
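To illustrate the bump() remark in the advantages above: with the discrete `[start, end)` representation, an inclusive upper bound such as "v <= 3" must be rewritten using the version's successor, whereas a bounded representation stores it as-is. This is a minimal sketch; the `Bump` trait and both helper functions are hypothetical, and `u32` stands in for a version type.

```rust
use std::ops::Bound::{self, Excluded, Included};

/// Hypothetical successor operation that a discrete version type needs.
trait Bump {
    fn bump(&self) -> Self;
}

impl Bump for u32 {
    // Illustration only: overflows (panics in debug builds) at u32::MAX.
    fn bump(&self) -> Self {
        self + 1
    }
}

/// Discrete style: segments are always [start, end), so "v <= x" has to be
/// stored as the exclusive upper bound `x.bump()`.
fn discrete_at_most<V: Bump>(x: V) -> Bound<V> {
    Excluded(x.bump())
}

/// Bounded style: the inclusive upper bound is stored directly,
/// so no successor function is needed.
fn bounded_at_most<V>(x: V) -> Bound<V> {
    Included(x)
}
```

For integers the successor is obvious. For semantic versions with pre-release tags, picking the right "next" version is exactly the fiddly part that inclusive bounds sidestep.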

mpizenberg · 2022-05-24T10:17:16Z

Some more thoughts I had while browsing through the PR:

- Renaming of functions to match the VersionSet trait names? like exact -> singleton?
- Missing fast path for contains to return false if we have exceeded the value already
- Is it possible to implement From<RB> with RB: RangeBounds? I seem to recall it was not possible last time I tried but maybe some new Rust features makes it possible?
- bounds_as_ref -> bound_as_ref (no extra s)
- Why add an impl of union for bounded range that is the same as the one automatically derived?
- I haven't checked the intersection function, which is the most complex to implement right and efficiently. It should be correct if tests are passing. But just as a starting info to review it later, did you start from scratch, or does it follow the same structure as the previous one? How do you suggest I read it to check it?
- Any other info regarding tests? Did you change or add anything?
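The missing fast path for contains could look like the sketch below, assuming segments are kept sorted and pairwise disjoint. It uses hypothetical free functions rather than the PR's method. Because segments are ordered, once `v` falls below a segment's lower bound, no later segment can contain it. For long segment lists, a binary search would be another option.

```rust
use std::ops::Bound::{self, Excluded, Included, Unbounded};

/// True if `v` satisfies the lower bound `lo`.
fn above_lower<V: Ord>(v: &V, lo: &Bound<V>) -> bool {
    match lo {
        Included(l) => l <= v,
        Excluded(l) => l < v,
        Unbounded => true,
    }
}

/// True if `v` satisfies the upper bound `hi`.
fn below_upper<V: Ord>(v: &V, hi: &Bound<V>) -> bool {
    match hi {
        Included(h) => v <= h,
        Excluded(h) => v < h,
        Unbounded => true,
    }
}

/// Membership test over segments that are sorted and pairwise disjoint.
fn contains<V: Ord>(segments: &[(Bound<V>, Bound<V>)], v: &V) -> bool {
    for (lo, hi) in segments {
        if !above_lower(v, lo) {
            // Fast path: `v` lies before this segment, hence before every later one.
            return false;
        }
        if below_upper(v, hi) {
            return true;
        }
    }
    false
}
```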

mpizenberg · 2022-05-24T15:22:07Z

Hmm, sorry, it seems this was closed because it targeted the version-set branch, which was deleted after being merged. I can't change the target of this PR to the dev branch. Is this something you can do, @baszalmstra?

Eh2406 · 2022-05-24T15:29:04Z

We may need to open a new PR to straighten out the branch issue.
It seems to me that it is worth us maintaining only one implementation, given that the performance overhead seems small for most use cases. The upgrading guide is already going to be complicated, we have a major new trait already and will probably have other significant changes. That guide should definitely mention this change in implementation, and it might make a good example for the value of the new trait that you can implement your own Set which gets the performance back if it matters for your case.

baszalmstra · 2022-05-24T19:18:43Z

I will open a new PR; I'll clean up the branch as well.

I will replace the old Range implementation there, add updated large_case regression files, and fix/implement all the issues @mpizenberg mentioned! Anything I missed?

I'll make sure it's ready for a proper review.

mpizenberg · 2022-05-25T07:09:40Z

Awesome thanks @baszalmstra !

mpizenberg · 2022-05-25T14:21:40Z

PS: no need to change the RON file, since we can use Jacob's decoding code to decode either format.

mpizenberg · 2022-05-25T14:24:52Z

Hum, or should we ... I don't know, what do you guys think?

baszalmstra · 2022-05-25T14:41:11Z

> Hum, or should we ... I don't know, what do you guys think?

I'd be inclined to use the code from @Eh2406; that's one less hurdle when it comes to migrating.

Eh2406 · 2022-05-25T15:01:32Z

Yes, I think it would be convenient for upgrading and for our testing purposes to be able to read both.

baszalmstra · 2022-05-25T15:17:54Z

> Renaming of functions to match the VersionSet trait names? like exact -> singleton?

I left it the same as the original Range. Otherwise, this will break the old API. WDYT?

> Is it possible to implement From<RB> with RB: RangeBounds? I seem to recall it was not possible last time I tried but maybe some new Rust features makes it possible?

Taking the same signature from the from_range_bounds method and turning that into a From impl results in:

```rust
impl<V, IV: Clone + Into<V>, R: RangeBounds<IV>> From<R> for Range<V> { .. }
```

However, that won't compile because the IV type is unconstrained:

```text
impl<V, IV: Clone + Into<V>, R: RangeBounds<IV>> From<R> for Range<V> {
        ^^ unconstrained type parameter
```

I could implement

```rust
impl<V: Clone, R: RangeBounds<V>> From<R> for Range<V>
```

but that also results in an error (it could conflict with the standard library's blanket `impl<T> From<T> for T`), because downstream crates may implement trait 'std::ops::RangeBounds<_>' for type 'range::Range<_>'.
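For contrast, a constructor function avoids both errors. Below is a minimal sketch of a `from_range_bounds`-style constructor using the `R: RangeBounds<IV>, IV: Clone + Into<V>` bounds quoted in this thread. The simplified `Range` struct and the `convert_bound` helper are hypothetical, not the PR's code.

```rust
use std::ops::{Bound, RangeBounds};

/// Hypothetical simplified range type used only for this illustration.
#[derive(Debug, PartialEq)]
struct Range<V> {
    segments: Vec<(Bound<V>, Bound<V>)>,
}

/// Clones a borrowed bound and converts its value into `V`.
fn convert_bound<IV: Clone + Into<V>, V>(bound: Bound<&IV>) -> Bound<V> {
    match bound {
        Bound::Included(v) => Bound::Included(v.clone().into()),
        Bound::Excluded(v) => Bound::Excluded(v.clone().into()),
        Bound::Unbounded => Bound::Unbounded,
    }
}

impl<V> Range<V> {
    /// An inherent method may introduce `IV` freely: it is inferred from the
    /// argument at each call site, so there is no "unconstrained type parameter"
    /// error and no overlap with `impl<T> From<T> for T`.
    fn from_range_bounds<R, IV>(bounds: R) -> Self
    where
        R: RangeBounds<IV>,
        IV: Clone + Into<V>,
    {
        // A complete implementation would also turn empty inputs such as `5..3`
        // into an empty range; this sketch keeps the single segment as given.
        let lower = convert_bound(bounds.start_bound());
        let upper = convert_bound(bounds.end_bound());
        Range { segments: vec![(lower, upper)] }
    }
}
```

For example, `Range::<u64>::from_range_bounds(1u32..=4)` yields the single segment `[1, 4]`, with each `u32` converted through `Into<u64>`.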

> Why add an impl of union for bounded range that is the same as the one automatically derived?

I implemented it there to make the type more "complete", so you don't have to use the VersionSet trait to compute the union (the same goes for intersection, which does have a complicated implementation). The intersection function is on the Range struct itself because its trait requirements are less strict than those of VersionSet. I think we have three options:

- Implement both union and intersection on the Range<T> type, for consistency.
- Only implement intersection via the VersionSet trait for Range<T>, so Range<T> implements neither intersection nor union directly.
- Only implement intersection directly on Range<T>, as you propose.

WDYT?
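For readers weighing these options: a set type that only provides `complement` and `intersection` can get `union` for free through De Morgan's law, A ∪ B = ¬(¬A ∩ ¬B), which is presumably what "automatically derived" refers to here. The sketch below shows that shape using a hypothetical `SetOps` trait and a toy bitmask set; it is not pubgrub's `VersionSet`.

```rust
/// Hypothetical trait: implementors provide `complement` and `intersection`,
/// and `union` has a default body derived via De Morgan's law.
trait SetOps: Sized {
    fn complement(&self) -> Self;
    fn intersection(&self, other: &Self) -> Self;

    /// A ∪ B = ¬(¬A ∩ ¬B)
    fn union(&self, other: &Self) -> Self {
        self.complement().intersection(&other.complement()).complement()
    }
}

/// Toy set over the integers 0..64, stored as a bitmask.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Bits(u64);

impl SetOps for Bits {
    fn complement(&self) -> Self {
        Bits(!self.0)
    }

    fn intersection(&self, other: &Self) -> Self {
        Bits(self.0 & other.0)
    }
}
```

The derived union costs three complements and one intersection. A direct implementation on a segment-based range can merge the two segment lists in a single pass, which is one argument for keeping a dedicated method on the type.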

> I haven't checked the intersection function, which is the most complex to implement right and efficiently. It should be correct if tests are passing. But just as a starting info to review it later, did you start from scratch, or does it follow the same structure as the previous one? How do you suggest I read it to check it?

Yeah, good question; it's indeed the biggest change. I tried separating the different cases and commenting them. I took a lot of inspiration from the original Range implementation. It does sort of follow the same structure; it's just a bit more complex because you have to take into account the comparison between inclusive and exclusive bounds. The original implementation also had some early outs that could maybe be added here as well, but to keep things readable I didn't add them. I'm going to think about whether I can make it easier to read. Feedback is welcome!
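To illustrate the extra bookkeeping that inclusive and exclusive bounds add, here is a minimal sketch (not the PR's implementation) of intersecting two sorted lists of disjoint segments with a two-pointer sweep. Lower and upper bounds need separate orderings: at the same value, `Included` starts earlier than `Excluded` for a lower bound, while `Excluded` ends earlier than `Included` for an upper bound. The emptiness check treats the version space as continuous, so `(1, 2)` counts as non-empty. All function names are hypothetical.

```rust
use std::cmp::Ordering;
use std::ops::Bound::{self, Excluded, Included, Unbounded};

type Segment<V> = (Bound<V>, Bound<V>);

/// Orders lower bounds: `Unbounded` is minus infinity; at equal values,
/// `Included(v)` starts before `Excluded(v)`.
fn cmp_lower<V: Ord>(a: &Bound<V>, b: &Bound<V>) -> Ordering {
    match (a, b) {
        (Unbounded, Unbounded) => Ordering::Equal,
        (Unbounded, _) => Ordering::Less,
        (_, Unbounded) => Ordering::Greater,
        (Included(x) | Excluded(x), Included(y) | Excluded(y)) => {
            let rank = |bound: &Bound<V>| matches!(bound, Excluded(_)) as u8;
            x.cmp(y).then(rank(a).cmp(&rank(b)))
        }
    }
}

/// Orders upper bounds: `Unbounded` is plus infinity; at equal values,
/// `Excluded(v)` ends before `Included(v)`.
fn cmp_upper<V: Ord>(a: &Bound<V>, b: &Bound<V>) -> Ordering {
    match (a, b) {
        (Unbounded, Unbounded) => Ordering::Equal,
        (Unbounded, _) => Ordering::Greater,
        (_, Unbounded) => Ordering::Less,
        (Included(x) | Excluded(x), Included(y) | Excluded(y)) => {
            let rank = |bound: &Bound<V>| matches!(bound, Included(_)) as u8;
            x.cmp(y).then(rank(a).cmp(&rank(b)))
        }
    }
}

/// Whether the segment `lo..hi` is non-empty in a continuous space.
fn is_nonempty<V: Ord>(lo: &Bound<V>, hi: &Bound<V>) -> bool {
    match (lo, hi) {
        (Unbounded, _) | (_, Unbounded) => true,
        (Included(l), Included(h)) => l <= h,
        (Included(l) | Excluded(l), Included(h) | Excluded(h)) => l < h,
    }
}

/// Intersects two sorted lists of disjoint segments.
fn intersection<V: Ord + Clone>(a: &[Segment<V>], b: &[Segment<V>]) -> Vec<Segment<V>> {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::new();
    while i < a.len() && j < b.len() {
        let (a_lo, a_hi) = &a[i];
        let (b_lo, b_hi) = &b[j];
        // The overlap starts at the later lower bound and ends at the earlier upper bound.
        let lo = if cmp_lower(a_lo, b_lo) == Ordering::Greater { a_lo } else { b_lo };
        let a_ends_first = cmp_upper(a_hi, b_hi) != Ordering::Greater;
        let hi = if a_ends_first { a_hi } else { b_hi };
        if is_nonempty(lo, hi) {
            out.push((lo.clone(), hi.clone()));
        }
        // Advance the segment that ends first; the other may overlap later segments.
        if a_ends_first {
            i += 1;
        } else {
            j += 1;
        }
    }
    out
}
```

For example, intersecting `[1, 5) ∪ [8, ∞)` with `(3, 9]` gives `(3, 5) ∪ [8, 9]`, while `[1, 3)` and `[3, 4]` do not intersect at all.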

> Any other info regarding tests? Did you change or add anything?

I didn't change the tests at all.

mpizenberg · 2022-05-25T15:25:08Z

> I left it the same as the original Range. Otherwise, this will break the old API. WDYT?

I would personally prefer the names from the new trait over those of the old Range. The new names better match set terminology, and renaming is one of the simplest things to deal with among the breaking changes anyway (there will be another breaking change with the dependency provider trait introduced in 0.3, too).

mpizenberg · 2022-05-25T15:28:01Z

> Taking the same signature from the from_range_bounds method and turning that into a From impl results

Ok yeah, let's stick with the constructor functions then

mpizenberg · 2022-05-25T15:34:14Z

> I think we have three options

I see, you implemented everything with the fewest constraints possible. So how likely is it that people use the type in a context where they do not also use the trait? I see that as very unlikely, but I don't want to wrongly impose more constraints than necessary. This also affects how easy it is to find the functions in the code documentation. Maybe that's even the most important factor? Putting everything under the same impl block of the type so that functions are nicely located in the docs? I don't have strong opinions about this.

baszalmstra · 2022-05-25T15:37:18Z

> Putting everything under the same impl block of the type so that functions are nicely located in the docs?

Yeah, that might be a reason to have all of them implemented under the type itself.

> I don't have strong opinions about this.

Yeah me neither. What do you think @Eh2406 ?

mpizenberg · 2022-05-25T15:37:37Z

> I tried separating the different cases and commenting them.

Ok, so the intersection has the same overall structure, but with some changes for readability and for dealing with bounds. I'm leaving tomorrow for the long weekend here, but I'll keep that in mind when reviewing it next week.

Eh2406 · 2022-05-25T15:48:30Z

I don't have a strong opinion either. I'm leaning toward what we have, with bounds as tight as possible.

baszalmstra · 2022-05-25T16:07:18Z

Alrighty, here it is: #112

Eh2406 · 2022-05-25T16:08:04Z

src/bounded_range.rs

```diff
+    pub fn from_range_bounds<R, IV>(bounds: R) -> Self
+    where
+        R: RangeBounds<IV>,
+        IV: Clone + Into<V>,
```


I was confused about why this requires Clone; I had worked hard to find a way to do this without the extra clone. But this may be micro-optimization on my part, and Clone is a much more common bound. Thoughts?

Ah yes, I changed that because most types don't implement From<&'a T> for T. I felt like that was a bit of a strange requirement to have, especially since you will most likely implement that with a .clone() anyway.

You did teach me about for<'a>! I had never seen that. 😄

baszalmstra mentioned this pull request Apr 22, 2022

Use a VersionSet trait instead of Range #108

Merged

baszalmstra changed the title ~~feat: added BoundedRange~~ feat: add inclusive and exclusive bounds to Range Apr 25, 2022

refactor: add inclusive and exclusive bounds to Range

849004e

baszalmstra force-pushed the feat/inclusive_range branch from d1ef004 to 849004e May 21, 2022 08:11

fix: formatting and doc comment

d3e0c22

baszalmstra added 2 commits May 21, 2022 22:56

chore: add both DiscreteRange and BoundedRange

e342162

chore: fixed and added benchmark for bounded and discrete ranges

366dec4

mpizenberg deleted the branch pubgrub-rs:version-set May 24, 2022 15:12

mpizenberg closed this May 24, 2022

baszalmstra mentioned this pull request May 25, 2022

refactor: replace Range with a bounded implementation #112

Merged

Eh2406 reviewed May 25, 2022


