
feat: add inclusive and exclusive bounds to Range #111


Closed


baszalmstra
Contributor

@baszalmstra baszalmstra commented Apr 22, 2022

This PR adds BoundedRange<T> (which is a terrible name). It is similar to Range<T> but allows inclusive or exclusive bounds. This lets T be any type that implements Ord + Clone, without requiring the Version trait.
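To make the idea concrete, here is a minimal sketch (hypothetical names, not the PR's actual code) of a range stored as sorted, non-overlapping segments of `std::ops::Bound` pairs. Membership only needs `T: Ord`, and there is no `bump()` from a `Version` trait: "strictly higher than v" is simply an excluded lower bound.

```rust
use std::ops::{Bound, RangeBounds};

/// Hypothetical sketch: a set of values stored as sorted, non-overlapping
/// (lower, upper) segments. Each bound can be inclusive, exclusive or unbounded.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BoundedRangeSketch<T> {
    segments: Vec<(Bound<T>, Bound<T>)>,
}

impl<T: Ord + Clone> BoundedRangeSketch<T> {
    /// The set containing exactly `v`: [v, v].
    pub fn singleton(v: T) -> Self {
        Self { segments: vec![(Bound::Included(v.clone()), Bound::Included(v))] }
    }

    /// Every value strictly greater than `v`: (v, ∞). No `bump()` needed.
    pub fn strictly_higher_than(v: T) -> Self {
        Self { segments: vec![(Bound::Excluded(v), Bound::Unbounded)] }
    }

    /// Values in [lo, hi).
    pub fn between(lo: T, hi: T) -> Self {
        Self { segments: vec![(Bound::Included(lo), Bound::Excluded(hi))] }
    }

    /// `(Bound<T>, Bound<T>)` implements `RangeBounds<T>`, so std's
    /// `RangeBounds::contains` handles the inclusive/exclusive comparisons.
    pub fn contains(&self, v: &T) -> bool {
        self.segments.iter().any(|seg| seg.contains(v))
    }
}
```

For example, `BoundedRangeSketch::between(1u32, 5)` contains 1 and 4 but not 5, and the same code works for `String` or any other `Ord` type that has no natural "next version".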

This PR is meant to get a discussion going; I don't expect it to be merged in its current state. The type passes the same property tests as Range. Personally, I would replace Range with this new type because it's much more flexible.

I'm curious about your thoughts!

Update 25/04/2022
I replaced the Range struct with a new implementation that provides inclusive or exclusive bounds. This enables expressing more complex ranges than previously possible.

@mpizenberg
Member

Hi @baszalmstra, at a glance this looks similar to another attempt we discussed last year: #99 (comment)
I haven't read your PR in detail yet, but let me know if you find similarities with the one I linked.

PS: I've been away for a long time now because of many things. But I'm happy that most of the things preventing me from getting time back for pubgrub are done :) I'm finalizing my website because I'm looking for a job now (I'll apply to a few things next week), but meanwhile I'll be able to do some pubgrub code and docs!

@baszalmstra
Contributor Author

It is similar, but this work builds upon the version-set branch instead of dev. I think my PR is simpler because users don't need to implement anything extra to be able to use it in a Range.

Good to have you back! I've been trying to use pubgrub with Conda packages (more info in the version-set pull request). Would love to chat with you about it!

@Eh2406
Member

Eh2406 commented Apr 23, 2022

If we are imagining this being released as part of version-set, then the more the merrier. Any reusable ranges we want to implement should be made available, either as part of this crate or even as third-party crates. We were able to determine in #99 that the straightforward extensions to this do not cover the needs of pre-release versions (for Cargo), so even if this is an implementation offered by this crate, we will still need a higher-level extension point for more complicated cases.

@baszalmstra
Contributor Author

@Eh2406 but do you think this could replace the existing Range<T> struct as well? It covers the same use cases and more. I'm not sure about the performance, but that's something I think could easily be fixed.

@Eh2406
Member

Eh2406 commented Apr 25, 2022

Yes. If a power user can pick the exact implementation they care about, then the default should probably be the more flexible option. Keeping both probably doesn't pull its weight.

@baszalmstra
Contributor Author

Yes. If a power user can pick the exact implementation they care about, then the default should probably be the more flexible option. Keeping both probably doesn't pull its weight.

Cool, I'll update the PR to do so.

@baszalmstra
Contributor Author

baszalmstra commented Apr 25, 2022

I updated the code to replace the Range struct. Without changing the rest of the code, all tests pass. The diff looks scary, but it basically replaces the entire file with new contents (except for the tests).

@baszalmstra baszalmstra changed the title feat: added BoundedRange feat: add inclusive and exclusive bounds to Range Apr 25, 2022
@baszalmstra
Contributor Author

@Eh2406 @mpizenberg do you guys think you have time to take a look at this? :)

@mpizenberg
Member

@baszalmstra yes, definitely; searching for a job has just been more time-consuming than I hoped ^^. I don't like always saying "soon", it makes me uncomfortable, especially since I've been saying this to Jacob for weeks, but I can't say anything else :(

@Eh2406
Member

Eh2406 commented May 18, 2022

I've only had time for one small thought: I think this extension will work very nicely with the API surface added in #105, if that gets merged.

@baszalmstra
Contributor Author

Indeed!

On a side note: could you enable the CI for this PR?

@baszalmstra baszalmstra force-pushed the feat/inclusive_range branch from d1ef004 to 849004e May 21, 2022 08:11
@baszalmstra
Contributor Author

baszalmstra commented May 21, 2022

I brought this branch up to date with the latest changes from version-set (which should include #105).

One test, large_case, is still failing. This is because the serde format of Range changed (it now includes Inclusive or Exclusive). Does anyone still remember how the test set (large_case_u16_NumberVersion.ron) was generated? Can I redo that somehow?

@mpizenberg
Member

@baszalmstra I'm working on an update with a summary of the discussions about what's coming in the version-set branch. The gist is that it's good, but I have a lot to write for the summary, so I'll do it after lunch. If you can bear with me a little longer, I'll be able to discuss your case (tags + bounds) in more detail.

Regarding the large case, here is how I'd go about it. Write a small program that has both the old and the new version of the code, under different names (this can be done in Cargo.toml). Then write a small function that converts the range types from old to new, and re-serialize the new data structure with serde.
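A minimal sketch of that conversion step, assuming (as in pubgrub 0.2) that the old Range stores each segment as `(V, Option<V>)`, meaning `[lo, hi)` when the upper value is present and `[lo, ∞)` otherwise. The function name `convert_segments` and the type aliases are hypothetical; a real migration program would also walk the `OfflineDependencyProvider` and rebuild it with the new range type before re-serializing.

```rust
use std::ops::Bound;

/// Old discrete representation: [lo, hi) if hi is Some, [lo, ∞) if hi is None.
type OldInterval<V> = (V, Option<V>);

/// New representation: an explicit bound on each side.
type NewInterval<V> = (Bound<V>, Bound<V>);

/// Hypothetical helper: map old segments to bounded segments, one to one.
/// The lower bound was always inclusive and the upper bound always exclusive.
fn convert_segments<V>(old: Vec<OldInterval<V>>) -> Vec<NewInterval<V>> {
    old.into_iter()
        .map(|(lo, hi)| {
            let upper = match hi {
                Some(hi) => Bound::Excluded(hi),
                None => Bound::Unbounded,
            };
            (Bound::Included(lo), upper)
        })
        .collect()
}
```

Since the mapping is one to one, the converted files only exercise the shapes the old type could express, which is the limitation raised in the next comment.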

@baszalmstra
Contributor Author

Yeah of course, no worries at all.

The thing is that the new Range implementation supports more variants than the original Range type. I can simply port the data from one type to the other, but then you would miss the new possibilities. Therefore I was wondering how you generated them so I can do the same but with all cases in there.

But take your time! There is no rush on my end. 😸

@mpizenberg
Member

It was @Eh2406 who found those cases and generated them (there are a couple more that are not in the repo). They were basically generated randomly and kept for performance analysis, since they took more time than most others. The RON files are just the serialization (--features serde) of an OfflineDependencyProvider containing all the package versions for that particular resolution. Since they were randomly generated, I'd go the route of using the old and new code together, converting from the old offline dependency provider to the new one, and re-serializing.

The thing is that the new Range implementation supports more variants than the original Range type

Agreed! There is a performance tradeoff though, so in my opinion the way forward will probably be to provide both a "discrete" helper impl (the old Range) and a new "bounds + continuous" helper impl.

@Eh2406
Member

Eh2406 commented May 21, 2022

We've had concerns in the past about having large benchmarking examples in the git history. What would be very nice is if the new code knew how to deserialize from both formats. That would make it very easy to do comparative benchmarking and see what the performance cost actually is, while also not adding another representation of the file in the history.

I'm not particularly worried about adding test files that use the new behavior. These test files are generated from our property based tests, which are still running to generate tests for the new behavior.

@baszalmstra
Contributor Author

Mm, it isn't completely clear to me from the discussion in #108 and your last comment what should happen with this Range implementation. Can you clarify this @mpizenberg? Do you want to have both the "old" and the new ranges? Or should this implementation replace the previous one (that's what the PR currently does)?

I feel like since the new implementation can do everything the old one can and more with fewer requirements imposed on the type, replacing the old implementation makes sense to me. The code for both implementations is fairly similar. The performance overhead should be fairly small when optimized. There are some fast code paths in the old implementation that I didn't copy.

Let me know what you think!

@Eh2406
Member

Eh2406 commented May 21, 2022

#108 (comment)

My feeling is that it will have a non-negligible performance cost so this needs to be evaluated first.

The performance overhead should be fairly small when optimized.

Instead of arguing about what we think the performance will be, let's use the benchmarks and collect real data!

@baszalmstra
Contributor Author

baszalmstra commented May 21, 2022

I modified the code to have both versions:

  • DiscreteRange: the original Range
  • BoundedRange: the new implementation

I also created a copy of the large_case file and manually edited it to work with BoundedRange. Both are run as part of the large_case test and benchmark. The proptest test file contains an alias for SemVS and NumVS which allows one to easily toggle between both versions.

Here is the result from the benchmark:

Benchmarking large_cases/large_case_u16_bounded_NumberVersion.ron: Collecting 100 samples in estimated 20.467 s (1700 iterati
large_cases/large_case_u16_bounded_NumberVersion.ron
                        time:   [11.789 ms 11.992 ms 12.269 ms]
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe
Benchmarking large_cases/large_case_u16_discrete_NumberVersion.ron: Collecting 100 samples in estimated 20.919 s (2100 iterations)
large_cases/large_case_u16_discrete_NumberVersion.ron
                        time:   [9.9442 ms 9.9811 ms 10.020 ms]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

The BoundedRange (new implementation) seems to be about 20% slower than the original Range (11.992 ms vs. 9.981 ms).

@Eh2406
Member

Eh2406 commented May 24, 2022

It's a little bit of a mess but I created this deserializing code that can read either one:

#[cfg(feature = "serde")]
impl<'de, V: serde::Deserialize<'de>> serde::Deserialize<'de> for BoundedRange<V> {
    fn deserialize<D: serde::Deserializer<'de>>(deserializer: D) -> Result<Self, D::Error> {
        #[derive(serde::Deserialize)]
        #[serde(untagged)]
        enum EitherInterval<V> {
            B(Bound<V>, Bound<V>),
            D(V, Option<V>),
        }
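        // Old (discrete) format: each interval is (V, Option<V>), i.e. [lo, hi) or [lo, ∞).
        // New format: each interval is an explicit (Bound<V>, Bound<V>) pair.
        let intervals: Vec<EitherInterval<V>> = serde::Deserialize::deserialize(deserializer)?;

        let mut segments = SmallVec::Empty;

        for interval in intervals {
            match interval {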
                EitherInterval::B(l, r) => segments.push((l, r)),
                EitherInterval::D(l, Some(r)) => segments.push((Included(l), Excluded(r))),
                EitherInterval::D(l, None) => segments.push((Included(l), Unbounded)),
            }
        }

        Ok(BoundedRange { segments })
    }
}

Then I used it in large_cases:

fn bench<'a, P: Package + Deserialize<'a>, VS: VersionSet + Deserialize<'a>>(
    b: &mut Bencher,
    case: &'a str,
) where
    VS::V: Deserialize<'a>,
{
    let dependency_provider: OfflineDependencyProvider<P, VS> = ron::de::from_str(&case).unwrap();

    b.iter(|| {
        for p in dependency_provider.packages() {
            for n in dependency_provider.versions(p).unwrap() {
                let _ = resolve(&dependency_provider, p.clone(), n.clone());
            }
        }
    });
}

fn bench_nested(c: &mut Criterion) {
    let mut group = c.benchmark_group("large_cases");
    group.measurement_time(Duration::from_secs(20));

    for case in std::fs::read_dir("test-examples").unwrap() {
        let case = case.unwrap().path();
        let name = case.file_name().unwrap().to_string_lossy();
        let data = std::fs::read_to_string(&case).unwrap();
        if name.ends_with("u16_NumberVersion.ron") {
            group.bench_function("Discrete_".to_string() + &name, |b| {
                bench::<u16, DiscreteRange<NumberVersion>>(b, &data);
            });
            group.bench_function("Bounded_".to_string() + &name, |b| {
                bench::<u16, BoundedRange<NumberVersion>>(b, &data);
            });
        } else if name.ends_with("str_SemanticVersion.ron") {
            group.bench_function("Discrete_".to_string() + &name, |b| {
                bench::<&str, DiscreteRange<SemanticVersion>>(b, &data);
            });
            group.bench_function("Bounded_".to_string() + &name, |b| {
                // Bounded variant: must use BoundedRange, not DiscreteRange.
                bench::<&str, BoundedRange<SemanticVersion>>(b, &data);
            });
        }
    }

    group.finish();
}

criterion_group!(benches, bench_nested);
criterion_main!(benches);

and ran it on the zuse and elm files (output cleaned and sorted):

large_cases/Discrete_elm_str_SemanticVersion.ron
                        time:   [199.71 ms 199.81 ms 199.90 ms]
large_cases/Bounded_elm_str_SemanticVersion.ron
                        time:   [203.44 ms 203.77 ms 204.16 ms]
large_cases/Discrete_large_case_u16_NumberVersion.ron
                        time:   [7.7832 ms 7.7863 ms 7.7892 ms]
large_cases/Bounded_large_case_u16_NumberVersion.ron
                        time:   [9.1590 ms 9.1654 ms 9.1726 ms]
large_cases/Discrete_zuse_str_SemanticVersion.ron
                        time:   [274.55 ms 274.70 ms 274.85 ms]
large_cases/Bounded_zuse_str_SemanticVersion.ron
                        time:   [275.56 ms 275.69 ms 275.85 ms]
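To put these numbers side by side, here is a small helper that computes the relative slowdown of the bounded implementation from the middle value (criterion's point estimate) of each `time:` line above. The function and constant names are just for illustration.

```rust
/// Relative slowdown (in percent) of BoundedRange versus DiscreteRange.
fn overhead_percent(discrete_ms: f64, bounded_ms: f64) -> f64 {
    (bounded_ms / discrete_ms - 1.0) * 100.0
}

/// (case, discrete time in ms, bounded time in ms), copied from the output above.
const CASES: [(&str, f64, f64); 3] = [
    ("elm_str_SemanticVersion", 199.81, 203.77),
    ("large_case_u16_NumberVersion", 7.7863, 9.1654),
    ("zuse_str_SemanticVersion", 274.70, 275.69),
];

/// Print one line per benchmark case with its relative overhead.
fn print_overheads() {
    for &(name, discrete, bounded) in CASES.iter() {
        println!("{}: {:+.1}%", name, overhead_percent(discrete, bounded));
    }
}
```

This gives roughly +2.0% for elm, +17.7% for large_case and +0.4% for zuse, so the gap mostly shows up in the u16 NumberVersion case.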

@mpizenberg
Member

Awesome, thanks @Eh2406 for always getting valuable data on benchmarks! So these results mean that (at least on these benchmarks), when the version type gets more complex (like semantic versions), the difference in set storage becomes negligible.

So performance should not be considered the bottleneck. The only remaining consideration is whether we want to introduce this change, which would mean bigger breaking changes for people upgrading.

Advantages:

  • A single more generic default implementation
  • A more convenient API with the use of std range bounds is possible
  • Continuous space support in the default impl makes it much easier for people to implement correct pre-release handling (otherwise getting bump() right is a bit annoying), and makes the guide section on pre-releases simpler and more focused.

Inconveniences:

  • More complex upgrade for users of v0.2. This can be mitigated by a good upgrade section in the guide.

@mpizenberg
Member

Some more thoughts I had while browsing through the PR:

  • Renaming of functions to match the VersionSet trait names? like exact -> singleton?
  • Missing fast path for contains to return false if we have exceeded the value already
  • Is it possible to implement From<RB> with RB: RangeBounds? I seem to recall it was not possible last time I tried, but maybe some new Rust features make it possible?
  • bounds_as_ref -> bound_as_ref (no extra s)
  • Why add an impl of union for bounded range that is the same as the one automatically derived?
  • I haven't checked the intersection function, which is the most complex to implement right and efficiently. It should be correct if tests are passing. But just as a starting info to review it later, did you start from scratch, or does it follow the same structure as the previous one? How do you suggest I read it to check it?
  • Any other info regarding tests? Did you change or add anything?
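On the "fast path for contains" point: because the segments are sorted and non-overlapping, the scan can stop as soon as a segment starts above the value. A minimal sketch over a plain slice of `(Bound<T>, Bound<T>)` pairs (the helper names are hypothetical, not the PR's):

```rust
use std::ops::Bound;

/// Is `v` strictly below the lower bound `lo`?
fn below_lower<T: Ord>(v: &T, lo: &Bound<T>) -> bool {
    match lo {
        Bound::Included(l) => v < l,
        Bound::Excluded(l) => v <= l,
        Bound::Unbounded => false,
    }
}

/// Is `v` at or below the upper bound `hi`?
fn within_upper<T: Ord>(v: &T, hi: &Bound<T>) -> bool {
    match hi {
        Bound::Included(h) => v <= h,
        Bound::Excluded(h) => v < h,
        Bound::Unbounded => true,
    }
}

/// `segments` must be sorted and non-overlapping.
fn contains_fast<T: Ord>(segments: &[(Bound<T>, Bound<T>)], v: &T) -> bool {
    for (lo, hi) in segments {
        if below_lower(v, lo) {
            // Fast path: every earlier segment ended below `v` and this one
            // starts above it, so no later segment can contain `v`.
            return false;
        }
        if within_upper(v, hi) {
            return true;
        }
        // `v` is above this segment; keep scanning.
    }
    false
}
```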

@mpizenberg mpizenberg deleted the branch pubgrub-rs:version-set May 24, 2022 15:12
@mpizenberg mpizenberg closed this May 24, 2022
@mpizenberg
Member

Hmm, sorry, it seems this was closed because it targeted the version-set branch, which was deleted after being merged. I can't change the target of this PR to the dev branch. Is this something you can do @baszalmstra?

@Eh2406
Member

Eh2406 commented May 24, 2022

We may need to open a new PR to straighten out the branch issue.
It seems to me that it is worth maintaining only one implementation, given that the performance overhead seems small for most use cases. The upgrade guide is already going to be complicated: we have a major new trait and will probably have other significant changes. That guide should definitely mention this change in implementation, and it might make a good example of the value of the new trait: you can implement your own set type that gets the performance back if it matters for your case.

@baszalmstra
Contributor Author

baszalmstra commented May 24, 2022

I will open a new PR, and I'll clean up the branch as well.

I will replace the old Range implementation there, add updated large_case regression files and fix/implement all the issues @mpizenberg mentioned! Anything I missed?

I'll make sure it's ready for a proper review.

@mpizenberg
Member

Awesome, thanks @baszalmstra!

@mpizenberg
Member

PS: no need to change the RON file, since we can use Jacob's decoding code to decode either format.

@mpizenberg
Member

Hum, or should we ... I don't know, what do you guys think?

@baszalmstra
Contributor Author

Hum, or should we ... I don't know, what do you guys think?

I'd be inclined to use the code from @Eh2406; that's one hurdle less when it comes to migrating.

@Eh2406
Member

Eh2406 commented May 25, 2022

Yes, I think it would be convenient for upgrading and for our testing purposes to be able to read both.

@baszalmstra
Contributor Author

Renaming of functions to match the VersionSet trait names? like exact -> singleton?

I left it the same as the original Range. Otherwise, this will break the old API. WDYT?

Is it possible to implement From<RB> with RB: RangeBounds? I seem to recall it was not possible last time I tried, but maybe some new Rust features make it possible?

Taking the same signature from the from_range_bounds method and turning that into a From impl results in:

impl<V, IV: Clone + Into<V>, R: RangeBounds<IV>> From<R> for Range<V> { .. }

However, that won't compile because the IV type is unconstrained:

impl<V, IV: Clone + Into<V>, R: RangeBounds<IV>> From<R> for Range<V> {
        ^^ unconstrained type parameter

I could implement

impl<V: Clone, R: RangeBounds<V>> From<R> for Range<V>

but that also results in an error because downstream crates may implement trait 'std::ops::RangeBounds<_>' for type 'range::Range<_>'
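As an aside, and not what the thread settled on (the decision below is to keep constructor functions): one way to sidestep the coherence error is to implement `From` for each concrete std range type instead of for every `R: RangeBounds<V>`. `MyRange` is a hypothetical stand-in for the crate's Range, and only three of the std range types are covered here.

```rust
use std::ops::{Bound, Range as StdRange, RangeFrom, RangeInclusive};

/// Hypothetical stand-in for the crate's Range type.
#[derive(Debug, PartialEq)]
pub struct MyRange<V> {
    pub segments: Vec<(Bound<V>, Bound<V>)>,
}

// These impls are accepted: each source type is a concrete std type that can
// never be `MyRange<V>`, so they cannot overlap with `impl<T> From<T> for T`.
impl<V: PartialOrd> From<StdRange<V>> for MyRange<V> {
    fn from(r: StdRange<V>) -> Self {
        if r.is_empty() {
            // e.g. 5..3 or 3..3: represent it as the empty set.
            return MyRange { segments: Vec::new() };
        }
        MyRange { segments: vec![(Bound::Included(r.start), Bound::Excluded(r.end))] }
    }
}

impl<V: PartialOrd> From<RangeInclusive<V>> for MyRange<V> {
    fn from(r: RangeInclusive<V>) -> Self {
        if r.is_empty() {
            return MyRange { segments: Vec::new() };
        }
        let (start, end) = r.into_inner();
        MyRange { segments: vec![(Bound::Included(start), Bound::Included(end))] }
    }
}

impl<V> From<RangeFrom<V>> for MyRange<V> {
    fn from(r: RangeFrom<V>) -> Self {
        MyRange { segments: vec![(Bound::Included(r.start), Bound::Unbounded)] }
    }
}
```

The cost is one impl per std range type, which is probably why a single generic constructor is the simpler choice.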

Why add an impl of union for bounded range that is the same as the one automatically derived?

I implemented it there to make the type more "complete", so you don't have to use the VersionSet trait to compute the union (the same goes for intersection, but that one does have a complicated implementation). The intersection function is on the Range struct itself because its trait requirements are weaker than those required by VersionSet. I think we have three options:

  1. Implement both union and intersection on the Range<T> type for consistency.
  2. Only implement intersection via the VersionSet trait for Range<T>. So Range<T> doesn't implement intersection nor union directly.
  3. Only implement intersection for Range<T> like you propose.

WDYT?
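For reference, the union that is "automatically derived" on the trait side can be built from complement and intersection via De Morgan's law, A ∪ B = ¬(¬A ∩ ¬B); as far as I know this is how the default union on pubgrub's VersionSet works. A sketch with a hypothetical trait `SetOps` and a toy 64-value bitset, only to show the shape of such a default method:

```rust
/// Hypothetical, trimmed-down set trait: implementors provide `complement`
/// and `intersection`, and `union` gets a default via De Morgan's law.
trait SetOps: Sized {
    fn complement(&self) -> Self;
    fn intersection(&self, other: &Self) -> Self;

    /// A ∪ B = ¬(¬A ∩ ¬B)
    fn union(&self, other: &Self) -> Self {
        self.complement()
            .intersection(&other.complement())
            .complement()
    }
}

/// Toy set over the values 0..64, one bit per value.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Bits(u64);

impl SetOps for Bits {
    fn complement(&self) -> Self {
        Bits(!self.0)
    }
    fn intersection(&self, other: &Self) -> Self {
        Bits(self.0 & other.0)
    }
}
```

The default costs three complements and one intersection; a direct union on a sorted segment list can be done in a single merge pass, which is one argument for option 1.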

I haven't checked the intersection function, which is the most complex to implement right and efficiently. It should be correct if tests are passing. But just as a starting info to review it later, did you start from scratch, or does it follow the same structure as the previous one? How do you suggest I read it to check it?

Yeah, good question, it's indeed the biggest change. I tried separating the different cases and commenting them. I took a lot of inspiration from the original Range implementation. It does sort of follow the same structure; it's just a bit more complex because you have to take into account the comparison between inclusive and exclusive bounds. The original implementation also had some early outs that I think could be added here too. However, to keep things readable I didn't add them. I'm going to think about whether I can make it easier to read. Feedback is welcome!
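The core of the extra complexity is comparing mixed bounds. A minimal sketch for intersecting two single segments (hypothetical helpers, not the PR's code): take the tighter lower bound and the tighter upper bound, where at equal values an Excluded bound is tighter than an Included one, then check for emptiness. It treats the value space as continuous, as bounded ranges do, so for example (1, 2) over integers is not considered empty.

```rust
use std::ops::Bound;

type Seg<T> = (Bound<T>, Bound<T>);

/// Tighter (larger) of two lower bounds. At equal values, Excluded wins.
fn max_lower<T: Ord + Clone>(a: &Bound<T>, b: &Bound<T>) -> Bound<T> {
    match (a, b) {
        (Bound::Unbounded, x) | (x, Bound::Unbounded) => x.clone(),
        (Bound::Included(x), Bound::Included(y)) => Bound::Included(x.max(y).clone()),
        (Bound::Excluded(x), Bound::Excluded(y)) => Bound::Excluded(x.max(y).clone()),
        (Bound::Included(i), Bound::Excluded(e)) | (Bound::Excluded(e), Bound::Included(i)) => {
            if i > e { Bound::Included(i.clone()) } else { Bound::Excluded(e.clone()) }
        }
    }
}

/// Tighter (smaller) of two upper bounds. At equal values, Excluded wins.
fn min_upper<T: Ord + Clone>(a: &Bound<T>, b: &Bound<T>) -> Bound<T> {
    match (a, b) {
        (Bound::Unbounded, x) | (x, Bound::Unbounded) => x.clone(),
        (Bound::Included(x), Bound::Included(y)) => Bound::Included(x.min(y).clone()),
        (Bound::Excluded(x), Bound::Excluded(y)) => Bound::Excluded(x.min(y).clone()),
        (Bound::Included(i), Bound::Excluded(e)) | (Bound::Excluded(e), Bound::Included(i)) => {
            if i < e { Bound::Included(i.clone()) } else { Bound::Excluded(e.clone()) }
        }
    }
}

/// A segment is empty only when its bounds cross (or touch with an Excluded side).
fn is_empty<T: Ord>(lo: &Bound<T>, hi: &Bound<T>) -> bool {
    match (lo, hi) {
        (Bound::Unbounded, _) | (_, Bound::Unbounded) => false,
        (Bound::Included(l), Bound::Included(h)) => l > h,
        (Bound::Included(l), Bound::Excluded(h))
        | (Bound::Excluded(l), Bound::Included(h))
        | (Bound::Excluded(l), Bound::Excluded(h)) => l >= h,
    }
}

/// Intersection of two single segments, or None if it is empty.
fn intersect<T: Ord + Clone>(a: &Seg<T>, b: &Seg<T>) -> Option<Seg<T>> {
    let lo = max_lower(&a.0, &b.0);
    let hi = min_upper(&a.1, &b.1);
    if is_empty(&lo, &hi) { None } else { Some((lo, hi)) }
}
```

A full range intersection would walk both sorted segment lists and apply this pairwise, advancing whichever segment ends first.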

Any other info regarding tests? Did you change or add anything?

I didn't change the tests at all.

@mpizenberg
Member

I left it the same as the original Range. Otherwise, this will break the old API. WDYT?

I personally would prefer the names from the new trait instead of those of the old Range. The new names fit set terminology better, and renaming things is one of the simplest changes to deal with alongside other breaking changes anyway (there will also be another breaking change with the dependency provider trait introduced in 0.3).

@mpizenberg
Member

Taking the same signature from the from_range_bounds method and turning that into a From impl results

Ok yeah, let's stick with the constructor functions then

@mpizenberg
Member

I think we have three options

I see, you implemented everything with the fewest constraints possible. So how likely is it that people use the type in a context where they do not also use the trait? I see that as very unlikely, but I don't want to wrongly impose more constraints than necessary. This also affects how easy it is to find the functions in the code documentation. Maybe that's even the most important factor? Putting everything under the same impl block of the type so that the functions are grouped together in the docs? I don't have strong opinions about this.

@baszalmstra
Contributor Author

Putting everything under the same impl block of the type so that functions are nicely located in the docs?

Yeah, that might be a reason to have all of them implemented under the type itself.

I don't have strong opinions about this.

Yeah me neither. What do you think @Eh2406 ?

@mpizenberg
Member

I tried separating the different cases and commenting them.

Ok, so the intersection has the same overall structure but with some changes for readability and for dealing with bounds. I'm moving tomorrow for the long weekend here, but I'll keep that in mind when reviewing it next week.

@Eh2406
Member

Eh2406 commented May 25, 2022

I don't have a strong opinion either. I'm leaning toward what we have, with bounds as tight as possible.

@baszalmstra
Contributor Author

Alrighty, here it is: #112

pub fn from_range_bounds<R, IV>(bounds: R) -> Self
where
    R: RangeBounds<IV>,
    IV: Clone + Into<V>,
Member


I was confused about why this required Clone; I had worked hard to find a way to do this without the extra clone here. But this may be a micro-optimization on my part, since Clone is a much more common bound. Thoughts?

Contributor Author


Ah yes, I changed that because most types don't implement From<&'a T> for T. I felt like that was a bit of a strange requirement to have, especially since you will most likely implement that with a .clone() anyway.

You did teach me about for<'a>! I had never seen that. 😄
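A minimal sketch of the signature under review, on a hypothetical `SketchRange` type. `RangeBounds` only hands out borrowed bounds (`Bound<&IV>`), so producing an owned `V` needs either `IV: Clone + Into<V>` (the bound kept here) or a higher-ranked bound on references such as `for<'a> &'a IV: Into<V>`, which seems to be the alternative the reviewer had worked on.

```rust
use std::ops::{Bound, RangeBounds};

/// Hypothetical stand-in for the crate's Range type.
#[derive(Debug, PartialEq)]
pub struct SketchRange<V> {
    pub segments: Vec<(Bound<V>, Bound<V>)>,
}

/// Clone the borrowed bound's value and convert it into `V`.
fn convert_bound<IV: Clone + Into<V>, V>(b: Bound<&IV>) -> Bound<V> {
    match b {
        Bound::Included(v) => Bound::Included(v.clone().into()),
        Bound::Excluded(v) => Bound::Excluded(v.clone().into()),
        Bound::Unbounded => Bound::Unbounded,
    }
}

impl<V> SketchRange<V> {
    /// Build a one-segment range from any std range syntax (`a..b`, `..=b`, ...).
    /// Emptiness (e.g. `5..3`) is not checked in this sketch.
    pub fn from_range_bounds<R, IV>(bounds: R) -> Self
    where
        R: RangeBounds<IV>,
        IV: Clone + Into<V>,
    {
        let lo = convert_bound(bounds.start_bound());
        let hi = convert_bound(bounds.end_bound());
        SketchRange { segments: vec![(lo, hi)] }
    }
}
```

For example, `SketchRange::<u64>::from_range_bounds(1u32..5u32)` works because `u32: Clone + Into<u64>`, whereas the by-reference variant would need `From<&u32>` for `u64`, which is exactly the kind of impl most types don't provide.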
