-
Notifications
You must be signed in to change notification settings - Fork 39
feat: add inclusive and exclusive bounds to Range #111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add inclusive and exclusive bounds to Range #111
Conversation
Hi @baszalmstra at a glance, this looks similar to another attempt we discussed last year #99 (comment) PS: I've been away for a long time now because of many things. But I'm happy that most of the things preventing me to get some time back for pubgrub are done :) I'm finalizing my website because I'm looking for a job now (will apply to few things next week) but meanwhile, I'll be able to do some pubgrub code and docs! |
It is similar but this work builds upon the work in the version-set branch instead of dev. I think my PR is simpler because users dont need to implement anything extra to be able to use it in a Range. Good to have you back! Ive been trying to use pubgrub with Conda packages (More info in versionset pull request). Would love to chat to you about it! |
If we are imagining this being released as part of |
@Eh2406 but do you think this could replace the existing |
Yes. If a power user can pick the exact implementation they care about, then the default should probably be the more flexible option. Keeping both probably doesn't pull its weight. |
Cool, I'll update the PR to do so. |
I updated the code to replace the |
@Eh2406 @mpizenberg do you guys think you have time to take a look at this? :) |
@baszalmstra yes definitely, just have been more time consuming than I hoped to search for a job ^^. I don't like always saying "soon", it makes me uncomfortable especially since I've been saying this for weeks to Jacob but I can't say anything else :( |
I've only had time for one small thought, I think this extension will work very nicely with the added API surface from #105 if that gets merged. |
Indeed! On a side note: could you enable the CI for this PR? |
d1ef004
to
849004e
Compare
I brought this branch upstream with the latest changes from There is still a test failing |
@baszalmstra I'm working on a update with a summary of discussions about what's coming in the version-set branch. The gist is it's good but I have a lot to write for the summary so will do it after lunch. If you can bear with me a little bit longer and I'll be able to discuss more about your case (tags + bounds). Regarding the large case the way I'd go is as follows. Write a small program that have both the old version and the new version of the code, under different naming (can be done in cargo.toml). Then have a small function able to convert the ranges types from old to new, and re-serialize with serde the new data structure. |
Yeah of course, no worries at all. The thing is that the new But take your time! There is no rush on my end. 😸 |
It was @Eh2406 who found those cases and generated them (there are a couple more that are not in the repo) and those where basically generated randomly and kept for performance analysis since they took more time than the majority of others. The ron file are just the serialization (
Agreed! There is a performance tradeoff though, so in my opinion, the way forward will probably to provide both a "discrete" helper impl (the old Range) and a new "bounds + continuous" helper impl. |
We've had concerns in the past about having large benchmarking examples in the git history. What would be very nice is if the new code knew how to deserialize from both formats. That would make it very easy to do comparative benchmarking and see what the performance cost actually is, while also not adding another representation of the file in the history. I'm not particularly worried about adding test files that use the new behavior. These test files are generated from our property based tests, which are still running to generate tests for the new behavior. |
Mm, it isn't completely clear to me from the discussion in #108 and your last comment what should happen with this I feel like since the new implementation can do everything the old one can and more with fewer requirements imposed on the type, replacing the old implementation makes sense to me. The code for both implementations is fairly similar. The performance overhead should be fairly small when optimized. There are some fast code paths in the old implementation that I didn't copy. Let me know what you think! |
Instead of arguing about what we think the performance will be, lets use the benchmarks and collect real data! |
I modified the code to have both versions:
I also created a copy and manually edited the Here is the result from the benchmark:
The |
It's a little bit of a mess but I created this deserializing code that can read either one: #[cfg(feature = "serde")]
impl<'de, V: serde::Deserialize<'de>> serde::Deserialize<'de> for BoundedRange<V> {
fn deserialize<D: serde::Deserializer<'de>>(deserializer: D) -> Result<Self, D::Error> {
#[derive(serde::Deserialize)]
#[serde(untagged)]
enum EitherInterval<V> {
B(Bound<V>, Bound<V>),
D(V, Option<V>),
}
// SmallVec<(V, Option<V>)>
let foo: Vec<EitherInterval<V>> = serde::Deserialize::deserialize(deserializer)?;
let mut segments = SmallVec::Empty;
for i in foo {
match i {
EitherInterval::B(l, r) => segments.push((l, r)),
EitherInterval::D(l, Some(r)) => segments.push((Included(l), Excluded(r))),
EitherInterval::D(l, None) => segments.push((Included(l), Unbounded)),
}
}
Ok(BoundedRange { segments })
}
} Then I used it in large_cases: fn bench<'a, P: Package + Deserialize<'a>, VS: VersionSet + Deserialize<'a>>(
b: &mut Bencher,
case: &'a str,
) where
VS::V: Deserialize<'a>,
{
let dependency_provider: OfflineDependencyProvider<P, VS> = ron::de::from_str(&case).unwrap();
b.iter(|| {
for p in dependency_provider.packages() {
for n in dependency_provider.versions(p).unwrap() {
let _ = resolve(&dependency_provider, p.clone(), n.clone());
}
}
});
}
fn bench_nested(c: &mut Criterion) {
let mut group = c.benchmark_group("large_cases");
group.measurement_time(Duration::from_secs(20));
for case in std::fs::read_dir("test-examples").unwrap() {
let case = case.unwrap().path();
let name = case.file_name().unwrap().to_string_lossy();
let data = std::fs::read_to_string(&case).unwrap();
if name.ends_with("u16_NumberVersion.ron") {
group.bench_function("Discrete_".to_string() + &name, |b| {
bench::<u16, DiscreteRange<NumberVersion>>(b, &data);
});
group.bench_function("Bounded_".to_string() + &name, |b| {
bench::<u16, BoundedRange<NumberVersion>>(b, &data);
});
} else if name.ends_with("str_SemanticVersion.ron") {
group.bench_function("Discrete_".to_string() + &name, |b| {
bench::<&str, DiscreteRange<SemanticVersion>>(b, &data);
});
group.bench_function("Bounded_".to_string() + &name, |b| {
bench::<&str, DiscreteRange<SemanticVersion>>(b, &data);
});
}
}
group.finish();
}
criterion_group!(benches, bench_nested);
criterion_main!(benches); and ran it on the
|
Awesome thanks @Eh2406 for always getting valuable data on benchmarks! So interpreting these results means that (at least on these benchmarks) when the version type gets more complex (like semantic versions) the storage difference for sets gets negligible. So this means we should not consider performances as the bottleneck. The only useful consideration then is do we want to introduce this change that would mean bigger breaking changes for people updating. Advantages:
Inconveniences:
|
Some more thoughts I had while browsing through the PR:
|
Hum sorry seems this was closed due to targeting the branch version-set that was deleted after it been merged. I can't change the target of this PR to be the dev branch. Is this something you can do @baszalmstra ? |
We may need to open a new PR to straighten out the branch issue. |
I will open a new PR, Ill clean up the branch as well. I will replace the old Range implementation there, add updated large_case regression files and fix/implement all the issues @mpizenberg mentioned! Anything I missed? Ill make sure its ready for a proper review. |
Awesome thanks @baszalmstra ! |
PS, no need to change the RON file since we can use Jacobs decoding code to decode either format. |
Hum, or should we ... I don't know, what do you guys think? |
Id be inclined the use the code from @Eh2406 , that's one hurdle less when it comes to migrating. |
Yes, I think it would be convenient for upgrading and for our testing purposes to be able to read both. |
I left it the same as the original Range. Otherwise, this will break the old API. WDYT?
Taking the same signature from the Impl<V, IV: Clone + Into<V>, R: RangeBounds<IV>> From<R> for Range<V> { .. } However, that wont compile because the
I could implement
but that also results in an error because
I implemented it there to make the type more "complete" so you don't have to use the Range trait to compute the union (same goes for intersection, but that does have a complicated implementation.) The
WDYT?
Yeah good question, it's indeed the biggest change. I tried separating the different cases and commenting them. I took a lot of inspiration from the original Range implementation. It does sortah follow the same structure, its just a bit more complex because you have to take into account the comparison between inclusive and exclusive bounds. The original implementation also had some early outs in there that I think can maybe also be added here. However, to keep things readable I didn't add them. I'm gonna think if I can make it easier to read. Feedback is welcome!
I didn't change the tests at all. |
I personally would prefer the namings of the new trait instead of the old Range. The new names are more adequate to the set terminology and renaming things is one of simplest things one can have to do with other breaking changes anyway (there will be another breaking change with the dependency provider trait introduced too in 0.3) |
Ok yeah, let's stick with the constructor functions then |
I see, you implemented everything with the least constraints possible. So how likely is it that people use the type in a context where do not also use the trait? I see that as very unlikely but I don't want to wrongly impose more constraints than necessary. This also impact how straight forward are the functions located in the code documentation. Maybe that's even the most important factor? Putting everything under the same impl block of the type so that functions are nicely located in the docs? I don't have strong opinions about this. |
Yeah, that might be a reason to have all of them implemented under the type itself.
Yeah me neither. What do you think @Eh2406 ? |
Ok so the intersection has the same structure overall but with some changes towards readability / dealing with bounds. I'm moving tomorrow for the long weekend here but I'll keep that in mind when reviewing it next week. |
I don't have a strong opinion either. I'm leaning toward what we have, with abounds as tight as possible. |
Alrighty, here it is: #112 |
pub fn from_range_bounds<R, IV>(bounds: R) -> Self | ||
where | ||
R: RangeBounds<IV>, | ||
IV: Clone + Into<V>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was confused by why this required clone, I had worked hard to find a way to do this without the extra clone here. But this may be micro optimization on my part, clone
is a much more common bound. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, I changed that because most types don't implement From<&'a T> for T
. I felt like that was a bit of a strange requirement to have, especially since you will most likely implement that with a .clone()
anyway.
You did teach me about for<'a>
! I had never seen that. 😄
This PR adds
BoundedRange<T>
(which is a terrible name), this is similar to aRange<T>
but allows inclusive or exclusive bounds to be used. This enablesT
to be any type that implementsOrd + Clone
and doesn't require theVersion
trait.This PR is to get a discussion going. I don't expect it to be merged in its current state. The type passes the same property tests as applied to
Range
. Personally, I would replaceRange
with this new type because it's much more flexible.I'm curious about your thoughts!
Update 25/04/2022
I replaced the
Range
struct with a new implementation that provides inclusive or exclusive bounds. This enables expressing more complex ranges than previously possible.