diff --git a/text/2351-is-sorted.md b/text/2351-is-sorted.md new file mode 100644 index 00000000000..03be680ca5f --- /dev/null +++ b/text/2351-is-sorted.md @@ -0,0 +1,302 @@ +- Feature Name: `is_sorted` +- Start Date: 2018-02-24 +- RFC PR: [rust-lang/rfcs#2351](https://github.com/rust-lang/rfcs/pull/2351) +- Rust Issue: [rust-lang/rust#53485](https://github.com/rust-lang/rust/issues/53485) + +# Summary +[summary]: #summary + +Add the methods `is_sorted`, `is_sorted_by` and `is_sorted_by_key` to `[T]`; +add the methods `is_sorted` and `is_sorted_by` to `Iterator`. + +# Motivation +[motivation]: #motivation + +In quite a few situations, one needs to check whether a sequence of elements +is sorted. The most important use cases are probably **unit tests** and +**pre-/post-condition checks**. + +The lack of an `is_sorted()` function in Rust's standard library has led to +[countless programmers implementing their own](https://github.com/search?l=Rust&q=%22fn+is_sorted%22&type=Code&utf8=%E2%9C%93). +While it is possible to write a one-liner using iterators (e.g. +`(0..arr.len() - 1).all(|i| arr[i] <= arr[i + 1])`¹), it is still unnecessary +mental overhead while writing *and* reading the code. + +In [the corresponding issue on the main repository](https://github.com/rust-lang/rust/issues/44370) +(from which a few comments are referenced) everyone seems to agree on the +basic premise: we want such a function. + +Having `is_sorted()` and friends in the standard library would: +- prevent people from spending time on writing their own, +- improve readbility of the code by clearly showing the author's intent, +- and encourage to write more unit tests and/or pre-/post-condition checks. + +Another proof of this functions' usefulness is the inclusion in the +standard library of many other languages: +C++'s [`std::is_sorted`](http://en.cppreference.com/w/cpp/algorithm/is_sorted), +Go's [`sort.IsSorted`](https://golang.org/pkg/sort/#IsSorted), +D's [`std.algorithm.sorting.is_sorted`](https://dlang.org/library/std/algorithm/sorting/is_sorted.html) +and others. (Curiously, many (mostly) more high-level programming language – +like Ruby, Javascript, Java, Haskell and Python – seem to lack such a function.) + +¹ In the initial version of this RFC, this code snippet contained a bug +(`<` instead of `<=`). This subtle mistake happens very often: in this RFC, +[in the discussion thread about this RFC](https://github.com/rust-lang/rfcs/pull/2351#issuecomment-370126518), +in [this StackOverflow answer](https://stackoverflow.com/posts/51272639/revisions) +and in many more places. Thus, avoiding this common bug is another good +reason to add `is_sorted()`. + +## Fast Implementation via SIMD + +Lastly, it is possible to implement `is_sorted` for many common types with SIMD +instructions which improves speed significantly. It is unlikely that many +programmers will take the time to write SIMD code themselves, thus everyone +would benefit if this rather difficult implementation work is done in the +standard library. + + + +# Guide-level explanation +[guide-level-explanation]: #guide-level-explanation + +Possible documentation of the two new methods of `Iterator` as well as +`[T]::is_sorted_by_key`: + +> ```rust +> fn is_sorted(self) -> bool +> where +> Self::Item: PartialOrd, +> ``` +> Checks if the elements of this iterator are sorted. +> +> That is, for each element `a` and its following element `b`, `a <= b` +> must hold. If the iterator yields exactly zero or one element, `true` +> is returned. +> +> Note that if `Self::Item` is only `PartialOrd`, but not `Ord`, the above +> definition implies that this function returns `false` if any two +> consecutive items are not comparable. +> +> ## Example +> +> ```rust +> assert!([1, 2, 2, 9].iter().is_sorted()); +> assert!(![1, 3, 2, 4).iter().is_sorted()); +> assert!([0].iter().is_sorted()); +> assert!(std::iter::empty::().is_sorted()); +> assert!(![0.0, 1.0, std::f32::NAN].iter().is_sorted()); +> ``` +> --- +> +> ```rust +> fn is_sorted_by(self, compare: F) -> bool +> where +> F: FnMut(&Self::Item, &Self::Item) -> Option, +> ``` +> Checks if the elements of this iterator are sorted using the given +> comparator function. +> +> Instead of using `PartialOrd::partial_cmp`, this function uses the given +> `compare` function to determine the ordering of two elements. Apart from +> that, it's equivalent to `is_sorted`; see its documentation for more +> information. +> +> --- +> +> (*for `[T]`*) +> +> ```rust +> fn is_sorted_by_key(&self, f: F) -> bool +> where +> F: FnMut(&T) -> K, +> K: PartialOrd, +> ``` +> Checks if the elements of this slice are sorted using the given +> key extraction function. +> +> Instead of comparing the slice's elements directly, this function +> compares the keys of the elements, as determined by `f`. Apart from +> that, it's equivalent to `is_sorted`; see its documentation for more +> information. +> +> ## Example +> +> ```rust +> assert!(["c", "bb", "aaa"].is_sorted_by_key(|s| s.len())); +> assert!(![-2i32, -1, 0, 3].is_sorted_by_key(|n| n.abs())); +> ``` + +The methods `[T]::is_sorted` and `[T]::is_sorted_by` will have analogous +documentations to the ones shown above. + +# Reference-level explanation +[reference-level-explanation]: #reference-level-explanation + +This RFC proposes to add the following methods to `[T]` (slices) and +`Iterator`: + +```rust +impl [T] { + fn is_sorted(&self) -> bool + where + T: PartialOrd, + { ... } + + fn is_sorted_by(&self, compare: F) -> bool + where + F: FnMut(&T, &T) -> Option, + { ... } + + fn is_sorted_by_key(&self, f: F) -> bool + where + F: FnMut(&T) -> K, + K: PartialOrd, + { ... } +} + +trait Iterator { + fn is_sorted(self) -> bool + where + Self::Item: PartialOrd, + { ... } + + fn is_sorted_by(mut self, compare: F) -> bool + where + F: FnMut(&Self::Item, &Self::Item) -> Option, + { ... } +} +``` + +In addition to the changes shown above, the three methods added to `[T]` should +also be added to `core::slice::SliceExt` as they don't require heap +allocations. + +To repeat the exact semantics from the prior section: the methods return +`true` if and only if for each element `a` and its following element `b`, the +condition `a <= b` holds. For slices/iterators with zero or one element, +`true` is returned. For elements which implement `PartialOrd`, but not `Ord`, +the function returns `false` if any two consecutive elements are not +comparable (this is an implication of the `a <= b` condition from above). + +A sample implementation can be found +[here](https://play.rust-lang.org/?gist=431ff42fe8ba5980fcf9250c8bc4492b&version=stable). + + +# Drawbacks +[drawbacks]: #drawbacks + +It increases the size of the standard library by a tiny bit. + +# Rationale and alternatives +[alternatives]: #alternatives + +### Only add the methods to `Iterator`, but not to `[T]` +Without `is_sorted()` defined for slices directly, one can still fairly easily +test if a slice is sorted by obtaining an iterator via `iter()`. So instead of +`v.is_sorted()`, one would need to write `v.iter().is_sorted()`. + +This always works for `is_sorted()` because of the `PartialOrd` blanket impl +which implements `PartialOrd` for all references to an `PartialOrd` type. For +`is_sorted_by` it would introduce an additional reference to the closures' +arguments (i.e. `v.iter().is_sorted_by(|a, b| ...))` where `a` and `b` are +`&&T`). + +While these two inconveniences are not deal-breakers, being able to call those +three methods on slices (and all `Deref` types) directly, could be +favourable for many programmers (especially given the popularity of slice-like +data structures, like `Vec`). Additionally, the `sort` method and friends +are defined for slices, thus one might expect the `is_sorted()` method there, +too. + + +### Add the three methods to additional data structures (like `LinkedList`) as well +Adding these methods to every data structure in the standard libary is a lot of +duplicate code. Optimally, we would have a trait that represents sequential +data structures and would only add `is_sorted` and friends to said trait. We +don't have such a trait as of now; so `Iterator` is the next best thing. Slices +deserve special treatment due to the reasons mentioned above (popularity and +`sort()`). + + +### `Iterator::while_sorted`, `is_sorted_until`, `sorted_prefix`, `num_sorted`, ... +[In the issue on the main repository](https://github.com/rust-lang/rust/issues/44370), +concerns about completely consuming the iterator were raised. Some alternatives, +such as [`while_sorted`](https://github.com/rust-lang/rust/issues/44370#issuecomment-327873139), +were suggested. However, consuming the iterator is neither uncommon nor a +problem. Methods like `count()`, `max()` and many more consume the iterator, +too. [One comment](https://github.com/rust-lang/rust/issues/44370#issuecomment-344516366) mentions: + +> I am a bit skeptical of the equivalent on Iterator just because the return +> value does not seem actionable -- you aren't going to "sort" the iterator +> after you find out it is not already sorted. What are some use cases for this +> in real code that does not involve iterating over a slice? + +As mentioned above, `Iterator` is the next best thing to a trait representing +sequential data structures. So to check if a `LinkedList`, `VecDeque` or +another sequential data structure is sorted, one would simply call +`collection.iter().is_sorted()`. It's likely that this is the main usage for +`Iterator`'s `is_sorted` methods. Additionally, code like +`if v.is_sorted() { v.sort(); }` is not very useful: `sort()` already runs in +O(n) for already sorted arrays. + +Suggestions like `is_sorted_until` are not really useful either: one can easily +get a subslice or a part of an iterator (via `.take()`) and call `is_sorted()` +on that part. + + +# Unresolved questions +[unresolved]: #unresolved-questions + + +### Should `Iterator::is_sorted_by_key` be added as well? + +This RFC proposes to add `is_sorted_by_key` only to `[T]` but not to +`Iterator`. The latter addition wouldn't be too useful since once could easily +achieve the same effect as `.is_sorted_by_key(...)` by calling +`.map(...).is_sorted()`. It might still be favourable to include said function +for consistency and ease of use. The standard library already hosts a number of +sorting-related functions all of which come in three flavours: *raw*, `_by` and +`_by_key`. By now, programmers could expect there to be an `is_sorted_by_key` +as well. + + +### Add `std::cmp::is_sorted` instead + +As suggested [here](https://github.com/rust-lang/rust/issues/44370#issuecomment-345495831), +one could also add this free function (plus the `_by` and `_by_key` versions) +to `std::cmp`: + +```rust +fn is_sorted(collection: C) -> bool +where + C: IntoIterator, + C::Item: Ord, +``` + +This can be seen as a better design as it avoids the question about which data +structure should get `is_sorted` methods. However, it might have the +disadvantage of being less discoverable and also less convenient (long path or +import). + + +### Require `Ord` instead of only `PartialOrd` + +As proposed in this RFC, `is_sorted` only requires its elements to be +`PartialOrd`. If two non-comparable elements are encountered, `false` is +returned. This is probably the only useful way to define the function for +partially orderable elements. + +While it's convenient to call `is_sorted()` on slices containing only +partially orderable elements (like floats), we might want to use the stronger +`Ord` bound: + +- Firstly, for most programmers it's probably not *immediately* clear how the + function is defined for partially ordered elements (the documentation should + be sufficient as explanation, though). +- Secondly, being able to call `is_sorted` on something will probably make + most programmers think, that calling `sort` on the same thing is possible, + too. Having different bounds for `is_sorted` and `sort` thus might lead to + confusion. +- Lastly, the `is_sorted_by` function currently uses a closure which returns + `Option`. This differs from the closure for `sort_by` and looks a + bit more complicated than necessary for most cases.