
Commit d424128

Merge pull request #2351 from LukasKalbertodt/rfc-add-is-sorted
RFC: Add `is_sorted` to the standard library
2 parents 4b7ff2f + 1aa75ab commit d424128

1 file changed

+302
-0
lines changed

text/2351-is-sorted.md

+302
@@ -0,0 +1,302 @@
1+
- Feature Name: `is_sorted`
2+
- Start Date: 2018-02-24
3+
- RFC PR: [rust-lang/rfcs#2351](https://github.com/rust-lang/rfcs/pull/2351)
4+
- Rust Issue: [rust-lang/rust#53485](https://github.com/rust-lang/rust/issues/53485)
5+
6+
# Summary
7+
[summary]: #summary
8+
9+
Add the methods `is_sorted`, `is_sorted_by` and `is_sorted_by_key` to `[T]`;
10+
add the methods `is_sorted` and `is_sorted_by` to `Iterator`.
11+
12+
# Motivation
13+
[motivation]: #motivation
14+
15+
In quite a few situations, one needs to check whether a sequence of elements
16+
is sorted. The most important use cases are probably **unit tests** and
17+
**pre-/post-condition checks**.
18+
19+
The lack of an `is_sorted()` function in Rust's standard library has led to
20+
[countless programmers implementing their own](https://github.com/search?l=Rust&q=%22fn+is_sorted%22&type=Code&utf8=%E2%9C%93).
21+
While it is possible to write a one-liner using iterators (e.g.
22+
`(0..arr.len() - 1).all(|i| arr[i] <= arr[i + 1])`¹), it is still unnecessary
23+
mental overhead while writing *and* reading the code.
24+
25+
In [the corresponding issue on the main repository](https://github.com/rust-lang/rust/issues/44370)
26+
(from which a few comments are referenced) everyone seems to agree on the
27+
basic premise: we want such a function.
28+
29+
Having `is_sorted()` and friends in the standard library would:
30+
- prevent people from spending time on writing their own,
31+
- improve readability of the code by clearly showing the author's intent,
32+
- and encourage people to write more unit tests and/or pre-/post-condition checks.
33+
34+
Further evidence of this function's usefulness is its inclusion in the
35+
standard library of many other languages:
36+
C++'s [`std::is_sorted`](http://en.cppreference.com/w/cpp/algorithm/is_sorted),
37+
Go's [`sort.IsSorted`](https://golang.org/pkg/sort/#IsSorted),
38+
D's [`std.algorithm.sorting.is_sorted`](https://dlang.org/library/std/algorithm/sorting/is_sorted.html)
39+
and others. (Curiously, many (mostly) higher-level programming languages –
40+
like Ruby, JavaScript, Java, Haskell and Python – seem to lack such a function.)
41+
42+
¹ In the initial version of this RFC, this code snippet contained a bug
43+
(`<` instead of `<=`). This subtle mistake happens very often: in this RFC,
44+
[in the discussion thread about this RFC](https://github.com/rust-lang/rfcs/pull/2351#issuecomment-370126518),
45+
in [this StackOverflow answer](https://stackoverflow.com/posts/51272639/revisions)
46+
and in many more places. Thus, avoiding this common bug is another good
47+
reason to add `is_sorted()`.
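
The comparison operator is not the only trap in the one-liner above: for an empty slice, `arr.len() - 1` underflows. As a minimal sketch (the helper name `windows_sorted` is hypothetical and not part of this RFC), a hand-written check based on `windows(2)` avoids both problems:

```rust
/// Hypothetical helper, not part of the RFC: returns `true` if every
/// adjacent pair of elements is in non-decreasing order.
fn windows_sorted<T: PartialOrd>(arr: &[T]) -> bool {
    // `windows(2)` yields nothing for slices shorter than two elements,
    // so `all` returns `true` for empty and single-element slices.
    arr.windows(2).all(|w| w[0] <= w[1])
}
```

Even this short version has to be written, read and tested in every code base that needs it, which is the overhead the proposed methods would remove.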
48+
49+
## Fast Implementation via SIMD
50+
51+
Lastly, it is possible to implement `is_sorted` for many common types with SIMD
52+
instructions, which improves speed significantly. It is unlikely that many
53+
programmers will take the time to write SIMD code themselves, so everyone
54+
would benefit if this rather difficult implementation work is done in the
55+
standard library.
56+
57+
58+
59+
# Guide-level explanation
60+
[guide-level-explanation]: #guide-level-explanation
61+
62+
Possible documentation of the two new methods of `Iterator` as well as
63+
`[T]::is_sorted_by_key`:
64+
65+
> ```rust
> fn is_sorted(self) -> bool
> where
>     Self::Item: PartialOrd,
> ```
70+
> Checks if the elements of this iterator are sorted.
71+
>
72+
> That is, for each element `a` and its following element `b`, `a <= b`
73+
> must hold. If the iterator yields exactly zero or one element, `true`
74+
> is returned.
75+
>
76+
> Note that if `Self::Item` is only `PartialOrd`, but not `Ord`, the above
77+
> definition implies that this function returns `false` if any two
78+
> consecutive items are not comparable.
79+
>
80+
> ## Example
81+
>
82+
> ```rust
> assert!([1, 2, 2, 9].iter().is_sorted());
> assert!(![1, 3, 2, 4].iter().is_sorted());
> assert!([0].iter().is_sorted());
> assert!(std::iter::empty::<i32>().is_sorted());
> assert!(![0.0, 1.0, std::f32::NAN].iter().is_sorted());
> ```
89+
> ---
90+
>
91+
> ```rust
> fn is_sorted_by<F>(self, compare: F) -> bool
> where
>     F: FnMut(&Self::Item, &Self::Item) -> Option<Ordering>,
> ```
96+
> Checks if the elements of this iterator are sorted using the given
97+
> comparator function.
98+
>
99+
> Instead of using `PartialOrd::partial_cmp`, this function uses the given
100+
> `compare` function to determine the ordering of two elements. Apart from
101+
> that, it's equivalent to `is_sorted`; see its documentation for more
102+
> information.
103+
>
104+
> ---
105+
>
106+
> (*for `[T]`*)
107+
>
108+
> ```rust
> fn is_sorted_by_key<F, K>(&self, f: F) -> bool
> where
>     F: FnMut(&T) -> K,
>     K: PartialOrd,
> ```
114+
> Checks if the elements of this slice are sorted using the given
115+
> key extraction function.
116+
>
117+
> Instead of comparing the slice's elements directly, this function
118+
> compares the keys of the elements, as determined by `f`. Apart from
119+
> that, it's equivalent to `is_sorted`; see its documentation for more
120+
> information.
121+
>
122+
> ## Example
123+
>
124+
> ```rust
> assert!(["c", "bb", "aaa"].is_sorted_by_key(|s| s.len()));
> assert!(![-2i32, -1, 0, 3].is_sorted_by_key(|n| n.abs()));
> ```
128+
129+
The methods `[T]::is_sorted` and `[T]::is_sorted_by` will have analogous
130+
documentation to that shown above.
131+
132+
# Reference-level explanation
133+
[reference-level-explanation]: #reference-level-explanation
134+
135+
This RFC proposes to add the following methods to `[T]` (slices) and
136+
`Iterator`:
137+
138+
```rust
impl<T> [T] {
    fn is_sorted(&self) -> bool
    where
        T: PartialOrd,
    { ... }

    fn is_sorted_by<F>(&self, compare: F) -> bool
    where
        F: FnMut(&T, &T) -> Option<Ordering>,
    { ... }

    fn is_sorted_by_key<F, K>(&self, f: F) -> bool
    where
        F: FnMut(&T) -> K,
        K: PartialOrd,
    { ... }
}

trait Iterator {
    fn is_sorted(self) -> bool
    where
        Self::Item: PartialOrd,
    { ... }

    fn is_sorted_by<F>(mut self, compare: F) -> bool
    where
        F: FnMut(&Self::Item, &Self::Item) -> Option<Ordering>,
    { ... }
}
```
169+
170+
In addition to the changes shown above, the three methods added to `[T]` should
171+
also be added to `core::slice::SliceExt` as they don't require heap
172+
allocations.
173+
174+
To repeat the exact semantics from the prior section: the methods return
175+
`true` if and only if for each element `a` and its following element `b`, the
176+
condition `a <= b` holds. For slices/iterators with zero or one element,
177+
`true` is returned. For elements which implement `PartialOrd`, but not `Ord`,
178+
the function returns `false` if any two consecutive elements are not
179+
comparable (this is an implication of the `a <= b` condition from above).
180+
181+
A sample implementation can be found
182+
[here](https://play.rust-lang.org/?gist=431ff42fe8ba5980fcf9250c8bc4492b&version=stable).
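
To make the semantics concrete, here is a minimal sketch written as free functions rather than inherent methods. The names `slice_is_sorted_by` and `slice_is_sorted` are hypothetical, and the bodies are an illustration under the rules stated above, not necessarily the linked sample implementation:

```rust
use std::cmp::Ordering;

/// Sketch of the proposed `[T]::is_sorted_by` as a free function
/// (hypothetical name; the RFC proposes an inherent method).
fn slice_is_sorted_by<T, F>(slice: &[T], mut compare: F) -> bool
where
    F: FnMut(&T, &T) -> Option<Ordering>,
{
    slice.windows(2).all(|w| {
        // `a <= b` holds only if the two elements are comparable and `a`
        // is not greater than `b`; `None` (not comparable) gives `false`.
        matches!(
            compare(&w[0], &w[1]),
            Some(Ordering::Less) | Some(Ordering::Equal)
        )
    })
}

/// Sketch of the proposed `[T]::is_sorted`, expressed through
/// `PartialOrd::partial_cmp`.
fn slice_is_sorted<T: PartialOrd>(slice: &[T]) -> bool {
    slice_is_sorted_by(slice, |a, b| a.partial_cmp(b))
}
```

Because the check only inspects adjacent pairs, slices with zero or one element yield `true` without any comparison, and a `NaN` anywhere in a float slice of length two or more yields `false`.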
183+
184+
185+
# Drawbacks
186+
[drawbacks]: #drawbacks
187+
188+
It increases the size of the standard library by a tiny bit.
189+
190+
# Rationale and alternatives
191+
[alternatives]: #alternatives
192+
193+
### Only add the methods to `Iterator`, but not to `[T]`
194+
Without `is_sorted()` defined for slices directly, one can still fairly easily
195+
test if a slice is sorted by obtaining an iterator via `iter()`. So instead of
196+
`v.is_sorted()`, one would need to write `v.iter().is_sorted()`.
197+
198+
This always works for `is_sorted()` because of the `PartialOrd` blanket impl
199+
which implements `PartialOrd` for all references to a `PartialOrd` type. For
200+
`is_sorted_by` it would introduce an additional reference to the closure's
201+
arguments (i.e. `v.iter().is_sorted_by(|a, b| ...)`, where `a` and `b` are
202+
`&&T`).
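
The extra reference can be seen in a small sketch. Here `iter_is_sorted_by` is a hypothetical free-function stand-in for the proposed `Iterator::is_sorted_by`, and `sorted_by_abs` is an illustrative caller; neither name comes from the RFC:

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for the proposed `Iterator::is_sorted_by`.
fn iter_is_sorted_by<I, F>(mut iter: I, mut compare: F) -> bool
where
    I: Iterator,
    F: FnMut(&I::Item, &I::Item) -> Option<Ordering>,
{
    let mut prev = match iter.next() {
        Some(first) => first,
        None => return true, // an empty iterator counts as sorted
    };
    for curr in iter {
        match compare(&prev, &curr) {
            Some(Ordering::Less) | Some(Ordering::Equal) => prev = curr,
            _ => return false, // out of order or not comparable
        }
    }
    true
}

/// With `v.iter()`, the item type is `&i32`, so the closure arguments
/// are `&&i32`: one reference more than the slice method would pass.
fn sorted_by_abs(v: &[i32]) -> bool {
    iter_is_sorted_by(v.iter(), |a: &&i32, b: &&i32| {
        a.abs().partial_cmp(&b.abs())
    })
}
```

Method calls such as `a.abs()` auto-dereference through both references, but closures that pattern-match or compare the arguments directly have to account for the additional `&`.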
203+
204+
While these two inconveniences are not deal-breakers, being able to call those
205+
three methods on slices (and all `Deref<Target=[T]>` types) directly could be
206+
favourable for many programmers (especially given the popularity of slice-like
207+
data structures, like `Vec<T>`). Additionally, the `sort` method and friends
208+
are defined for slices, thus one might expect the `is_sorted()` method there,
209+
too.
210+
211+
212+
### Add the three methods to additional data structures (like `LinkedList`) as well
213+
Adding these methods to every data structure in the standard library is a lot of
214+
duplicate code. Optimally, we would have a trait that represents sequential
215+
data structures and would only add `is_sorted` and friends to said trait. We
216+
don't have such a trait as of now; so `Iterator` is the next best thing. Slices
217+
deserve special treatment due to the reasons mentioned above (popularity and
218+
`sort()`).
219+
220+
221+
### `Iterator::while_sorted`, `is_sorted_until`, `sorted_prefix`, `num_sorted`, ...
222+
[In the issue on the main repository](https://github.com/rust-lang/rust/issues/44370),
223+
concerns about completely consuming the iterator were raised. Some alternatives,
224+
such as [`while_sorted`](https://github.com/rust-lang/rust/issues/44370#issuecomment-327873139),
225+
were suggested. However, consuming the iterator is neither uncommon nor a
226+
problem. Methods like `count()`, `max()` and many more consume the iterator,
227+
too. [One comment](https://github.com/rust-lang/rust/issues/44370#issuecomment-344516366) mentions:
228+
229+
> I am a bit skeptical of the equivalent on Iterator just because the return
230+
> value does not seem actionable -- you aren't going to "sort" the iterator
231+
> after you find out it is not already sorted. What are some use cases for this
232+
> in real code that does not involve iterating over a slice?
233+
234+
As mentioned above, `Iterator` is the next best thing to a trait representing
235+
sequential data structures. So to check if a `LinkedList`, `VecDeque` or
236+
another sequential data structure is sorted, one would simply call
237+
`collection.iter().is_sorted()`. It's likely that this is the main usage for
238+
`Iterator`'s `is_sorted` methods. Additionally, code like
239+
`if !v.is_sorted() { v.sort(); }` is not very useful: `sort()` already runs in
240+
O(n) for already sorted arrays.
241+
242+
Suggestions like `is_sorted_until` are not really useful either: one can easily
243+
get a subslice or a part of an iterator (via `.take()`) and call `is_sorted()`
244+
on that part.
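
As a sketch of that argument, checking only a leading part needs nothing beyond slicing or `take`. All three function names below are hypothetical; `adjacent_sorted` stands in for the proposed `is_sorted`:

```rust
/// Hypothetical stand-in for the proposed `[T]::is_sorted`.
fn adjacent_sorted<T: PartialOrd>(items: &[T]) -> bool {
    items.windows(2).all(|w| w[0] <= w[1])
}

/// Checks whether the first `n` elements (or all of them, if the slice
/// is shorter) are sorted, by taking a subslice first.
fn prefix_is_sorted<T: PartialOrd>(items: &[T], n: usize) -> bool {
    let end = n.min(items.len());
    adjacent_sorted(&items[..end])
}

/// The same idea for iterators: restrict with `take`, then check.
fn take_is_sorted<I>(iter: I, n: usize) -> bool
where
    I: Iterator,
    I::Item: PartialOrd,
{
    let mut taken = iter.take(n);
    let mut prev = match taken.next() {
        Some(first) => first,
        None => return true,
    };
    for curr in taken {
        if prev <= curr {
            prev = curr;
        } else {
            return false;
        }
    }
    true
}
```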
245+
246+
247+
# Unresolved questions
248+
[unresolved]: #unresolved-questions
249+
250+
251+
### Should `Iterator::is_sorted_by_key` be added as well?
252+
253+
This RFC proposes to add `is_sorted_by_key` only to `[T]` but not to
254+
`Iterator`. The latter addition wouldn't be too useful since one could easily
255+
achieve the same effect as `.is_sorted_by_key(...)` by calling
256+
`.map(...).is_sorted()`. It might still be favourable to include said function
257+
for consistency and ease of use. The standard library already hosts a number of
258+
sorting-related functions all of which come in three flavours: *raw*, `_by` and
259+
`_by_key`. By now, programmers could expect there to be an `is_sorted_by_key`
260+
as well.
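
The equivalence mentioned above can be sketched as follows. Both function names are hypothetical; `iter_is_sorted` stands in for the proposed `Iterator::is_sorted`, and the by-key variant is nothing more than `map` followed by that check:

```rust
/// Hypothetical stand-in for the proposed `Iterator::is_sorted`.
fn iter_is_sorted<I>(mut iter: I) -> bool
where
    I: Iterator,
    I::Item: PartialOrd,
{
    let mut prev = match iter.next() {
        Some(first) => first,
        None => return true,
    };
    for curr in iter {
        if prev <= curr {
            prev = curr;
        } else {
            return false;
        }
    }
    true
}

/// A possible `is_sorted_by_key` for iterators: map each item to its
/// key, then check the resulting sequence of keys.
fn iter_is_sorted_by_key<I, F, K>(iter: I, f: F) -> bool
where
    I: Iterator,
    F: FnMut(I::Item) -> K,
    K: PartialOrd,
{
    iter_is_sorted(iter.map(f))
}
```

Note that, unlike the slice version, the key function here receives the item by value, since `map` consumes each item.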
261+
262+
263+
### Add `std::cmp::is_sorted` instead
264+
265+
As suggested [here](https://github.com/rust-lang/rust/issues/44370#issuecomment-345495831),
266+
one could also add this free function (plus the `_by` and `_by_key` versions)
267+
to `std::cmp`:
268+
269+
```rust
fn is_sorted<C>(collection: C) -> bool
where
    C: IntoIterator,
    C::Item: Ord,
```
275+
276+
This can be seen as a better design as it avoids the question about which data
277+
structure should get `is_sorted` methods. However, it might have the
278+
disadvantage of being less discoverable and also less convenient (long path or
279+
import).
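
For illustration, a body for that free function might look like the sketch below. The signature is the one shown above; the body is an assumption, and the function lives in the local scope here rather than in `std::cmp`:

```rust
/// Sketch of the suggested free function; the body is illustrative.
fn is_sorted<C>(collection: C) -> bool
where
    C: IntoIterator,
    C::Item: Ord,
{
    let mut iter = collection.into_iter();
    let mut prev = match iter.next() {
        Some(first) => first,
        None => return true, // empty collections count as sorted
    };
    for curr in iter {
        // With `Ord`, every pair is comparable, so `prev > curr` is the
        // exact negation of `prev <= curr`.
        if prev > curr {
            return false;
        }
        prev = curr;
    }
    true
}
```

Anything implementing `IntoIterator` can be passed, either by value (for example a `Vec<i32>`) or by reference (for example `&VecDeque<i32>`, whose items are `&i32`).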
280+
281+
282+
### Require `Ord` instead of only `PartialOrd`
283+
284+
As proposed in this RFC, `is_sorted` only requires its elements to be
285+
`PartialOrd`. If two non-comparable elements are encountered, `false` is
286+
returned. This is probably the only useful way to define the function for
287+
partially orderable elements.
288+
289+
While it's convenient to call `is_sorted()` on slices containing only
290+
partially orderable elements (like floats), we might want to use the stronger
291+
`Ord` bound:
292+
293+
- Firstly, for most programmers it's probably not *immediately* clear how the
294+
function is defined for partially ordered elements (the documentation should
295+
be sufficient as explanation, though).
296+
- Secondly, being able to call `is_sorted` on something will probably make
297+
most programmers think that calling `sort` on the same thing is possible,
298+
too. Having different bounds for `is_sorted` and `sort` thus might lead to
299+
confusion.
300+
- Lastly, the `is_sorted_by` function currently uses a closure which returns
301+
`Option<Ordering>`. This differs from the closure for `sort_by` and looks a
302+
bit more complicated than necessary for most cases.
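
To illustrate the last point, the sketch below shows how a `sort_by`-style comparator (returning `Ordering`) can be adapted to the `Option<Ordering>` shape by wrapping its result in `Some`. Both function names are hypothetical; `sorted_by_partial` stands in for the proposed `[T]::is_sorted_by`:

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for the proposed `[T]::is_sorted_by`, taking a
/// closure that returns `Option<Ordering>`.
fn sorted_by_partial<T, F>(slice: &[T], mut compare: F) -> bool
where
    F: FnMut(&T, &T) -> Option<Ordering>,
{
    slice.windows(2).all(|w| {
        matches!(
            compare(&w[0], &w[1]),
            Some(Ordering::Less) | Some(Ordering::Equal)
        )
    })
}

/// Accepts the same closure shape as `sort_by` and adapts it by
/// wrapping the total ordering in `Some`.
fn sorted_by_total<T, F>(slice: &[T], mut compare: F) -> bool
where
    F: FnMut(&T, &T) -> Ordering,
{
    sorted_by_partial(slice, |a, b| Some(compare(a, b)))
}
```

With an adapter like this, the comparator passed to `sort_by` can be reused unchanged for the sortedness check, at the cost of one extra wrapper.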
