|
| 1 | +- Feature Name: `is_sorted` |
| 2 | +- Start Date: 2018-02-24 |
| 3 | +- RFC PR: [rust-lang/rfcs#2351](https://github.com/rust-lang/rfcs/pull/2351) |
| 4 | +- Rust Issue: [rust-lang/rust#53485](https://github.com/rust-lang/rust/issues/53485) |
| 5 | + |
| 6 | +# Summary |
| 7 | +[summary]: #summary |
| 8 | + |
| 9 | +Add the methods `is_sorted`, `is_sorted_by` and `is_sorted_by_key` to `[T]`; |
| 10 | +add the methods `is_sorted` and `is_sorted_by` to `Iterator`. |
| 11 | + |
| 12 | +# Motivation |
| 13 | +[motivation]: #motivation |
| 14 | + |
| 15 | +In quite a few situations, one needs to check whether a sequence of elements |
| 16 | +is sorted. The most important use cases are probably **unit tests** and |
| 17 | +**pre-/post-condition checks**. |
| 18 | + |
| 19 | +The lack of an `is_sorted()` function in Rust's standard library has led to |
| 20 | +[countless programmers implementing their own](https://github.com/search?l=Rust&q=%22fn+is_sorted%22&type=Code&utf8=%E2%9C%93). |
| 21 | +While it is possible to write a one-liner using iterators (e.g. |
| 22 | +`(0..arr.len() - 1).all(|i| arr[i] <= arr[i + 1])`¹), it is still unnecessary |
| 23 | +mental overhead while writing *and* reading the code. |
| 24 | + |
| 25 | +In [the corresponding issue on the main repository](https://github.com/rust-lang/rust/issues/44370) |
| 26 | +(from which a few comments are referenced) everyone seems to agree on the |
| 27 | +basic premise: we want such a function. |
| 28 | + |
| 29 | +Having `is_sorted()` and friends in the standard library would: |
| 30 | +- prevent people from spending time on writing their own, |
| 31 | +- improve readability of the code by clearly showing the author's intent, |
| 32 | +- and encourage people to write more unit tests and/or pre-/post-condition checks. |
| 33 | + |
| 34 | +Further evidence of these functions' usefulness is their inclusion in the |
| 35 | +standard library of many other languages: |
| 36 | +C++'s [`std::is_sorted`](http://en.cppreference.com/w/cpp/algorithm/is_sorted), |
| 37 | +Go's [`sort.IsSorted`](https://golang.org/pkg/sort/#IsSorted), |
| 38 | +D's [`std.algorithm.sorting.isSorted`](https://dlang.org/library/std/algorithm/sorting/is_sorted.html) |
| 39 | +and others. (Curiously, many higher-level programming languages, |
| 40 | +such as Ruby, JavaScript, Java, Haskell and Python, seem to lack such a function.) |
| 41 | + |
| 42 | +¹ In the initial version of this RFC, this code snippet contained a bug |
| 43 | +(`<` instead of `<=`). This subtle mistake happens very often: in this RFC, |
| 44 | +[in the discussion thread about this RFC](https://github.com/rust-lang/rfcs/pull/2351#issuecomment-370126518), |
| 45 | +in [this StackOverflow answer](https://stackoverflow.com/posts/51272639/revisions) |
| 46 | +and in many more places. Thus, avoiding this common bug is another good |
| 47 | +reason to add `is_sorted()`. |
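The pitfall described in the footnote can be made concrete with a minimal sketch. The helper names `is_sorted_buggy` and `is_sorted_manual` are hypothetical and only illustrate hand-written checks; they are not part of this RFC. The sketch uses `slice::windows` instead of index arithmetic, which also sidesteps the `arr.len() - 1` underflow that the one-liner above runs into on an empty slice.

```rust
/// Buggy hand-written check (hypothetical): `<` rejects equal neighbours,
/// so a slice such as `[1, 2, 2, 9]` is wrongly reported as unsorted.
fn is_sorted_buggy(arr: &[i32]) -> bool {
    arr.windows(2).all(|w| w[0] < w[1])
}

/// Intended check (hypothetical): every element is less than or equal to
/// its successor. `windows(2)` yields nothing for slices with fewer than
/// two elements, so empty and single-element slices count as sorted.
fn is_sorted_manual(arr: &[i32]) -> bool {
    arr.windows(2).all(|w| w[0] <= w[1])
}
```

With these helpers, `[1, 2, 2, 9]` passes `is_sorted_manual` but is rejected by `is_sorted_buggy`.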
| 48 | + |
| 49 | +## Fast Implementation via SIMD |
| 50 | + |
| 51 | +Lastly, it is possible to implement `is_sorted` for many common types with SIMD |
| 52 | +instructions, which improves speed significantly. It is unlikely that many |
| 53 | +programmers will take the time to write SIMD code themselves, so everyone |
| 54 | +would benefit if this rather difficult implementation work is done in the |
| 55 | +standard library. |
| 56 | + |
| 57 | + |
| 58 | + |
| 59 | +# Guide-level explanation |
| 60 | +[guide-level-explanation]: #guide-level-explanation |
| 61 | + |
| 62 | +Possible documentation of the two new methods of `Iterator` as well as |
| 63 | +`[T]::is_sorted_by_key`: |
| 64 | + |
| 65 | +> ```rust |
| 66 | +> fn is_sorted(self) -> bool |
| 67 | +> where |
| 68 | +>     Self::Item: PartialOrd, |
| 69 | +> ``` |
| 70 | +> Checks if the elements of this iterator are sorted. |
| 71 | +> |
| 72 | +> That is, for each element `a` and its following element `b`, `a <= b` |
| 73 | +> must hold. If the iterator yields exactly zero or one element, `true` |
| 74 | +> is returned. |
| 75 | +> |
| 76 | +> Note that if `Self::Item` is only `PartialOrd`, but not `Ord`, the above |
| 77 | +> definition implies that this function returns `false` if any two |
| 78 | +> consecutive items are not comparable. |
| 79 | +> |
| 80 | +> ## Example |
| 81 | +> |
| 82 | +> ```rust |
| 83 | +> assert!([1, 2, 2, 9].iter().is_sorted()); |
| 84 | +> assert!(![1, 3, 2, 4].iter().is_sorted()); |
| 85 | +> assert!([0].iter().is_sorted()); |
| 86 | +> assert!(std::iter::empty::<i32>().is_sorted()); |
| 87 | +> assert!(![0.0, 1.0, std::f32::NAN].iter().is_sorted()); |
| 88 | +> ``` |
| 89 | +> --- |
| 90 | +> |
| 91 | +> ```rust |
| 92 | +> fn is_sorted_by<F>(self, compare: F) -> bool |
| 93 | +> where |
| 94 | +>     F: FnMut(&Self::Item, &Self::Item) -> Option<Ordering>, |
| 95 | +> ``` |
| 96 | +> Checks if the elements of this iterator are sorted using the given |
| 97 | +> comparator function. |
| 98 | +> |
| 99 | +> Instead of using `PartialOrd::partial_cmp`, this function uses the given |
| 100 | +> `compare` function to determine the ordering of two elements. Apart from |
| 101 | +> that, it's equivalent to `is_sorted`; see its documentation for more |
| 102 | +> information. |
| 103 | +> |
| 104 | +> --- |
| 105 | +> |
| 106 | +> (*for `[T]`*) |
| 107 | +> |
| 108 | +> ```rust |
| 109 | +> fn is_sorted_by_key<F, K>(&self, f: F) -> bool |
| 110 | +> where |
| 111 | +>     F: FnMut(&T) -> K, |
| 112 | +>     K: PartialOrd, |
| 113 | +> ``` |
| 114 | +> Checks if the elements of this slice are sorted using the given |
| 115 | +> key extraction function. |
| 116 | +> |
| 117 | +> Instead of comparing the slice's elements directly, this function |
| 118 | +> compares the keys of the elements, as determined by `f`. Apart from |
| 119 | +> that, it's equivalent to `is_sorted`; see its documentation for more |
| 120 | +> information. |
| 121 | +> |
| 122 | +> ## Example |
| 123 | +> |
| 124 | +> ```rust |
| 125 | +> assert!(["c", "bb", "aaa"].is_sorted_by_key(|s| s.len())); |
| 126 | +> assert!(![-2i32, -1, 0, 3].is_sorted_by_key(|n| n.abs())); |
| 127 | +> ``` |
| 128 | + |
| 129 | +The methods `[T]::is_sorted` and `[T]::is_sorted_by` will have documentation |
| 130 | +analogous to that shown above. |
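The documentation above gives no example for `is_sorted_by`. The following sketch illustrates how the proposed closure signature might be used. It assumes a free-function stand-in (the hypothetical `is_sorted_by` below, built on `slice::windows`) so that the example does not depend on the proposed method being available.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for the proposed `[T]::is_sorted_by`.
/// Returns `true` only if `compare` yields `Less` or `Equal` for every
/// pair of neighbouring elements; `None` counts as "not sorted".
fn is_sorted_by<T, F>(slice: &[T], mut compare: F) -> bool
where
    F: FnMut(&T, &T) -> Option<Ordering>,
{
    slice.windows(2).all(|w| {
        matches!(
            compare(&w[0], &w[1]),
            Some(Ordering::Less) | Some(Ordering::Equal)
        )
    })
}
```

Swapping the arguments inside the closure, as in `|a, b| b.partial_cmp(a)`, checks for descending order instead.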
| 131 | + |
| 132 | +# Reference-level explanation |
| 133 | +[reference-level-explanation]: #reference-level-explanation |
| 134 | + |
| 135 | +This RFC proposes to add the following methods to `[T]` (slices) and |
| 136 | +`Iterator`: |
| 137 | + |
| 138 | +```rust |
| 139 | +impl<T> [T] { |
| 140 | +    fn is_sorted(&self) -> bool |
| 141 | +    where |
| 142 | +        T: PartialOrd, |
| 143 | +    { ... } |
| 144 | + |
| 145 | +    fn is_sorted_by<F>(&self, compare: F) -> bool |
| 146 | +    where |
| 147 | +        F: FnMut(&T, &T) -> Option<Ordering>, |
| 148 | +    { ... } |
| 149 | + |
| 150 | +    fn is_sorted_by_key<F, K>(&self, f: F) -> bool |
| 151 | +    where |
| 152 | +        F: FnMut(&T) -> K, |
| 153 | +        K: PartialOrd, |
| 154 | +    { ... } |
| 155 | +} |
| 156 | + |
| 157 | +trait Iterator { |
| 158 | +    fn is_sorted(self) -> bool |
| 159 | +    where |
| 160 | +        Self::Item: PartialOrd, |
| 161 | +    { ... } |
| 162 | + |
| 163 | +    fn is_sorted_by<F>(mut self, compare: F) -> bool |
| 164 | +    where |
| 165 | +        F: FnMut(&Self::Item, &Self::Item) -> Option<Ordering>, |
| 166 | +    { ... } |
| 167 | +} |
| 168 | +``` |
| 169 | + |
| 170 | +In addition to the changes shown above, the three methods added to `[T]` should |
| 171 | +also be added to `core::slice::SliceExt` as they don't require heap |
| 172 | +allocations. |
| 173 | + |
| 174 | +To repeat the exact semantics from the prior section: the methods return |
| 175 | +`true` if and only if for each element `a` and its following element `b`, the |
| 176 | +condition `a <= b` holds. For slices/iterators with zero or one element, |
| 177 | +`true` is returned. For elements which implement `PartialOrd`, but not `Ord`, |
| 178 | +the function returns `false` if any two consecutive elements are not |
| 179 | +comparable (this is an implication of the `a <= b` condition from above). |
| 180 | + |
| 181 | +A sample implementation can be found |
| 182 | +[here](https://play.rust-lang.org/?gist=431ff42fe8ba5980fcf9250c8bc4492b&version=stable). |
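For illustration, the semantics described above could be implemented roughly as follows. This is a sketch and not necessarily identical to the linked sample implementation; the free-function form and the names `iter_is_sorted_by` and `iter_is_sorted` are assumptions made so that the example is self-contained.

```rust
use std::cmp::Ordering;

/// Sketch of the proposed `Iterator::is_sorted_by` as a free function
/// (hypothetical name, not part of the RFC).
fn iter_is_sorted_by<I, F>(iter: I, mut compare: F) -> bool
where
    I: IntoIterator,
    F: FnMut(&I::Item, &I::Item) -> Option<Ordering>,
{
    let mut iter = iter.into_iter();
    // Zero elements: trivially sorted.
    let mut last = match iter.next() {
        Some(first) => first,
        None => return true,
    };
    for current in iter {
        // `None` (not comparable) and `Some(Greater)` both mean "not sorted".
        let ordered = matches!(
            compare(&last, &current),
            Some(Ordering::Less) | Some(Ordering::Equal)
        );
        if !ordered {
            return false;
        }
        last = current;
    }
    true
}

/// `is_sorted` is then `is_sorted_by` with `PartialOrd::partial_cmp`.
fn iter_is_sorted<I>(iter: I) -> bool
where
    I: IntoIterator,
    I::Item: PartialOrd,
{
    iter_is_sorted_by(iter, |a, b| a.partial_cmp(b))
}
```

The function returns at the first out-of-order or incomparable pair, so the rest of the iterator is not consumed in that case.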
| 183 | + |
| 184 | + |
| 185 | +# Drawbacks |
| 186 | +[drawbacks]: #drawbacks |
| 187 | + |
| 188 | +It increases the size of the standard library by a tiny bit. |
| 189 | + |
| 190 | +# Rationale and alternatives |
| 191 | +[alternatives]: #alternatives |
| 192 | + |
| 193 | +### Only add the methods to `Iterator`, but not to `[T]` |
| 194 | +Without `is_sorted()` defined for slices directly, one can still fairly easily |
| 195 | +test if a slice is sorted by obtaining an iterator via `iter()`. So instead of |
| 196 | +`v.is_sorted()`, one would need to write `v.iter().is_sorted()`. |
| 197 | + |
| 198 | +This always works for `is_sorted()` because of the `PartialOrd` blanket impl |
| 199 | +which implements `PartialOrd` for all references to a `PartialOrd` type. For |
| 200 | +`is_sorted_by` it would introduce an additional reference to the closure's |
| 201 | +arguments (i.e. `v.iter().is_sorted_by(|a, b| ...)`, where `a` and `b` are |
| 202 | +`&&T`). |
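The extra level of reference can be seen in a small sketch. Here `iter_is_sorted_by` is a hypothetical free-function stand-in for the proposed `Iterator::is_sorted_by`, and `lengths_non_decreasing` is an illustrative caller; neither name comes from this RFC. The closure's parameter types are spelled out only to make the `&&String` visible.

```rust
use std::cmp::Ordering;

/// Hypothetical free-function stand-in for the proposed
/// `Iterator::is_sorted_by`, comparing each item with its successor.
fn iter_is_sorted_by<I, F>(iter: I, mut compare: F) -> bool
where
    I: Iterator,
    F: FnMut(&I::Item, &I::Item) -> Option<Ordering>,
{
    let mut iter = iter.peekable();
    while let Some(current) = iter.next() {
        if let Some(next) = iter.peek() {
            let ordered = matches!(
                compare(&current, next),
                Some(Ordering::Less) | Some(Ordering::Equal)
            );
            if !ordered {
                return false;
            }
        }
    }
    true
}

/// `words.iter()` yields `&String`, so the closure receives `&&String`.
fn lengths_non_decreasing(words: &[String]) -> bool {
    iter_is_sorted_by(words.iter(), |a: &&String, b: &&String| {
        a.len().partial_cmp(&b.len())
    })
}
```

A slice method taking `FnMut(&T, &T)` would hand the closure plain `&String` arguments instead.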
| 203 | + |
| 204 | +While these two inconveniences are not deal-breakers, being able to call those |
| 205 | +three methods on slices (and all `Deref<Target=[T]>` types) directly could be |
| 206 | +favourable for many programmers (especially given the popularity of slice-like |
| 207 | +data structures, like `Vec<T>`). Additionally, the `sort` method and friends |
| 208 | +are defined for slices, thus one might expect the `is_sorted()` method there, |
| 209 | +too. |
| 210 | + |
| 211 | + |
| 212 | +### Add the three methods to additional data structures (like `LinkedList`) as well |
| 213 | +Adding these methods to every data structure in the standard library is a lot of |
| 214 | +duplicate code. Optimally, we would have a trait that represents sequential |
| 215 | +data structures and would only add `is_sorted` and friends to said trait. We |
| 216 | +don't have such a trait as of now; so `Iterator` is the next best thing. Slices |
| 217 | +deserve special treatment due to the reasons mentioned above (popularity and |
| 218 | +`sort()`). |
| 219 | + |
| 220 | + |
| 221 | +### `Iterator::while_sorted`, `is_sorted_until`, `sorted_prefix`, `num_sorted`, ... |
| 222 | +[In the issue on the main repository](https://github.com/rust-lang/rust/issues/44370), |
| 223 | +concerns about completely consuming the iterator were raised. Some alternatives, |
| 224 | +such as [`while_sorted`](https://github.com/rust-lang/rust/issues/44370#issuecomment-327873139), |
| 225 | +were suggested. However, consuming the iterator is neither uncommon nor a |
| 226 | +problem. Methods like `count()`, `max()` and many more consume the iterator, |
| 227 | +too. [One comment](https://github.com/rust-lang/rust/issues/44370#issuecomment-344516366) mentions: |
| 228 | + |
| 229 | +> I am a bit skeptical of the equivalent on Iterator just because the return |
| 230 | +> value does not seem actionable -- you aren't going to "sort" the iterator |
| 231 | +> after you find out it is not already sorted. What are some use cases for this |
| 232 | +> in real code that does not involve iterating over a slice? |
| 233 | + |
| 234 | +As mentioned above, `Iterator` is the next best thing to a trait representing |
| 235 | +sequential data structures. So to check if a `LinkedList`, `VecDeque` or |
| 236 | +another sequential data structure is sorted, one would simply call |
| 237 | +`collection.iter().is_sorted()`. It's likely that this is the main usage for |
| 238 | +`Iterator`'s `is_sorted` methods. Additionally, code like |
| 239 | +`if !v.is_sorted() { v.sort(); }` is not very useful: `sort()` already runs in |
| 240 | +O(n) for already sorted arrays. |
| 241 | + |
| 242 | +Suggestions like `is_sorted_until` are not really useful either: one can easily |
| 243 | +get a subslice or a part of an iterator (via `.take()`) and call `is_sorted()` |
| 244 | +on that part. |
| 245 | + |
| 246 | + |
| 247 | +# Unresolved questions |
| 248 | +[unresolved]: #unresolved-questions |
| 249 | + |
| 250 | + |
| 251 | +### Should `Iterator::is_sorted_by_key` be added as well? |
| 252 | + |
| 253 | +This RFC proposes to add `is_sorted_by_key` only to `[T]` but not to |
| 254 | +`Iterator`. The latter addition wouldn't be too useful since one could easily |
| 255 | +achieve the same effect as `.is_sorted_by_key(...)` by calling |
| 256 | +`.map(...).is_sorted()`. It might still be favourable to include said function |
| 257 | +for consistency and ease of use. The standard library already hosts a number of |
| 258 | +sorting-related functions all of which come in three flavours: *raw*, `_by` and |
| 259 | +`_by_key`. By now, programmers could expect there to be an `is_sorted_by_key` |
| 260 | +as well. |
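The equivalence mentioned above can be sketched as follows: a key-based check is a `map` over the iterator followed by an ordinary sortedness check on the keys. The free function `is_sorted_by_key` below is a hypothetical stand-in and not the proposed method itself.

```rust
/// Hypothetical stand-in for an `Iterator::is_sorted_by_key`:
/// maps every item to its key, then checks that the keys are sorted.
fn is_sorted_by_key<I, K, F>(iter: I, f: F) -> bool
where
    I: Iterator,
    F: FnMut(I::Item) -> K,
    K: PartialOrd,
{
    let mut keys = iter.map(f);
    // Zero elements: trivially sorted.
    let mut prev = match keys.next() {
        Some(key) => key,
        None => return true,
    };
    // `all` stops at the first pair of keys that is not in order.
    keys.all(|key| {
        let ordered = prev <= key;
        prev = key;
        ordered
    })
}
```

Applied to the slice examples from the guide-level explanation, `["c", "bb", "aaa"].iter()` keyed by `|s| s.len()` is sorted, while `[-2i32, -1, 0, 3].iter()` keyed by `|n| n.abs()` is not.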
| 261 | + |
| 262 | + |
| 263 | +### Add `std::cmp::is_sorted` instead |
| 264 | + |
| 265 | +As suggested [here](https://github.com/rust-lang/rust/issues/44370#issuecomment-345495831), |
| 266 | +one could also add this free function (plus the `_by` and `_by_key` versions) |
| 267 | +to `std::cmp`: |
| 268 | + |
| 269 | +```rust |
| 270 | +fn is_sorted<C>(collection: C) -> bool |
| 271 | +where |
| 272 | +    C: IntoIterator, |
| 273 | +    C::Item: Ord, |
| 274 | +``` |
| 275 | + |
| 276 | +This can be seen as a better design as it avoids the question about which data |
| 277 | +structure should get `is_sorted` methods. However, it might have the |
| 278 | +disadvantage of being less discoverable and also less convenient (long path or |
| 279 | +import). |
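A possible body for this free function is sketched below, under the `Ord` bound shown in the signature above. The implementation is an assumption made for illustration, not something specified by the linked comment.

```rust
use std::cmp::Ordering;

/// Sketch of the free function with the signature shown above.
/// Accepts anything that implements `IntoIterator` with `Ord` items.
fn is_sorted<C>(collection: C) -> bool
where
    C: IntoIterator,
    C::Item: Ord,
{
    let mut iter = collection.into_iter();
    // Zero elements: trivially sorted.
    let mut prev = match iter.next() {
        Some(first) => first,
        None => return true,
    };
    for next in iter {
        if prev.cmp(&next) == Ordering::Greater {
            return false;
        }
        prev = next;
    }
    true
}
```

Because the parameter is any `IntoIterator`, the same function would accept a `Vec<T>` by value, a `&LinkedList<T>`, a `&VecDeque<T>` or an arbitrary iterator such as `"abc".chars()`.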
| 280 | + |
| 281 | + |
| 282 | +### Require `Ord` instead of only `PartialOrd` |
| 283 | + |
| 284 | +As proposed in this RFC, `is_sorted` only requires its elements to be |
| 285 | +`PartialOrd`. If two non-comparable elements are encountered, `false` is |
| 286 | +returned. This is probably the only useful way to define the function for |
| 287 | +partially orderable elements. |
| 288 | + |
| 289 | +While it's convenient to call `is_sorted()` on slices containing only |
| 290 | +partially orderable elements (like floats), we might want to use the stronger |
| 291 | +`Ord` bound: |
| 292 | + |
| 293 | +- Firstly, for most programmers it's probably not *immediately* clear how the |
| 294 | + function is defined for partially ordered elements (the documentation should |
| 295 | + be sufficient as explanation, though). |
| 296 | +- Secondly, being able to call `is_sorted` on something will probably make |
| 297 | + most programmers think that calling `sort` on the same thing is possible, |
| 298 | + too. Having different bounds for `is_sorted` and `sort` thus might lead to |
| 299 | + confusion. |
| 300 | +- Lastly, the `is_sorted_by` function currently uses a closure which returns |
| 301 | + `Option<Ordering>`. This differs from the closure for `sort_by` and looks a |
| 302 | + bit more complicated than necessary for most cases. |