@@ -350,22 +350,39 @@ should move backward by 2 bytes before continuing. This ensure searching for `\u

## Pattern API

- This RFC assumes a generalized pattern API which supports more than strings. If the pattern API is
- not available, the new functions can take `&OsStr` instead of `impl Pattern<&OsStr>`, but this may
- hurt future compatibility due to inference breakage.
+ As of Rust 1.25, we can search a `&str` using a character, a character set or another string,
+ powered by [RFC #528](https://github.com/rust-lang/rfcs/pull/528) a.k.a. “Pattern API 1.0”.

- Assuming we do want to generalize the Pattern API, the implementor should note the issue of
- splitting a surrogate pair:
+ There are some drafts to generalize this so that we could retain mutability and search in more types
+ such as `&[T]` and `&OsStr`, as described in various comments
+ (“[v1.5](https://github.com/rust-lang/rust/issues/27721#issuecomment-185405392)” and
+ “[v2.0](https://github.com/rust-lang/rfcs/pull/1309#issuecomment-214030263)”). A proper RFC has not
+ been proposed so far.
+
+ This RFC assumes the target of generalizing the Pattern API beyond `&str` is accepted, enabling us
+ to provide a uniform search API between different types of haystacks and needles. However, this RFC
+ does not rely on a generalized Pattern API. If this RFC is stabilized without a generalized Pattern
+ API, the new methods described in the [Guide-level explanation][guide-level-explanation] section can
+ take `&OsStr` instead of `impl Pattern<&OsStr>`, but this may hurt future compatibility due to
+ inference breakage if a generalized Pattern API is indeed implemented.
+
+ Assuming we do want to generalize the Pattern API, the implementor should note the issue of splitting a
+ surrogate pair:

1. A match which starts with a low surrogate will point to byte 1 of the 4-byte sequence
2. An index always points to byte 2 of the 4-byte sequence
3. A match which ends with a high surrogate will point to byte 3 of the 4-byte sequence

Implementation should note these different offsets when converting between different kinds of
cursors. In the [`omgwtf8::pattern` module](https://docs.rs/omgwtf8/*/omgwtf8/pattern/index.html),
- this behavior is enforced by using distinct types for the start and end cursors.
+ based on the “v1.5” draft, this behavior is enforced in the API design by using distinct types for
+ the start and end cursors.
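
A minimal sketch of how distinct cursor types can keep start and end cursors from being mixed up. The names `StartCursor`, `EndCursor` and `range_to_slice` are hypothetical, and plain byte offsets stand in for raw pointers. This is an illustration of the idea only, not the actual `omgwtf8::pattern` code.

```rust
/// Hypothetical cursor marking where a match begins (byte offset into the haystack).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct StartCursor(pub usize);

/// Hypothetical cursor marking where a match ends (byte offset into the haystack).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct EndCursor(pub usize);

/// Slicing requires one cursor of each kind, so passing two start cursors
/// (or swapping the two arguments) is rejected at compile time.
pub fn range_to_slice(haystack: &[u8], start: StartCursor, end: EndCursor) -> &[u8] {
    &haystack[start.0..end.0]
}
```

Because the two wrappers are unrelated types, any conversion between them has to go through an explicit function, which is the natural place to apply the offset adjustment for split surrogate pairs.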
+
+ The following outlines the generalized Pattern API which could work for `&OsStr`:

```rust
+ // in module `core::pattern`:
+
pub trait Pattern<H: Haystack>: Sized {
    type Searcher: Searcher<H>;
    fn into_searcher(self, haystack: H) -> Self::Searcher;
@@ -380,6 +397,13 @@ pub trait Searcher<H: Haystack> {
    fn next_reject(&mut self) -> Option<(H::StartCursor, H::EndCursor)>;
}

+ pub trait ReverseSearcher<H: Haystack>: Searcher<H> {
+     fn next_match_back(&mut self) -> Option<(H::StartCursor, H::EndCursor)>;
+     fn next_reject_back(&mut self) -> Option<(H::StartCursor, H::EndCursor)>;
+ }
+
+ pub trait DoubleEndedSearcher<H: Haystack>: ReverseSearcher<H> {}
+
// equivalent to SearchPtrs in "Pattern API 1.5"
// and PatternHaystack in "Pattern API 2.0"
pub trait Haystack: Sized {
@@ -403,7 +427,7 @@ pub trait Haystack: Sized {
}
```

- For `&OsStr`, we define both `StartCursor` and `EndCursor` as `*const u8`.
+ For the `&OsStr` haystack, we define both `StartCursor` and `EndCursor` as `*const u8`.

The `start_to_end_cursor` function will return `cur + 2` if we find that `cur` points to the middle
of a 4-byte sequence.
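
The `cur + 2` adjustment can be sketched as follows. This is a simplified illustration under stated assumptions: cursors are byte offsets into a `&[u8]` rather than `*const u8`, a start cursor inside a 4-byte sequence always sits on byte 1 (as in the list of offsets earlier in this section), and the helper `is_byte_1_of_four_byte_seq` is a hypothetical name, not part of any proposed API.

```rust
/// Hypothetical helper: true if `index` is byte 1 of a 4-byte UTF-8 sequence,
/// i.e. the previous byte is a 4-byte lead (0xF0..=0xF4) and the byte at
/// `index` is a continuation byte (0x80..=0xBF).
fn is_byte_1_of_four_byte_seq(bytes: &[u8], index: usize) -> bool {
    index >= 1
        && index < bytes.len()
        && (0xF0..=0xF4).contains(&bytes[index - 1])
        && (0x80..=0xBF).contains(&bytes[index])
}

/// Converts a start cursor into the matching end cursor.
/// A start cursor in the middle of a 4-byte sequence (byte 1) maps to
/// byte 3 of the same sequence; every other cursor is returned unchanged.
fn start_to_end_cursor(bytes: &[u8], cur: usize) -> usize {
    if is_byte_1_of_four_byte_seq(bytes, cur) {
        cur + 2
    } else {
        cur
    }
}
```

For the haystack `"a\u{1F600}b"`, the 4-byte sequence occupies offsets 1 through 4, so a start cursor at offset 2 maps to an end cursor at offset 4, while cursors on character boundaries are left as they are.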
@@ -452,7 +476,7 @@ match self.matcher.next_match() {
```

As a workaround, we introduced `find_range` and `match_ranges`. Note that this is already a
- problem to solve if we want to make `Regex` a pattern.
+ problem to solve if we want to make `Regex` a pattern of strings.

# Rationale and alternatives
[alternatives]: #alternatives