Commit 3d48148

Expand more about the Pattern API.
1 parent c15fd7d commit 3d48148

text/0000-os-str-pattern.md

Lines changed: 32 additions & 8 deletions
@@ -350,22 +350,39 @@ should move backward by 2 bytes before continuing. This ensure searching for `\u
350350
351351
## Pattern API
352352
353-
This RFC assumes a generalized pattern API which supports more than strings. If the pattern API is
354-
not available, the new functions can take `&OsStr` instead of `impl Pattern<&OsStr>`, but this may
355-
hurt future compatibility due to inference breakage.
353+
As of Rust 1.25, we can search a `&str` using a character, a character set or another string,
354+
powered by [RFC #528](https://github.com/rust-lang/rfcs/pull/528) a.k.a. “Pattern API 1.0”.
356355
357-
Assuming we do want to generalize the Pattern API, the implementor should note the issue of
358-
splitting a surrogate pair:
356+
There are some drafts to generalize this so that we could retain mutability and search in more types
357+
such as `&[T]` and `&OsStr`, as described in various comments
358+
(“[v1.5](https://github.com/rust-lang/rust/issues/27721#issuecomment-185405392)” and
359+
“[v2.0](https://github.com/rust-lang/rfcs/pull/1309#issuecomment-214030263)”). A proper RFC has not
360+
been proposed so far.
361+
362+
This RFC assumes the goal of generalizing the Pattern API beyond `&str` is accepted, enabling us
363+
to provide a uniform search API across different types of haystacks and needles. However, this RFC
364+
does not rely on a generalized Pattern API. If this RFC is stabilized without a generalized Pattern
365+
API, the new methods described in the [Guide-level explanation][guide-level-explanation] section can
366+
take `&OsStr` instead of `impl Pattern<&OsStr>`, but this may hurt future compatibility due to
367+
inference breakage if a generalized Pattern API is indeed implemented.
368+
369+
Assuming we do want to generalize the Pattern API, the implementor should note the issue of splitting a
370+
surrogate pair:
359371
360372
1. A match which starts with a low surrogate will point to byte 1 of the 4-byte sequence
361373
2. An index always points to byte 2 of the 4-byte sequence
362374
3. A match which ends with a high surrogate will point to byte 3 of the 4-byte sequence
363375
364376
Implementations should note these different offsets when converting between different kinds of
365377
cursors. In the [`omgwtf8::pattern` module](https://docs.rs/omgwtf8/*/omgwtf8/pattern/index.html),
366-
this behavior is enforced by using distinct types for the start and end cursors.
378+
based on the “v1.5” draft, this behavior is enforced in the API design by using distinct types for
379+
the start and end cursors.
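
As a rough illustration of the three offsets listed above, the sketch below wraps byte offsets in two distinct newtypes so that a start cursor cannot be passed where an end cursor is expected. The names `StartCursor`, `EndCursor` and `split_offsets` are hypothetical and chosen for this example only; they are not the actual `omgwtf8::pattern` API, and plain `usize` offsets stand in for the `*const u8` cursors used in this RFC.

```rust
/// Hypothetical start-cursor type (illustrative only, not the `omgwtf8` API).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct StartCursor(pub usize);

/// Hypothetical end-cursor type (illustrative only, not the `omgwtf8` API).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct EndCursor(pub usize);

/// Given the byte offset `seq_start` of a 4-byte sequence whose surrogate
/// pair is being split, return the three positions from the list above:
/// - the start cursor of a match beginning with the low surrogate (byte 1),
/// - the plain index (byte 2),
/// - the end cursor of a match ending with the high surrogate (byte 3).
pub fn split_offsets(seq_start: usize) -> (StartCursor, usize, EndCursor) {
    (
        StartCursor(seq_start + 1),
        seq_start + 2,
        EndCursor(seq_start + 3),
    )
}
```

Because `StartCursor` and `EndCursor` are different types, mixing them up is a compile-time error rather than an off-by-two bug at run time.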
380+
381+
The following outlines the generalized Pattern API which could work for `&OsStr`:
367382
368383
```rust
384+
// in module `core::pattern`:
385+
369386
pub trait Pattern<H: Haystack>: Sized {
370387
type Searcher: Searcher<H>;
371388
fn into_searcher(self, haystack: H) -> Self::Searcher;
@@ -380,6 +397,13 @@ pub trait Searcher<H: Haystack> {
380397
fn next_reject(&mut self) -> Option<(H::StartCursor, H::EndCursor)>;
381398
}
382399
400+
pub trait ReverseSearcher<H: Haystack>: Searcher<H> {
401+
fn next_match_back(&mut self) -> Option<(H::StartCursor, H::EndCursor)>;
402+
fn next_reject_back(&mut self) -> Option<(H::StartCursor, H::EndCursor)>;
403+
}
404+
405+
pub trait DoubleEndedSearcher<H: Haystack>: ReverseSearcher<H> {}
406+
383407
// equivalent to SearchPtrs in "Pattern API 1.5"
384408
// and PatternHaystack in "Pattern API 2.0"
385409
pub trait Haystack: Sized {
@@ -403,7 +427,7 @@ pub trait Haystack: Sized {
403427
}
404428
```
405429

406-
For `&OsStr`, we define both `StartCursor` and `EndCursor` as `*const u8`.
430+
For the `&OsStr` haystack, we define both `StartCursor` and `EndCursor` as `*const u8`.
407431

408432
The `start_to_end_cursor` function will return `cur + 2` if we find that `cur` points to the middle
409433
of a 4-byte sequence.
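
The conversion described above could look roughly like the following sketch. It is not the RFC's implementation: it uses byte offsets into a slice instead of `*const u8`, it assumes the haystack is well-formed, and the helper `is_inside_four_byte_sequence` is a hypothetical name introduced here. The check assumes a start cursor inside a 4-byte sequence sits directly after the lead byte (`0xF0..=0xF4`), matching "byte 1" in the list of offsets.

```rust
/// Hypothetical helper: true if `cur` sits directly after the lead byte of a
/// 4-byte sequence, i.e. at "byte 1" of that sequence.
/// Assumes `haystack` is well-formed.
fn is_inside_four_byte_sequence(haystack: &[u8], cur: usize) -> bool {
    cur >= 1
        && cur < haystack.len()
        && (0xF0..=0xF4).contains(&haystack[cur - 1])
        // The byte at `cur` must be a continuation byte (0b10xx_xxxx).
        && haystack[cur] & 0xC0 == 0x80
}

/// Sketch of `start_to_end_cursor` using byte offsets as cursors:
/// move forward by 2 bytes when the start cursor is inside a 4-byte
/// sequence, otherwise return the cursor unchanged.
fn start_to_end_cursor(haystack: &[u8], cur: usize) -> usize {
    if is_inside_four_byte_sequence(haystack, cur) {
        cur + 2
    } else {
        cur
    }
}
```

For example, with the bytes of `"a😀b"` (`61 F0 9F 98 80 62`), a start cursor at offset 2 is inside the 4-byte sequence and maps to the end cursor at offset 4, while cursors on ordinary character boundaries are returned as-is.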
@@ -452,7 +476,7 @@ match self.matcher.next_match() {
452476
```
453477

454478
As a workaround, we introduced `find_range` and `match_ranges`. Note that this is already a
455-
problem to solve if we want to make `Regex` a pattern.
479+
problem to solve if we want to make `Regex` a pattern of strings.
456480

457481
# Rationale and alternatives
458482
[alternatives]: #alternatives
