
Commit db88f05

Merge branch 'main' into deploy-gh-pages
2 parents d9d7635 + da45036 commit db88f05


2 files changed, 196 additions and 1 deletion


content/en/posts/post-13/index.md

Lines changed: 191 additions & 0 deletions
@@ -0,0 +1,191 @@
---
date: 2024-09-10
title: Using Custom Components in Swift's Regex
slug: using-custom-components-in-swifts-regex
description: Plug your own custom logic into any Swift Regex!
tags: Regex, Swift
---

In our [last article]({{< ref "post-12" >}}) we learned about Swift's `Regex` type and the various ways to create one. Today we're going to dive a little deeper into one of those methods: building a custom `RegexComponent` with the [CustomConsumingRegexComponent](https://developer.apple.com/documentation/swift/customconsumingregexcomponent) protocol.

For a quick refresher, remember that we can create a custom parser using the `RegexBuilder` DSL like this:

```swift
import RegexBuilder

Regex {
  Capture {
    Repeat(count: 3) {
      One(.digit)
    }
  }
  "-"
  Capture {
    Repeat(count: 3) {
      One(.digit)
    }
  }
  "-"
  Capture {
    Repeat(count: 4) {
      One(.digit)
    }
  }
}
```
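
Binding that pattern to a name and matching against it shows what the three `Capture` blocks actually produce. This is an illustrative sketch: the name `phoneRegex` and the sample input are made up, and the pattern is the same 3-3-4 digit regex as above.

```swift
import RegexBuilder

// Same 3-3-4 digit pattern as above, bound to an illustrative name.
let phoneRegex = Regex {
  Capture {
    Repeat(count: 3) { One(.digit) }
  }
  "-"
  Capture {
    Repeat(count: 3) { One(.digit) }
  }
  "-"
  Capture {
    Repeat(count: 4) { One(.digit) }
  }
}

// wholeMatch(of:) only succeeds if the entire string fits the pattern.
let match = "808-232-4825".wholeMatch(of: phoneRegex)

if let match {
  // The output is a tuple: (whole match, group 1, group 2, group 3),
  // and every element is a Substring.
  let (whole, areaCode, prefix, lineNumber) = match.output
  print(whole, areaCode, prefix, lineNumber) // 808-232-4825 808 232 4825
}
```

Because the output tuple is typed, the compiler knows there are exactly three captures, so destructuring it into the wrong number of names won't compile.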

Don't forget that we can also use many of the built-in parsers provided by `Foundation`, like this:

```swift
let usdRegex = Regex {
  Capture(.currency(code: "USD").sign(strategy: .accounting))
}

let dateRegex = Regex {
  Capture(
    .date(
      .numeric,
      locale: .autoupdatingCurrent,
      timeZone: .autoupdatingCurrent,
      calendar: .autoupdatingCurrent
    )
  )
}

let intRegex = Regex {
  Capture(.localizedInteger(locale: .autoupdatingCurrent))
}
```
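
One practical difference from the `RegexBuilder` example: captures produced by these `Foundation` parsers are already typed values (an `Int`, a `Date`, a `Decimal`) rather than `Substring`s. Here is a small sketch, assuming a fixed `en_US` locale so the result doesn't depend on the device's settings; the `"Qty: "` prefix and the names are made up for illustration.

```swift
import Foundation
import RegexBuilder

// A fixed locale keeps the parse deterministic across devices (assumption: en_US).
let usLocale = Locale(identifier: "en_US")

let quantityRegex = Regex {
  "Qty: "
  Capture(.localizedInteger(locale: usLocale))
}

if let match = "Qty: 42".wholeMatch(of: quantityRegex) {
  // The capture is an Int, not a Substring, so no manual conversion is needed.
  let quantity: Int = match.output.1
  print(quantity + 1) // 43
}
```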

## Using `NSDataDetector` Inside a Swift `Regex`

Not only can you use `Foundation`'s parsers, you can also create your own custom parsers through the [CustomConsumingRegexComponent](https://developer.apple.com/documentation/swift/customconsumingregexcomponent) protocol. Let's create a new custom parser that uses Apple's `NSDataDetector` class.

First, let's get a working example of `NSDataDetector` detecting phone numbers:

```swift
import Foundation
import RegexBuilder

let types: NSTextCheckingResult.CheckingType = [.phoneNumber]
let detector = try NSDataDetector(types: types.rawValue)
let input = "(808) 232-4825"
// NSDataDetector is an Objective-C API, so the Swift range must be converted to an NSRange.
let swiftRange = input.startIndex..<input.endIndex
let nsRange = NSRange(swiftRange, in: input)
var result: String?
detector.enumerateMatches(
  in: input,
  options: [],
  range: nsRange,
  using: { match, _, _ in
    // The block is called once per match; pull out the detected phone number.
    guard let phoneNumber = match?.phoneNumber else {
      print("no phone number found")
      result = nil
      return
    }
    print("found phone number: \(phoneNumber)")
    result = phoneNumber
  }
)
```
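
The `NSRange` conversion in that example isn't busywork. `NSRange` measures positions in UTF-16 code units (it comes from Objective-C's `NSString`), while Swift's `String.Index` is tied to the string's own storage and `String` counts `Character`s, so the two can't be swapped directly. A short illustration with an emoji, which takes two UTF-16 code units; the sample string is made up.

```swift
import Foundation

let text = "👋 call (808) 232-4825"

// A Range<String.Index> for the first "(" (Foundation's range(of:)).
let swiftRange = text.range(of: "(")!

// Swift counts Characters: 👋, space, c, a, l, l, space, so "(" is at offset 7.
let characterOffset = text.distance(from: text.startIndex, to: swiftRange.lowerBound)

// NSRange counts UTF-16 code units: 👋 needs two, so "(" is at location 8.
let nsRange = NSRange(swiftRange, in: text)

// Converting back yields an optional, because an arbitrary NSRange may not
// line up with valid String.Index boundaries.
let roundTrip = Range(nsRange, in: text)

print(characterOffset, nsRange.location) // 7 8
```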

Conforming to `CustomConsumingRegexComponent` is fairly straightforward. We simply implement [`consuming(_:startingAt:in:)`](https://developer.apple.com/documentation/swift/customconsumingregexcomponent/consuming(_:startingat:in:)):

```swift
public struct PhoneNumberDataDetector: CustomConsumingRegexComponent {
  public typealias RegexOutput = String

  public func consuming(
    _ input: String,
    startingAt index: String.Index,
    in bounds: Range<String.Index>
  ) throws -> (upperBound: String.Index, output: String)? {
    // Implementation goes here...
    // Returning nil tells the regex engine that nothing matched at `index`.
    return nil
  }
}
```
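
To see the `consuming(_:startingAt:in:)` contract on its own, without `NSDataDetector`, here is a hypothetical component (`HexNumber` is an invented name, not part of any library) that consumes hexadecimal digits starting exactly at `index` and outputs their value as an `Int`. It shows the two outcomes the method can report: `nil` for "no match here", or the index where the match ends plus the typed output.

```swift
import RegexBuilder

// Hypothetical component (not from any library): consumes hexadecimal digits
// starting exactly at `index` and outputs their value as an Int.
struct HexNumber: CustomConsumingRegexComponent {
  typealias RegexOutput = Int

  func consuming(
    _ input: String,
    startingAt index: String.Index,
    in bounds: Range<String.Index>
  ) throws -> (upperBound: String.Index, output: Int)? {
    var end = index
    // Advance while we stay inside `bounds` and keep seeing hex digits.
    while end < bounds.upperBound, input[end].isHexDigit {
      end = input.index(after: end)
    }
    // No digits, or a value Int can't represent: report "no match here".
    guard end > index, let value = Int(input[index..<end], radix: 16) else {
      return nil
    }
    // Tell the engine where the match ends and what the typed output is.
    return (upperBound: end, output: value)
  }
}

let colorRegex = Regex {
  "#"
  Capture(HexNumber())
}

let colorMatch = "#ff8800".wholeMatch(of: colorRegex)
if let colorMatch {
  print(colorMatch.output.1) // 16746496
}
```

Because `RegexOutput` is `Int`, wrapping the component in `Capture` yields an `Int` capture, the same way the `Foundation` parsers produce typed values.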

## CustomConsumingRegexComponent

Now let's plug in our earlier implementation:

```swift
import Foundation

public struct PhoneNumberDataDetector: CustomConsumingRegexComponent {
  public typealias RegexOutput = String

  public func consuming(
    _ input: String,
    startingAt index: String.Index,
    in bounds: Range<String.Index>
  ) throws -> (upperBound: String.Index, output: String)? {
    var result: (upperBound: String.Index, output: String)?

    let types: NSTextCheckingResult.CheckingType = [.phoneNumber]
    let detector = try NSDataDetector(types: types.rawValue)
    // Search from `index` to the end of the string, converted to an NSRange
    // for the Objective-C API.
    let swiftRange = index..<input.endIndex
    let nsRange = NSRange(swiftRange, in: input)
    detector.enumerateMatches(
      in: input,
      options: [],
      range: nsRange,
      using: { match, _, _ in
        guard let phoneNumber = match?.phoneNumber,
              let nsRange = match?.range,
              let swiftRange = Range(nsRange, in: input) else {
          // no phone number found
          result = nil
          return
        }

        result = (upperBound: swiftRange.upperBound, output: phoneNumber)
      }
    )

    return result
  }
}
```

As you can see, the `NSDataDetector` API is a bit cumbersome to use. Notice how we need to convert back and forth between `Range` and `NSRange`. As [Mattt from NSHipster said](https://nshipster.com/nsdatadetector/): "*NSDataDetector has an interface that only a mother could love.*" But now we have a new Swift interface that is far easier to use. Now that it is a `RegexComponent`, we can plug it into any `Regex` like this:

```swift
let phoneNumberDataDetector: some RegexComponent = Regex {
  ChoiceOf {
    Anchor.startOfLine
    Anchor.startOfSubject
    One(.whitespace)
  }
  PhoneNumberDataDetector()
  ChoiceOf {
    Anchor.endOfLine
    Anchor.endOfSubject
    Anchor.endOfSubjectBeforeNewline
    One(.whitespace)
  }
}
```

_Note, however, that Swift 6 language mode turns on data-race safety checking. Unfortunately, `Regex` is not `Sendable`, so we need to isolate it somehow. One of the easiest ways to do this is to use a global actor:_

```swift
@MainActor
let phoneNumberDataDetector: some RegexComponent = Regex {
  // ...
}
```
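
Another option, sketched here under the assumption that rebuilding the regex on each use is acceptable for your workload, is a global *computed* property. It has no storage, so there is no shared non-`Sendable` value for the compiler to flag, and callers aren't tied to the main actor. The example uses a plain digits pattern (the names `areaCode` and `isAreaCode` are illustrative) so it stands alone.

```swift
import RegexBuilder

// A computed global property: each access builds a new Regex, so no
// non-Sendable value is stored globally.
var areaCode: Regex<Substring> {
  Regex {
    Repeat(count: 3) { One(.digit) }
  }
}

// Callable synchronously from any context; no actor hop required.
func isAreaCode(_ candidate: String) -> Bool {
  candidate.wholeMatch(of: areaCode) != nil
}
```

The trade-off is that the regex is rebuilt on every access; if that cost matters, the global-actor approach keeps a single shared instance instead.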

## Room For Improvement

Writing parsers is complicated. Even the most sophisticated parser can be tripped up by an obscure corner case. But by converting our `NSDataDetector` into a `RegexComponent`, we now get to take advantage of decades of Apple's development work. Even better, our new detector is composable and can be added as a component to any other `RegexComponent`.

Still, as great as this solution is, there is room for improvement. So far in my testing, many strings were parsed correctly: "555-1234", "(808) 555-1234", and "1 (808) 555-1234" were all identified as phone numbers. However, I did get some false positives (or at least what appear to be false positives). For example, I would have expected "5555-1234" NOT to be identified as a phone number. Whatever solution you use, remember to test your code. Don't just test the happy path: assert that your code produces neither **false positives** nor **false negatives**.
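
Here is a minimal sketch of what testing both directions can look like. To stay self-contained it uses a plain 3-3-4 digit pattern as a stand-in for `NSDataDetector` (swap in your own parser), and it collects every failure instead of stopping at the first one; in a real project these checks would live in XCTest or Swift Testing.

```swift
import RegexBuilder

// Stand-in parser for this sketch: three digits, dash, three digits, dash, four digits.
let candidateRegex = Regex {
  Repeat(count: 3) { One(.digit) }
  "-"
  Repeat(count: 3) { One(.digit) }
  "-"
  Repeat(count: 4) { One(.digit) }
}

// Inputs that must match. Any that don't are false negatives.
let shouldMatch = ["808-232-4825", "555-555-1234"]

// Inputs that must NOT match. Any that do are false positives.
let shouldNotMatch = ["5555-1234", "808-232-482", "808-232-48250"]

let falseNegatives = shouldMatch.filter { $0.wholeMatch(of: candidateRegex) == nil }
let falsePositives = shouldNotMatch.filter { $0.wholeMatch(of: candidateRegex) != nil }

print("false negatives:", falseNegatives)
print("false positives:", falsePositives)
```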

Another issue is that there seems to be a mistake in how my implementation of this parser calculates its bounds. It still works with methods like `wholeMatch()` or `contains()`, but it works incorrectly with `replace()`: instead of replacing just the matched phone number, it replaces the entire string.
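
One plausible cause, offered as an untested hypothesis rather than a confirmed diagnosis: `enumerateMatches` searches everything from `index` to the end of the string, so it can report a phone number that starts *later* than `index`, and the regex engine then treats all the text from `index` up to that number's end as consumed. (The loop also keeps overwriting `result`, so the last phone number in the string wins.) The sketch below only accepts a match that begins exactly at `index`, stays within `bounds`, and uses `firstMatch(in:options:range:)` instead of enumerating. The name `AnchoredPhoneNumberDataDetector` is hypothetical, and the code needs an Apple platform where `NSDataDetector` is available.

```swift
import Foundation
import RegexBuilder

// Hypothetical variant of PhoneNumberDataDetector that only reports a phone
// number beginning exactly at `index`.
public struct AnchoredPhoneNumberDataDetector: CustomConsumingRegexComponent {
  public typealias RegexOutput = String

  public func consuming(
    _ input: String,
    startingAt index: String.Index,
    in bounds: Range<String.Index>
  ) throws -> (upperBound: String.Index, output: String)? {
    let types: NSTextCheckingResult.CheckingType = [.phoneNumber]
    let detector = try NSDataDetector(types: types.rawValue)

    // Search only from `index` to the end of `bounds`, not to the end of the string.
    let searchRange = NSRange(index..<bounds.upperBound, in: input)

    guard
      let match = detector.firstMatch(in: input, options: [], range: searchRange),
      let phoneNumber = match.phoneNumber,
      let matchRange = Range(match.range, in: input),
      // Reject numbers that start later in the string; the regex engine
      // will try again at that later position on its own.
      matchRange.lowerBound == index
    else {
      return nil
    }

    return (upperBound: matchRange.upperBound, output: phoneNumber)
  }
}

let phoneCapture = Regex {
  Capture(AnchoredPhoneNumberDataDetector())
}

let message = "Call (808) 232-4825 now"
let redacted = message.replacing(phoneCapture, with: "<phone>")
print(redacted)
```

Creating a detector and rescanning at every candidate position is wasteful; it is kept that way here only to stay close to the original implementation.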

If you see any solutions, please don't hesitate to reach out to me on [Mastodon](https://iosdev.space/@dandylyons) or [X](https://x.com/dan_dee_lyons).

## Think of the Possibilities

There are many more powerful parsing libraries that could benefit from Swift's `Regex`. For example, [Point-Free](https://www.pointfree.co/) has a powerful parsing library called [swift-parsing](https://swiftpackageindex.com/pointfreeco/swift-parsing#user-content-documentation). Its API looks a lot like `RegexBuilder`, and yet it's far more flexible. By wrapping it in a `CustomConsumingRegexComponent`, we could allow any `Regex` to take advantage of the swift-parsing library.

Do you have any code that you think could be super-powered as a `RegexComponent`?

## Conclusion

When writing parsers, we have to fully appreciate the domain of the problem we are trying to solve. There are dozens of phone number formats, and perhaps hundreds of phone companies. Many of these formats are standardized, but every standard has exceptions, and there is no single universally accepted standard. The problem domain is far too large and ever-changing for one team to tackle. Instead, we should look to battle-tested parsers established by the community.

That's why I created [NativeRegexExamples](https://swiftpackageindex.com/DandyLyons/NativeRegexExamples). It's a library where we can crowd-source our learning and collectively discover best practices for various parsers. Please contribute so that the entire community can benefit!

content/en/posts/post-ideas.md renamed to content/en/unlisted/post-ideas.md

Lines changed: 5 additions & 1 deletion
@@ -3,11 +3,15 @@ draft: true
 date: 2024-01-01
 ---
 USE https://blogrecorder.com/
-- [ ] DocC: hosting DocC on GitHub Pages
+- [ ] DocC:
+- [ ] DocC deep dive
+- [ ] hosting DocC on GitHub Pages
 - [ ] Reducing macro build times with binaries, Tuist,
 - [x] Typing keyboard shortcuts.
 - [ ] Swift/C++ interoperability
 - [x] Helpful features of Swift Package Index
 - [ ] How to test in TCA even without Equatable, using xctassertnodifference
 - [ ] AI Architecture
 - [No Priors: Tengyu Ma | June 6th](https://share.snipd.com/episode/8db8d89c-b912-4aa0-8f7f-d2d712947c8f)
+- [ ] Dependabot for Swift packages: https://github.blog/changelog/2023-08-01-swift-support-for-dependabot-updates/
+- [ ] How to find unused Swift code with Periphery
