| 1 | +--- |
| 2 | +date: 2024-09-10 |
| 3 | +title: Using Custom Components in Swift's Regex |
| 4 | +slug: using-custom-components-in-swifts-regex |
| 5 | +description: Plug your own custom logic into any Swift Regex! |
| 6 | +tags: Regex, Swift |
| 7 | +--- |
| 8 | + |
| 9 | +In our [last article]({{< ref "post-12" >}}) we learned about Swift's `Regex` type and the various ways to create one. Today we're going to dive a little deeper into one of those ways: building a custom `RegexComponent` using the [CustomConsumingRegexComponent](https://developer.apple.com/documentation/swift/customconsumingregexcomponent) protocol. |
| 10 | + |
| 11 | +For a quick refresher, remember that we can create a custom parser using the `RegexBuilder` DSL like this: |
| 12 | + |
| 13 | +```swift |
| 14 | +import RegexBuilder |
| 15 | + |
| 16 | +Regex { |
| 17 | + Capture { |
| 18 | + Repeat(count: 3) { |
| 19 | + One(.digit) |
| 20 | + } |
| 21 | + } |
| 22 | + "-" |
| 23 | + Capture { |
| 24 | + Repeat(count: 3) { |
| 25 | + One(.digit) |
| 26 | + } |
| 27 | + } |
| 28 | + "-" |
| 29 | + Capture { |
| 30 | + Repeat(count: 4) { |
| 31 | + One(.digit) |
| 32 | + } |
| 33 | + } |
| 34 | +} |
| 35 | +``` |
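As a usage sketch (the name `phoneRegex` and the sample input are made up for illustration), the same builder can be bound to a constant and matched against a string. Each `Capture` shows up as one element of the match output tuple, after the whole match:

```swift
import RegexBuilder

// Same shape as the builder above, bound to a (hypothetical) name so we can use it.
let phoneRegex = Regex {
    Capture { Repeat(count: 3) { One(.digit) } }
    "-"
    Capture { Repeat(count: 3) { One(.digit) } }
    "-"
    Capture { Repeat(count: 4) { One(.digit) } }
}

// wholeMatch(of:) only succeeds when the entire string matches.
let phoneMatch = "808-555-1234".wholeMatch(of: phoneRegex)

if let phoneMatch {
    // Element 0 is the whole match; elements 1...3 are the three captures.
    let (whole, areaCode, prefix, lineNumber) = phoneMatch.output
    print(whole, areaCode, prefix, lineNumber)
}
```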
| 36 | + |
| 37 | +Don't forget that we can use many built-in parsers provided by `Foundation` like this: |
| 38 | + |
| 39 | +```swift |
| 40 | +let usdRegex = Regex { |
| 41 | + Capture(.currency(code: "USD").sign(strategy: .accounting)) |
| 42 | +} |
| 43 | + |
| 44 | +let dateRegex = Regex { |
| 45 | + Capture( |
| 46 | + .date( |
| 47 | + .numeric, |
| 48 | + locale: .autoupdatingCurrent, |
| 49 | + timeZone: .autoupdatingCurrent, |
| 50 | + calendar: .autoupdatingCurrent |
| 51 | + ) |
| 52 | + ) |
| 53 | +} |
| 54 | + |
| 55 | +let intRegex = Regex { |
| 56 | + Capture(.localizedInteger(locale: .autoupdatingCurrent)) |
| 57 | +} |
| 58 | +``` |
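What makes these parsers convenient is that the capture is strongly typed. Here is a minimal sketch, assuming a fixed `en_US` locale and a made-up `"Qty: "` prefix, where the captured value arrives as an `Int` rather than a `Substring`:

```swift
import Foundation
import RegexBuilder

// Assumption: a fixed locale, so the example does not depend on device settings.
let quantityRegex = Regex {
    "Qty: "
    Capture(.localizedInteger(locale: Locale(identifier: "en_US")))
}

let quantityMatch = "Qty: 42".wholeMatch(of: quantityRegex)

if let quantityMatch {
    // The capture is already an Int, so no manual conversion is needed.
    let quantity: Int = quantityMatch.output.1
    print(quantity + 1)
}
```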
| 59 | + |
| 60 | +## Using `NSDataDetector` Inside a Swift `Regex` |
| 61 | +Not only can you use `Foundation`'s parsers, you can also create your own through the [CustomConsumingRegexComponent](https://developer.apple.com/documentation/swift/customconsumingregexcomponent) protocol. Let's create a custom parser that uses Apple's `NSDataDetector` class. |
| 62 | + |
| 63 | +First, let's get a working example of our `NSDataDetector` to detect phone numbers: |
| 64 | + |
| 65 | +```swift |
| 66 | +import Foundation |
| 67 | +import RegexBuilder |
| 68 | + |
| 69 | +let types: NSTextCheckingResult.CheckingType = [.phoneNumber] |
| 70 | +let detector = try NSDataDetector(types: types.rawValue) |
| 71 | +let input = "(808) 232-4825" |
| 72 | +let swiftRange = input.startIndex..<input.endIndex |
| 73 | +let nsRange = NSRange(swiftRange, in: input) // NSDataDetector works with NSRange, not Range<String.Index> |
| 74 | +var result: String? |
| 75 | +detector.enumerateMatches( |
| 76 | + in: input, |
| 77 | + options: [], |
| 78 | + range: nsRange, |
| 79 | + using: { (match, flags, _) in |
| 80 | + guard let phoneNumber = match?.phoneNumber, |
| 81 | + let nsRange = match?.range, |
| 82 | + let swiftRange = Range(nsRange, in: input) else { |
| 83 | + print("no phone number found") |
| 84 | + result = nil |
| 85 | + return |
| 86 | + } |
| 87 | + print("found phone number: \(phoneNumber)") |
| 88 | + result = phoneNumber |
| 89 | + } |
| 90 | +) |
| 91 | +``` |
| 92 | + |
| 93 | +Conforming to `CustomConsumingRegexComponent` is fairly straightforward. We simply implement [consuming(_:startingAt:in:)](https://developer.apple.com/documentation/swift/customconsumingregexcomponent/consuming(_:startingat:in:)): |
| 94 | + |
| 95 | +```swift |
| 96 | +public struct PhoneNumberDataDetector: CustomConsumingRegexComponent { |
| 97 | + public typealias RegexOutput = String |
| 98 | + public func consuming( |
| 99 | + _ input: String, |
| 100 | + startingAt index: String.Index, |
| 101 | + in bounds: Range<String.Index> |
| 102 | + ) throws -> (upperBound: String.Index, output: String)? { |
| 103 | + // implementation goes here... |
| 104 | + } |
| 105 | +} |
| 106 | +``` |
| 107 | + |
| 108 | +## CustomConsumingRegexComponent |
| 109 | +So now let's plug in our earlier implementation: |
| 110 | + |
| 111 | +```swift |
| 112 | +public struct PhoneNumberDataDetector: CustomConsumingRegexComponent { |
| 113 | + public typealias RegexOutput = String |
| 114 | + public func consuming( |
| 115 | + _ input: String, |
| 116 | + startingAt index: String.Index, |
| 117 | + in bounds: Range<String.Index> |
| 118 | + ) throws -> (upperBound: String.Index, output: String)? { |
| 119 | + var result: (upperBound: String.Index, output: String)? |
| 120 | + |
| 121 | + let types: NSTextCheckingResult.CheckingType = [.phoneNumber] |
| 122 | + let detector = try NSDataDetector(types: types.rawValue) |
| 123 | + let swiftRange = index..<input.endIndex |
| 124 | + let nsRange = NSRange(swiftRange, in: input) // NSDataDetector works with NSRange, not Range<String.Index> |
| 125 | + detector.enumerateMatches( |
| 126 | + in: input, |
| 127 | + options: [], |
| 128 | + range: nsRange, |
| 129 | + using: { (match, flags, _) in |
| 130 | + guard let phoneNumber = match?.phoneNumber, |
| 131 | + let nsRange = match?.range, |
| 132 | + let swiftRange = Range(nsRange, in: input) else { |
| 133 | + // no phone number found |
| 134 | + result = nil; return |
| 135 | + } |
| 136 | + |
| 137 | + result = (upperBound: swiftRange.upperBound, output: phoneNumber) |
| 138 | + } |
| 139 | + ) |
| 140 | + |
| 141 | + return result |
| 142 | + } |
| 143 | +} |
| 144 | +``` |
| 145 | + |
| 146 | +As you can see, the `NSDataDetector` API is a bit cumbersome to use. Notice how we need to convert back and forth between `Range` and `NSRange`. As [Mattt from NSHipster said](https://nshipster.com/nsdatadetector/): "*NSDataDetector has an interface that only a mother could love.*" But now we have a Swift interface that is far easier to use. Because it is a `RegexComponent`, we can plug it into any `Regex` like this: |
| 147 | + |
| 148 | +```swift |
| 149 | +let phoneNumberDataDetector: some RegexComponent = Regex { |
| 150 | + ChoiceOf { |
| 151 | + Anchor.startOfLine |
| 152 | + Anchor.startOfSubject |
| 153 | + One(.whitespace) |
| 154 | + } |
| 155 | + PhoneNumberDataDetector() |
| 156 | + ChoiceOf { |
| 157 | + Anchor.endOfLine |
| 158 | + Anchor.endOfSubject |
| 159 | + Anchor.endOfSubjectBeforeNewline |
| 160 | + One(.whitespace) |
| 161 | + } |
| 162 | +} |
| 163 | +``` |
| 164 | + |
| 165 | +_Note, however, that the Swift 6 language mode turns on data-race safety checking. Unfortunately, `Regex` is not `Sendable`, so we need to isolate it somehow. One of the easiest ways to do this is to use a global actor._ |
| 166 | + |
| 167 | +```swift |
| 168 | +@MainActor |
| 169 | +let phoneNumberDataDetector: some RegexComponent = Regex { |
| 170 | + // ... |
| 171 | +} |
| 172 | +``` |
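Another option, sketched here under the assumption that rebuilding the regex on each call is cheap enough for your use case, is to avoid storing the regex in a global at all. A function that returns a fresh `Regex` has no shared state to isolate. The names `makeDashedNumberRegex` and `containsDashedNumber` are hypothetical:

```swift
import RegexBuilder

// Hypothetical helper: builds a new regex per call, so nothing non-Sendable is shared.
func makeDashedNumberRegex() -> Regex<(Substring, Substring, Substring)> {
    Regex {
        Capture { Repeat(count: 3) { One(.digit) } }
        "-"
        Capture { Repeat(count: 4) { One(.digit) } }
    }
}

// Callable from any isolation domain because it only touches local values.
func containsDashedNumber(_ text: String) -> Bool {
    text.contains(makeDashedNumberRegex())
}

print(containsDashedNumber("call 555-1234 now"))
```

The trade-off is that the regex is reconstructed every time, so for hot paths the global actor approach may still be the better fit.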
| 173 | + |
| 174 | +## Room For Improvement |
| 175 | +Writing parsers is complicated. Even the most sophisticated parser can be overcome by an obscure corner case. But by converting our `NSDataDetector` into a `RegexComponent` we now get to take advantage of decades of Apple's development. Even better, our new detector is composable and can be added as a component to any other `RegexComponent`. |
| 176 | + |
| 177 | +As useful as this solution is, there is still room for improvement. In my testing so far, many strings were parsed correctly: "555-1234", "(808) 555-1234", and "1 (808) 555-1234" were all identified as phone numbers. However, I did see some false positives (or at least what appear to be false positives). For example, I expected "5555-1234" NOT to be identified as a phone number, but it was. Whatever solution you use, remember to test your code. Don't just test the happy path. Assert that your code does not generate **false positives** or **false negatives**. |
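One lightweight way to check both directions is a table of inputs paired with the result you expect. The sketch below is only an illustration: `mismatches(for:in:)` is a hypothetical helper, and `sevenDigitNumber` is a simple stand-in regex so the example runs anywhere. In practice you would pass in the component you actually want to test.

```swift
import RegexBuilder

/// Hypothetical helper: returns every input whose actual result differs from the expectation.
func mismatches(
    for regex: some RegexComponent,
    in cases: [(input: String, shouldMatch: Bool)]
) -> [String] {
    cases.compactMap { testCase -> String? in
        let didMatch = testCase.input.wholeMatch(of: regex) != nil
        return didMatch == testCase.shouldMatch ? nil : testCase.input
    }
}

// Stand-in parser for the illustration: exactly three digits, a dash, four digits.
let sevenDigitNumber = Regex {
    Repeat(count: 3) { One(.digit) }
    "-"
    Repeat(count: 4) { One(.digit) }
}

let expectations: [(input: String, shouldMatch: Bool)] = [
    ("555-1234", true),    // happy path
    ("5555-1234", false),  // guards against a false positive
    ("555-12345", false),  // guards against a false positive
    ("", false),           // edge case
]

// An empty result means no false positives and no false negatives for these cases.
print(mismatches(for: sevenDigitNumber, in: expectations))
```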
| 178 | + |
| 179 | +Another issue is that there seems to be a mistake in the bounds calculation in my implementation of this parser. It works with methods like `wholeMatch()` or `contains()`, but it works incorrectly with `replace()`: instead of replacing just the matched phone number, it replaces the entire string. |
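One possible cause, offered as a hypothesis rather than a confirmed diagnosis: the implementation searches from `index` to the end of the input and keeps the last match found, so a call at the start of the string can report a match that actually begins much later. The regex engine then treats everything from `index` up to the returned `upperBound` as consumed. The sketch below is a hypothetical variant (`AnchoredPhoneNumberDetector` is not part of the original code) that limits the search to `bounds` and only accepts a match that begins exactly at `index`. It is wrapped in `canImport(Darwin)` on the assumption that `NSDataDetector` is only available on Apple platforms.

```swift
import Foundation
import RegexBuilder

#if canImport(Darwin)
/// Hypothetical variant of the detector that only matches at the requested position.
public struct AnchoredPhoneNumberDetector: CustomConsumingRegexComponent {
    public typealias RegexOutput = String
    public init() {}

    public func consuming(
        _ input: String,
        startingAt index: String.Index,
        in bounds: Range<String.Index>
    ) throws -> (upperBound: String.Index, output: String)? {
        guard index < bounds.upperBound else { return nil }

        // Note: creating a detector on every call is simple but not cheap.
        let types: NSTextCheckingResult.CheckingType = [.phoneNumber]
        let detector = try NSDataDetector(types: types.rawValue)

        // Search only between `index` and the end of `bounds`, not the end of the input.
        let searchRange = NSRange(index..<bounds.upperBound, in: input)

        guard let match = detector.firstMatch(in: input, options: [], range: searchRange),
              let phoneNumber = match.phoneNumber,
              let matchRange = Range(match.range, in: input),
              matchRange.lowerBound == index  // reject matches that start later in the string
        else { return nil }

        return (upperBound: matchRange.upperBound, output: phoneNumber)
    }
}

let message = "Call (808) 555-1234"
// With the anchoring check, only the detected number's own range should be replaced.
print(message.replacing(AnchoredPhoneNumberDetector(), with: "<redacted>"))
#endif
```

I have not verified how `NSDataDetector` behaves when its search range starts in the middle of a sentence, so treat this as a starting point for experiments rather than a finished fix.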
| 180 | + |
| 181 | +If you see a solution, please don't hesitate to reach out to me on [Mastodon](https://iosdev.space/@dandylyons) or [X](https://x.com/dan_dee_lyons). |
| 182 | + |
| 183 | +## Think of the Possibilities |
| 184 | +There are many more powerful parsing libraries that could benefit from Swift's `Regex`. For example, [Point-Free](https://www.pointfree.co/) has a powerful parsing library called [swift-parsing](https://swiftpackageindex.com/pointfreeco/swift-parsing#user-content-documentation). It has an API that looks a lot like `RegexBuilder`'s, and yet it's far more flexible. By creating a `CustomConsumingRegexComponent`, we could allow any `Regex` to take advantage of the swift-parsing library. |
| 185 | + |
| 186 | +Do you have any code that you think could be super-powered as a `RegexComponent`? |
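To give a feel for what such a bridge might look like, here is a sketch of a generic adapter. It does not use swift-parsing's real API. `ClosureComponent` is a hypothetical type that wraps any closure of the shape `(inout Substring) -> Output?`, which is similar in spirit to parsers that consume from the front of their input. The adapter assumes the closure only shrinks the substring it is given (for example with `dropFirst`), so the remaining substring's `startIndex` is still a valid index into the original string.

```swift
import RegexBuilder

/// Hypothetical adapter: turns a "consume from the front" closure into a regex component.
struct ClosureComponent<Output>: CustomConsumingRegexComponent {
    typealias RegexOutput = Output

    /// Assumption: the closure only slices `rest`; it never replaces it with unrelated text.
    let parse: (inout Substring) -> Output?

    func consuming(
        _ input: String,
        startingAt index: String.Index,
        in bounds: Range<String.Index>
    ) throws -> (upperBound: String.Index, output: Output)? {
        var rest = input[index..<bounds.upperBound]
        guard let output = parse(&rest) else { return nil }
        // Whatever the parser left unconsumed starts where our match ends.
        return (upperBound: rest.startIndex, output: output)
    }
}

// A tiny hand-written parser standing in for a real parsing library.
let boolean = ClosureComponent<Bool> { rest in
    if rest.starts(with: "true") { rest = rest.dropFirst(4); return true }
    if rest.starts(with: "false") { rest = rest.dropFirst(5); return false }
    return nil
}

let flagRegex = Regex {
    "enabled="
    Capture(boolean)
}

let flagMatch = "enabled=false".wholeMatch(of: flagRegex)
if let flagMatch {
    print(flagMatch.output.1)
}
```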
| 187 | + |
| 188 | +## Conclusion |
| 189 | +When writing parsers, we have to appreciate the full domain of the problem we are trying to solve. There are dozens of phone number formats and perhaps hundreds of phone companies. Many formats are standardized, but there are exceptions to every standard, and no single standard is universally accepted. The problem domain is far too large and ever-changing for one team to tackle. Instead, we should look to battle-tested parsers established by the community to tackle these problems. |
| 190 | + |
| 191 | +That's why I created [NativeRegexExamples](https://swiftpackageindex.com/DandyLyons/NativeRegexExamples). It's a library where we can crowd-source our learning, and collectively discover best practices for various parsers. Please contribute, so that the entire community can benefit! |