Description
@pepijnve found that `~` goes faster than `regexp_like` for some cases, see:
If `~` is faster than `regexp_like`, can we simply change the implementation to use the same underlying implementation of `~`? (why only rewrite in some cases?)
And the answer in #17839 (comment) is succinctly summarized as
That's probably the way to go long term to deduplicate the code entirely, but that would be a much bigger change.
The full answer:
See #17838 (comment)
The operator logic is in `physical_expr`, while `regexp_like` lives in `functions`. We would probably have to move the common logic to a separate crate. This PR was intended as a stopgap solution for common cases.
We can only rewrite in some cases because of the optional `flags` argument. With the operators all you have is the case sensitivity (i.e. the `i` flag).
The reason for the operator being more efficient is that it will make use of the `regexp_is_match_scalar` kernel if it can, while `regexp_like` always uses `regexp_is_match`. `regexp_is_match` does maintain a cache of compiled regexes so at least the pattern isn't compiled over and over again, but it's still quite a bit more code compared to `regexp_is_match_scalar`.
Additionally there's a regular expression simplification rule that only operates on `BinaryExpr` with one of the regex matching operators. The transformation here enables that optimisation for `regexp_like` calls as well.
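To make the "only in some cases" point concrete, here is a minimal sketch of the rewrite condition described above. The `Expr` enum, `rewrite_regexp_like`, and `sample_regexp_like` are hypothetical and simplified for illustration; they are not DataFusion's real types or the code from the PR. The sketch assumes the case-insensitive operator form is `~*`, and that only a missing `flags` argument or exactly `i` can be expressed with an operator.

```rust
/// Hypothetical, simplified expression tree (not DataFusion's real `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(String),
    /// `regexp_like(value, pattern [, flags])`
    RegexpLike {
        value: Box<Expr>,
        pattern: Box<Expr>,
        flags: Option<String>,
    },
    /// `value ~ pattern` (case sensitive) or `value ~* pattern` (case insensitive)
    RegexMatch {
        value: Box<Expr>,
        pattern: Box<Expr>,
        case_insensitive: bool,
    },
}

/// Rewrite `regexp_like` into an operator expression when the flags allow it.
/// Any other flags value cannot be expressed by the operators, so the call
/// is returned unchanged.
fn rewrite_regexp_like(expr: Expr) -> Expr {
    match expr {
        Expr::RegexpLike { value, pattern, flags } => {
            // Operators only carry case sensitivity, i.e. the `i` flag.
            let case_insensitive = match flags.as_deref() {
                None => Some(false),
                Some("i") => Some(true),
                _ => None,
            };
            match case_insensitive {
                Some(ci) => Expr::RegexMatch { value, pattern, case_insensitive: ci },
                None => Expr::RegexpLike { value, pattern, flags },
            }
        }
        other => other,
    }
}

/// Build `regexp_like(c, '^a.*' [, flags])` for experimenting with the rule.
fn sample_regexp_like(flags: Option<&str>) -> Expr {
    Expr::RegexpLike {
        value: Box::new(Expr::Column("c".to_string())),
        pattern: Box::new(Expr::Literal("^a.*".to_string())),
        flags: flags.map(str::to_string),
    }
}
```

In this sketch, `sample_regexp_like(None)` and `sample_regexp_like(Some("i"))` are rewritten to the operator form, while a call with any other flags value (for example `"m"`) is left as a function call. That leftover case is why a single shared implementation would still be needed to remove the duplication entirely.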
This ticket tracks creating a single implementation
I'd like to see the different calls use the same implementation; having 2 implementations for this seems problematic. If no one else does, I'll file a followup issue that references this ticket to create a common implementation.
Originally posted by @Omega1 in #17839 (comment)