
Unify implementations of regexp_like and ~ / ~* operators #17941

@alamb

Description


@pepijnve found that ~ is faster than regexp_like in some cases, see:

If ~ is faster than regexp_like, can we simply change regexp_like to use the same underlying implementation as ~? (Why only rewrite in some cases?)

And the answer in #17839 (comment) is succinctly summarized as:

That's probably the way to go long term to deduplicate the code entirely, but that would be a much bigger change.

The full answer:

See #17838 (comment)

The operator logic is in physical_expr, while regexp_like lives in functions. We would probably have to move the common logic to a separate crate. This PR was intended as a stopgap solution for common cases.

We can only rewrite in some cases because of the optional flags argument. With the operators, all you can express is case sensitivity (i.e. the i flag).
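A minimal sketch of the rewrite decision described above, assuming a hypothetical `Flags` enum that models whether the third argument is absent, a string literal, or some other expression. The `MatchOp` variants are named after DataFusion's `Operator::RegexMatch` (`~`) and `Operator::RegexIMatch` (`~*`), but this is a standalone model, not DataFusion's code. Treating every flag other than `i` as non-rewritable is a conservative assumption of the sketch.

```rust
/// Hypothetical model of the optional third (flags) argument of regexp_like.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Flags<'a> {
    /// regexp_like(value, pattern)
    Absent,
    /// regexp_like(value, pattern, '<literal>')
    Literal(&'a str),
    /// Flags supplied by a column or other non-literal expression.
    NonLiteral,
}

/// The operator forms a call could be rewritten to.
#[derive(Debug, PartialEq)]
enum MatchOp {
    RegexMatch,  // ~   (case-sensitive)
    RegexIMatch, // ~*  (case-insensitive)
}

/// Returns the equivalent operator, or None if the call must stay a function call.
fn operator_for(flags: Flags<'_>) -> Option<MatchOp> {
    match flags {
        Flags::Absent => Some(MatchOp::RegexMatch),
        // Case (in)sensitivity is the only thing the operators can express.
        Flags::Literal("i") => Some(MatchOp::RegexIMatch),
        // Any other flag, or flags unknown at planning time, are
        // conservatively left as a function call in this sketch.
        _ => None,
    }
}
```

This is why the rewrite can only be a partial stopgap: any call whose flags fall into the last arm keeps using the function implementation.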

The reason the operator is more efficient is that it uses the regexp_is_match_scalar kernel when it can, while regexp_like always uses regexp_is_match. regexp_is_match does maintain a cache of compiled regexes, so at least the pattern isn't compiled over and over again, but it's still quite a bit more code than regexp_is_match_scalar.
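To illustrate why the scalar kernel can be cheaper, here is a simplified, standalone model of the two paths. It does not use arrow's kernels or the `regex` crate: `Compiled` is a hypothetical stand-in that does substring matching so the example runs with only the standard library. The point is the shape of the work in this model: the scalar path compiles once and then loops, while the per-row path hashes the pattern string and looks it up in a cache for every row, even when every row carries the same pattern.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a compiled regex (substring match, not a real regex).
struct Compiled {
    needle: String,
}

impl Compiled {
    fn compile(pattern: &str) -> Self {
        Compiled { needle: pattern.to_string() }
    }
    fn is_match(&self, s: &str) -> bool {
        s.contains(self.needle.as_str())
    }
}

/// Scalar-pattern path: one compilation, then a tight loop over the values.
fn is_match_scalar(values: &[&str], pattern: &str) -> Vec<bool> {
    let re = Compiled::compile(pattern);
    values.iter().map(|v| re.is_match(v)).collect()
}

/// Per-row-pattern path: each row looks its pattern up in a cache,
/// compiling only on a miss. Returns the results and the number of compilations.
fn is_match_array(values: &[&str], patterns: &[&str]) -> (Vec<bool>, usize) {
    let mut cache: HashMap<&str, Compiled> = HashMap::new();
    let mut compiles = 0;
    let out: Vec<bool> = values
        .iter()
        .zip(patterns)
        .map(|(v, p)| {
            // A hash lookup happens on every row, hit or miss.
            let re = cache.entry(*p).or_insert_with(|| {
                compiles += 1;
                Compiled::compile(p)
            });
            re.is_match(v)
        })
        .collect();
    (out, compiles)
}
```

With patterns repeated across rows, the cache keeps compilations to one per distinct pattern, but the per-row lookup remains; that residual overhead is what the scalar path avoids.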

Additionally, there's a regular expression simplification rule that only operates on BinaryExpr nodes with one of the regex matching operators. The transformation here enables that optimisation for regexp_like calls as well.
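A toy model of why the rewrite matters for the simplifier, assuming a hypothetical `Expr` tree rather than DataFusion's `Expr` type. `rewrite_regexp_like` turns a two-argument `regexp_like(x, p)` call into `x ~ p`, and `simplify_regex` stands in for a rule that only pattern-matches binary regex nodes. The specific simplification it performs (a fully anchored alphanumeric pattern becomes an equality) is chosen for illustration and is not claimed to be DataFusion's exact rule.

```rust
/// Hypothetical, minimal expression tree.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(String),
    Call { name: String, args: Vec<Expr> },
    Binary { left: Box<Expr>, op: &'static str, right: Box<Expr> },
}

/// Rewrites regexp_like(x, p) (no flags) into the operator form x ~ p.
fn rewrite_regexp_like(expr: Expr) -> Expr {
    match expr {
        Expr::Call { name, mut args } if name == "regexp_like" && args.len() == 2 => {
            let pattern = args.pop().unwrap();
            let value = args.pop().unwrap();
            Expr::Binary { left: Box::new(value), op: "~", right: Box::new(pattern) }
        }
        other => other,
    }
}

/// Toy rule that, like the one described above, only inspects Binary regex nodes:
/// x ~ '^abc$' (alphanumeric only) becomes x = 'abc'.
fn simplify_regex(expr: Expr) -> Expr {
    if let Expr::Binary { left, op: "~", right } = &expr {
        if let Expr::Literal(pattern) = &**right {
            let inner = pattern.strip_prefix('^').and_then(|s| s.strip_suffix('$'));
            if let Some(lit) = inner {
                if lit.chars().all(|c| c.is_ascii_alphanumeric()) {
                    return Expr::Binary {
                        left: left.clone(),
                        op: "=",
                        right: Box::new(Expr::Literal(lit.to_string())),
                    };
                }
            }
        }
    }
    // Anything else, including function calls, passes through unchanged.
    expr
}
```

Applied directly to a regexp_like call, `simplify_regex` leaves it untouched; only after `rewrite_regexp_like` does the expression reach the shape the rule recognizes.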

This ticket tracks creating a single implementation.

I'd like to see the different calls use the same implementation; having two implementations for this seems problematic. If no one else does, I'll file a follow-up issue that references this ticket to create a common implementation.

Originally posted by @Omega1 in #17839 (comment)
