Unify implementations of `regexp_like` and `~` / `~*` operators

@pepijnve found that `~` is faster than `regexp_like` in some cases; see:
- https://github.com/apache/datafusion/pull/17839

> If `~` is faster than `regexp_like` can we simply change the implementation to use the same underlying implementation of `~` (why only rewrite in some cases?)

And the answer in https://github.com/apache/datafusion/pull/17839#discussion_r2402499183 is succinctly summarized as

> That's probably the way to go long term to deduplicate the code entirely, but that would be a much bigger change.

The full answer:
> See https://github.com/apache/datafusion/issues/17838#issuecomment-3355083929
> 
> The operator logic is in `physical_expr`, while `regexp_like` lives in `functions`. We would probably have to move the common logic to a separate crate. This PR was intended as a stopgap solution for common cases.
> 
> We can only rewrite in some cases because of the optional `flags` argument. With the operators all you have is the case sensitivity (i.e. the `i` flag).
> 
> The reason for the operator being more efficient is that it will make use of the `regexp_is_match_scalar` kernel if it can, while `regexp_like` always uses `regexp_is_match`. `regexp_is_match` does maintain a cache of compiled regexes so at least the pattern isn't compiled over and over again, but it's still quite a bit more code compared to `regexp_is_match_scalar`.
> 
> Additionally there's a regular expression simplification rule that only operates on `BinaryExpr` with one of the regex matching operators. The transformation here enables that optimisation for `regexp_like` calls as well.

This ticket tracks creating a single implementation.
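
To make the `flags` limitation quoted above concrete, here is a minimal sketch of the decision a rewrite has to make. It is not DataFusion's actual code: `RegexOperator` and `operator_for_flags` are hypothetical names introduced only for illustration, and the sketch assumes that "no flags" maps to `~` and that exactly the flag string `i` maps to `~*`.

```rust
/// The two positive regex-match operators discussed in this issue.
/// (Hypothetical type for illustration; negated operators are omitted.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RegexOperator {
    /// `~`: case-sensitive match
    Match,
    /// `~*`: case-insensitive match
    IMatch,
}

/// Hypothetical helper: decide whether a call of the form
/// `regexp_like(value, pattern[, flags])` can be expressed with an operator.
///
/// Assumption: the operators can only encode case sensitivity, so any
/// flag string other than exactly "i" has no operator equivalent.
pub fn operator_for_flags(flags: Option<&str>) -> Option<RegexOperator> {
    match flags {
        // No flags argument: plain case-sensitive match.
        None => Some(RegexOperator::Match),
        // Only the case-insensitivity flag: maps to `~*`.
        Some("i") => Some(RegexOperator::IMatch),
        // Anything else (e.g. "m", "s", or combinations) cannot be rewritten.
        Some(_) => None,
    }
}
```

Under these assumptions, `operator_for_flags(Some("m"))` returns `None`, which is the case where a `regexp_like` call would have to stay on its own code path. A single shared implementation would remove the need for this kind of case split.
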

I'd like to see the different calls use the same implementation; having two implementations for this seems problematic. I'll file a follow-up issue, if no one else does, that references this ticket to create a common implementation.

_Originally posted by @Omega1 in https://github.com/apache/datafusion/issues/17839#issuecomment-3371761835_
            
