Description
Is your feature request related to a problem or challenge?
Part of #10918, [StringViewArray](https://docs.rs/arrow/latest/arrow/array/type.StringViewArray.html) support in DataFusion.
Several queries in the ClickBench suite look like the following:
SELECT "MobilePhone", "MobilePhoneModel", COUNT(DISTINCT "UserID") AS u FROM hits WHERE "MobilePhoneModel" <> '' GROUP BY "MobilePhone", "MobilePhoneModel" ORDER BY u DESC LIMIT 10;
SELECT "SearchPhrase", COUNT(*) AS c FROM hits WHERE "SearchPhrase" <> '' GROUP BY "SearchPhrase" ORDER BY c DESC LIMIT 10;
SELECT "SearchPhrase", COUNT(DISTINCT "UserID") AS u FROM hits WHERE "SearchPhrase" <> '' GROUP BY "SearchPhrase" ORDER BY u DESC LIMIT 10;
SELECT "SearchEngineID", "SearchPhrase", COUNT(*) AS c FROM hits WHERE "SearchPhrase" <> '' GROUP BY "SearchEngineID", "SearchPhrase" ORDER BY c DESC LIMIT 10;
where "MobilePhoneModel" and "SearchPhrase" are string columns with predicates (in this case, checking for the empty string).
Describe the solution you'd like
To improve the performance of these queries, we need the ability to compare StringViewArrays to constant strings (and likely to each other).
Thus I would like to be able to run
StringViewColumn = scalar
StringViewColumn = StringViewColumn
(and likewise for BinaryView)
I basically want to run the following queries (where table foo has StringView columns):
> create table foo as values ('Andrew', 'X'), ('Xiangpeng', 'Xiangpeng'), ('Raphael', 'R');
0 row(s) fetched.
Elapsed 0.002 seconds.
> select * from foo where column1 = 'Andrew';
+---------+---------+
| column1 | column2 |
+---------+---------+
| Andrew | X |
+---------+---------+
1 row(s) fetched.
Elapsed 0.003 seconds.
> select * from foo where column1 <> 'Andrew';
+-----------+-----------+
| column1 | column2 |
+-----------+-----------+
| Xiangpeng | Xiangpeng |
| Raphael | R |
+-----------+-----------+
2 row(s) fetched.
Elapsed 0.001 seconds.
> select * from foo where column1 = column2;
+-----------+-----------+
| column1 | column2 |
+-----------+-----------+
| Xiangpeng | Xiangpeng |
+-----------+-----------+
1 row(s) fetched.
Elapsed 0.002 seconds.
> select * from foo where column1 <> column2;
+---------+---------+
| column1 | column2 |
+---------+---------+
| Andrew | X |
| Raphael | R |
+---------+---------+
2 row(s) fetched.
Elapsed 0.001 seconds.
Describe alternatives you've considered
I suspect we will need to update the coercion logic and possibly also the arrow equality kernels, such as https://docs.rs/arrow/latest/arrow/compute/kernels/cmp/fn.eq.html
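To illustrate why a view-aware equality kernel could help, here is a rough, std-only sketch of comparing a view-style column against a scalar. It is not the arrow-rs implementation: `View`, `ViewColumn`, `eq_scalar`, and `neq_scalar` are hypothetical names, the layout is simplified (a single data buffer and a 4-byte prefix only, whereas the real Arrow view also inlines values of up to 12 bytes and stores a buffer index), and nulls are ignored. The idea is that the length and prefix stored in each view can reject most rows without touching the string data.

```rust
/// Hypothetical, simplified stand-in for one view of a StringViewArray.
#[derive(Clone, Copy)]
struct View {
    len: u32,        // length of the value in bytes
    prefix: [u8; 4], // first 4 bytes of the value, zero-padded
    offset: usize,   // start of the full value in `ViewColumn::data`
}

/// Hypothetical column: one view per row plus a single shared data buffer.
struct ViewColumn {
    views: Vec<View>,
    data: Vec<u8>,
}

/// Zero-padded 4-byte prefix of a byte string.
fn prefix_of(bytes: &[u8]) -> [u8; 4] {
    let mut prefix = [0u8; 4];
    let n = bytes.len().min(4);
    prefix[..n].copy_from_slice(&bytes[..n]);
    prefix
}

impl ViewColumn {
    fn from_strs(values: &[&str]) -> Self {
        let mut views = Vec::with_capacity(values.len());
        let mut data = Vec::new();
        for v in values {
            let bytes = v.as_bytes();
            views.push(View {
                len: bytes.len() as u32,
                prefix: prefix_of(bytes),
                offset: data.len(),
            });
            data.extend_from_slice(bytes);
        }
        Self { views, data }
    }

    /// Full bytes of row `i`, read from the data buffer.
    fn value(&self, i: usize) -> &[u8] {
        let v = &self.views[i];
        &self.data[v.offset..v.offset + v.len as usize]
    }

    /// `column = scalar`: check length and prefix first, and only read the
    /// data buffer when both match and the value is longer than the prefix.
    fn eq_scalar(&self, scalar: &str) -> Vec<bool> {
        let s = scalar.as_bytes();
        let s_prefix = prefix_of(s);
        (0..self.views.len())
            .map(|i| {
                let v = &self.views[i];
                v.len as usize == s.len()
                    && v.prefix == s_prefix
                    && (s.len() <= 4 || self.value(i) == s)
            })
            .collect()
    }

    /// `column <> scalar`: the negation of `eq_scalar` (no nulls in this sketch).
    fn neq_scalar(&self, scalar: &str) -> Vec<bool> {
        self.eq_scalar(scalar).into_iter().map(|b| !b).collect()
    }
}
```

In this sketch, a predicate like `"SearchPhrase" <> ''` is decided from the view's length alone, so the data buffer is never read for it.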
Additional context
No response