feat: add has_overlap function to functions_list.yaml#987
feat: add has_overlap function to functions_list.yaml#987benbellick wants to merge 5 commits intomainfrom
Conversation
extensions/functions_set.yaml
Outdated
| elements overlap. | ||
|
|
||
| - THREE_VALUED: Returns NULL when null elements are present in | ||
| both lists but no non-null overlap is found. |
There was a problem hiding this comment.
same comment as @vbarua had in the other PR... if option is not specified, I guess it is assumed to be engine dependent?
I keep forgetting the dialect... does the dialect allow specify supported options or enum arguments?
There was a problem hiding this comment.
I'm still not sure what the right thing to do here is 😟
| urn: extension:io.substrait:functions_set | ||
| scalar_functions: | ||
| - | ||
| name: "index_in" |
There was a problem hiding this comment.
it's odd that this function is actually in the set 😄
There was a problem hiding this comment.
Yes 🤦 if it returned a boolean I could see it. But returning the index? That isn't very set like 😅
There was a problem hiding this comment.
Do we also want to move this to list? Sorry, this comment was not published. It is technically a breaking change but wondering how many would have taken dependencies...
There was a problem hiding this comment.
we can now deprecate this and move it to list function! 😆 please consider it in different PR.
| urn: extension:io.substrait:functions_set | ||
| scalar_functions: | ||
| - | ||
| name: "index_in" |
There was a problem hiding this comment.
Do we also want to move this to list? Sorry, this comment was not published. It is technically a breaking change but wondering how many would have taken dependencies...
|
@yongchul I agree it may be a good idea to just get rid of |
I'm okay with keeping the file but it would be nice if we can do some prep work (i.e., duplicating the function and have description in the file for deprecation warning). It's fine to do that in a separate PR. |
| urn: extension:io.substrait:functions_set | ||
| scalar_functions: | ||
| - | ||
| name: "index_in" |
There was a problem hiding this comment.
we can now deprecate this and move it to list function! 😆 please consider it in different PR.
| - name: right | ||
| value: list<any1> | ||
| options: | ||
| null_handling: |
There was a problem hiding this comment.
my 2 cents are enum for clarity and interoperability. One might argue that this is an option that should follow system default but
a. the alternatives are rather well-understood
b. it may not be consistent (i.e., system may not implement null handling consistently)
c. it works better with the dialect I believe...
d. null handling is... kind of the essential to the semantic of the overlap (null is not an exception in real data unfortunately).
Closes #986
Adds a
has_overlapscalar function tofunctions_list.yamlthat determines whether two lists share any common elements.This was initially placed in
functions_set.yaml, but per discussion it makes more sense infunctions_list.yamlsince the function operates onlisttypes and bundling functions under afunction_<datatype>namespace is the more natural convention.Cross-Engine Semantic Comparison
The following table was generated by running equivalent queries across engines via Docker (see details below for the script used to generate it).
[1,2,3] && [3,4,5][1,2,3] && [4,5,6][1,2,3] && [][] && [][1,NULL,3] && [3,4][1,NULL,3] && [4,5][1,NULL] && [NULL,4]NULL && [1,2][1,1,2] && [1,3]The divergence is entirely about how NULL elements are handled when there is no definitive non-null overlap:
This is modeled via a
null_handlingoption with those three values.Note: test cases for null list arguments (e.g.
has_overlap(null::list<i32>, ...)) will be added once #968 lands.Cross-engine verification script
Note: These changes were made with Claude's assistance. All code has been reviewed by me.
This change is