Open
Labels
enhancement (New feature or request)
Description
Is your feature request related to a problem or challenge?
Implement the Spark equivalents of `collect_list` and `array_agg`.
Note that in Spark, `array_agg` is an alias of `collect_list` (link here).
Also note that DataFusion already supports `array_agg`; however, there seems to be a difference in behaviour and syntax compared with Spark.
For example, DataFusion supports `ORDER BY` within `array_agg` (link) and can therefore provide deterministic ordering. Spark, on the other hand, doesn't support `ORDER BY` within `array_agg` and does not ensure deterministic ordering. The Spark docs explicitly mention this for both functions (link): "The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle."
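To make the ordering difference concrete, here is a minimal pure-Python sketch of the two behaviours. It does not use the DataFusion or Spark APIs; the helper names `collect_list` and `array_agg_ordered`, the sample rows, and the use of a reversed list as a stand-in for a shuffle are all assumptions made for illustration.

```python
from collections import defaultdict


def collect_list(rows, key, value):
    """Spark-style collect_list sketch: values are kept in arrival order."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return dict(groups)


def array_agg_ordered(rows, key, value, order_by):
    """Sketch of array_agg(value ORDER BY order_by): sort inside each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return {
        k: [r[value] for r in sorted(g, key=lambda r: r[order_by])]
        for k, g in groups.items()
    }


# Hypothetical input rows.
rows = [
    {"k": "a", "v": 3},
    {"k": "a", "v": 1},
    {"k": "b", "v": 2},
    {"k": "a", "v": 2},
]

# A different arrival order, standing in for what a shuffle might produce.
reordered = rows[::-1]

# Without ORDER BY, the result depends on the arrival order of the rows.
unordered_original = collect_list(rows, "k", "v")
unordered_reordered = collect_list(reordered, "k", "v")

# With ORDER BY inside the aggregate, the result is the same for both inputs.
ordered_original = array_agg_ordered(rows, "k", "v", "v")
ordered_reordered = array_agg_ordered(reordered, "k", "v", "v")
```

In this sketch the two `collect_list` results contain the same elements per group but in different orders, while the two `array_agg_ordered` results are identical. A Spark-compatible implementation would presumably need to match the first behaviour, including the lack of `ORDER BY` in the syntax.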
Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
No response