Description
Is your feature request related to a problem or challenge?
In #12269 @jayzhan211 made significant improvements to how group values are stored in multi-column aggregations.
Specifically for queries like
SELECT ... FROM ... GROUP BY col1, ... colN
The improvement relies on implementing specialized versions of GroupColumn for the types of col1, ... colN.
We have implemented the primitive types and Strings/StringViews now, but we have not yet implemented all types.
This means queries like
SELECT ... FROM ... GROUP BY int_col, decimal_col
will fall back to the slower (but general) GroupValuesRows:
/// representation.
pub struct GroupValuesRows {
Describe the solution you'd like
Implement GroupColumn for Decimal128 types.
You can see how to do this here:
datafusion/datafusion/physical-plan/src/aggregates/group_values/mod.rs
Lines 117 to 121 in e4bd579
macro_rules! downcast_helper {
    ($t:ty, $d:ident) => {
        return Ok(Box::new(GroupValuesPrimitive::<$t>::new($d.clone())))
    };
}
@jonathanc-n also made a really nice PR here
Please also make sure there are tests for each of those types in queries that group on multiple columns.
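To make the idea concrete, here is a minimal, self-contained sketch of what per-column group storage for Decimal128 values could look like. It is not DataFusion's actual GroupColumn trait, which works on Arrow arrays: SimpleGroupColumn, Decimal128GroupColumn, and group_indices are hypothetical names made up for illustration, and the input column is modeled as a plain slice of Option<i128> (Decimal128 values are backed by i128). The method names equal_to and append_val are only loosely modeled on the real trait.

```rust
/// Hypothetical, simplified stand-in for per-column group storage.
/// NOT DataFusion's real `GroupColumn` trait (that one operates on Arrow arrays).
trait SimpleGroupColumn {
    /// Does the stored group value at `lhs_row` equal `input[rhs_row]`?
    fn equal_to(&self, lhs_row: usize, input: &[Option<i128>], rhs_row: usize) -> bool;
    /// Append `input[row]` as a new group value.
    fn append_val(&mut self, input: &[Option<i128>], row: usize);
    /// Number of group values stored so far.
    fn len(&self) -> usize;
}

/// Group storage for a Decimal128 column: raw i128 values plus a validity flag.
#[derive(Default)]
struct Decimal128GroupColumn {
    values: Vec<i128>,
    valid: Vec<bool>,
}

impl SimpleGroupColumn for Decimal128GroupColumn {
    fn equal_to(&self, lhs_row: usize, input: &[Option<i128>], rhs_row: usize) -> bool {
        match (self.valid[lhs_row], input[rhs_row]) {
            // In GROUP BY, all nulls fall into the same group.
            (false, None) => true,
            (true, Some(v)) => self.values[lhs_row] == v,
            _ => false,
        }
    }

    fn append_val(&mut self, input: &[Option<i128>], row: usize) {
        match input[row] {
            Some(v) => {
                self.values.push(v);
                self.valid.push(true);
            }
            None => {
                // Placeholder value; the validity flag marks it as null.
                self.values.push(0);
                self.valid.push(false);
            }
        }
    }

    fn len(&self) -> usize {
        self.values.len()
    }
}

/// Assign a group index to every input row.
/// Uses a linear scan for brevity; a real implementation would use a hash table.
fn group_indices(input: &[Option<i128>]) -> (Decimal128GroupColumn, Vec<usize>) {
    let mut col = Decimal128GroupColumn::default();
    let mut out = Vec::with_capacity(input.len());
    for row in 0..input.len() {
        let found = (0..col.len()).find(|&g| col.equal_to(g, input, row));
        let group = match found {
            Some(g) => g,
            None => {
                col.append_val(input, row);
                col.len() - 1
            }
        };
        out.push(group);
    }
    (col, out)
}
```

In a multi-column GROUP BY, each column would keep its own storage like this, and two rows would belong to the same group only if equal_to holds for every column. A test for the real implementation would likely exercise the same cases as this sketch: repeated values, distinct values, and nulls.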
Describe alternatives you've considered
No response
Additional context
Here is an example for how this was done for Strings: #12809