-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Support Avg distinct #15356
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support Avg distinct #15356
Conversation
dbedb8e
to
2c65ea0
Compare
query RT | ||
select avg(distinct c), arrow_typeof(avg(distinct c)) from t_decimal; | ||
---- | ||
180 Decimal128(14, 8) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if the type Decimal128(14, 8)
and Decimal256(54, 6)
below is correct
Update: this is correct but the way we get the result is incorrect
2c65ea0
to
709a29e
Compare
1ee8c23
to
3547eae
Compare
datafusion/functions-aggregate-common/src/aggregate/avg_distinct/decimal.rs
Show resolved
Hide resolved
3547eae
to
7960403
Compare
@@ -60,6 +64,17 @@ make_udaf_expr_and_func!( | |||
avg_udaf | |||
); | |||
|
|||
pub fn avg_distinct(expr: Expr) -> Expr { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AggregateExprBuilder::new().distinct()
is enough, we don't need this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. I added this becuase there's a count_distinct
pub fn count_distinct(expr: Expr) -> Expr { |
Should we deprecate this count_distinct
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This API exists for a long time ago so we keep it. We don't need to deprecate this
/// Specialized implementation of `AVG DISTINCT` for Float64 values, handling the | ||
/// special case for NaN values and floating-point equality. | ||
#[derive(Debug, Default)] | ||
pub struct Float64DistinctAvgAccumulator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you checkout DistinctSumAccumulator
, it supports f64, decimal together in single DistinctSumAccumulator
, I guess we can unify these types for average too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it totally works. I should have seen this earlier.
pub struct DecimalDistinctAvgAccumulator<T: DecimalType + Debug> { | ||
values: HashSet<Hashable<T::Native>, RandomState>, | ||
sum_scale: i8, | ||
target_precision: u8, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Sum, we don't specify precision and scale, so we might not need it for avg
Closing this as I splitted the PR into 2 |
Which issue does this PR close?
avg(distinct)
support #2408Rationale for this change
Related: #15099
The pattern is mostly copied from native.rs
What changes are included in this PR?
Float64
andDecimal128
Decimal256
, and thus implemented avg distinctAre these changes tested?
Current test (used to be a failed query) passed and new ones are added
Are there any user-facing changes?
No