Skip to content
Merged
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 13 additions & 8 deletions datafusion/expr/src/udaf.rs
Original file line number Diff line number Diff line change
Expand Up @@ -742,18 +742,23 @@ pub trait AggregateUDFImpl: Debug + DynEq + DynHash + Send + Sync {
/// If this function is ordered-set aggregate function, return true
/// otherwise, return false
///
/// Ordered-set aggregate functions require an explicit `ORDER BY` clause
/// because the calculation performed by these functions is dependent on the
/// specific sequence of the input rows, unlike other aggregate functions
/// like `SUM`, `AVG`, or `COUNT`.
/// Ordered-set aggregate functions allow an `ORDER BY` clause because the
/// calculation performed by these functions is dependent on the specific
/// sequence of the input rows, unlike other aggregate functions like `SUM`
/// `AVG`, or `COUNT`. If explicit order is specified then a default order
/// of ascending is assumed.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically we don't enforce the default order; our only ordered set aggregate functions internally use ascending as the default, so I'm not sure if we should instead say its implementation dependent or try to enforce it somehow?

///
/// An example of an ordered-set aggregate function is `percentile_cont`
/// which computes a specific percentile value from a sorted list of values, and
/// is only meaningful when the input data is ordered.
/// Note that setting this to `true` does not guarantee input sort order to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a good clarification

If DataFusion ever supports more ordered set aggregation functions, we may want to revisit this

In addition to saying what this setting doesn't do, maybe we could also say what setting it to true does do? Specifically, it seems like it only affects the output display somehow 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to saying what this setting doesn't do, maybe we could also say what setting it to true does do?

This is a good point 🤔

Let me look into the code a bit more to clarify my understanding and I'll update the doc accordingly.

/// the aggregate function; it instead gives the function full control over
/// the sorting process and they are expected to handle order of input values
/// themselves.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope I'm correct in this; some reading I used for reference: https://paquier.xyz/postgresql-2/postgres-9-4-feature-highlight-within-group/

///
/// An example of an ordered-set aggregate function is `approx_percentile_cont`
/// which computes an approximate percentile value from a sorted list of values.
///
/// In SQL syntax, ordered-set aggregate functions are used with the
/// `WITHIN GROUP (ORDER BY ...)` clause to specify the ordering of the input
/// data.
/// data, or can be simply omitted to assume ascending.
fn is_ordered_set_aggregate(&self) -> bool {
false
}
Expand Down