Refactor ordered-set aggregate Dataframe APIs to align with SQL #18279

@Jefffrey

Description

Is your feature request related to a problem or challenge?

See the following functions:

/// Computes the exact percentile continuous of a set of numbers
pub fn percentile_cont(order_by: Sort, percentile: Expr) -> Expr {
    let expr = order_by.expr.clone();

/// Computes the approximate percentile continuous of a set of numbers
pub fn approx_percentile_cont(
    order_by: Sort,
    percentile: Expr,
    centroids: Option<Expr>,
) -> Expr {
    let expr = order_by.expr.clone();

/// Computes the approximate percentile continuous with weight of a set of numbers
pub fn approx_percentile_cont_with_weight(
    order_by: Sort,
    weight: Expr,
    percentile: Expr,
    centroids: Option<Expr>,
) -> Expr {
    let expr = order_by.expr.clone();

The issue is that each of these accepts a Sort, from which it extracts the expression to aggregate. A Sort also lets the caller specify a null ordering, which is meaningless here because these functions always ignore nulls in their calculations.
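To make the problem concrete, here is a minimal sketch using simplified stand-in types. `Expr`, `Sort`, the `nulls_first` field name, and `percentile_cont_current` are all hypothetical and only mimic the shape of the API above; they are not the DataFusion definitions. The point is that two `Sort` values that differ only in null ordering are distinct inputs that yield the same aggregate.

```rust
// Simplified stand-ins for illustration only; NOT the DataFusion types.
#[derive(Debug, Clone, PartialEq)]
struct Expr(String);

#[derive(Debug, Clone, PartialEq)]
struct Sort {
    expr: Expr,
    asc: bool,
    // Hypothetical field name for the null ordering the caller can set.
    nulls_first: bool,
}

/// Mirrors the shape of the current API: takes a whole `Sort`, but only the
/// expression and the direction influence the result in this sketch.
fn percentile_cont_current(order_by: Sort, percentile: f64) -> String {
    let expr = order_by.expr.clone();
    let dir = if order_by.asc { "ASC" } else { "DESC" };
    // `order_by.nulls_first` is never consulted: nulls are ignored anyway.
    format!(
        "percentile_cont({percentile}) WITHIN GROUP (ORDER BY {} {dir})",
        expr.0
    )
}

/// Builds two `Sort`s that differ only in null ordering and reports whether
/// they describe the same aggregate.
fn null_order_is_irrelevant() -> bool {
    let nulls_first = Sort {
        expr: Expr("price".to_string()),
        asc: true,
        nulls_first: true,
    };
    let nulls_last = Sort {
        nulls_first: false,
        ..nulls_first.clone()
    };
    percentile_cont_current(nulls_first, 0.5) == percentile_cont_current(nulls_last, 0.5)
}
```

In this toy model the caller is forced to choose a value for `nulls_first` even though no choice changes the outcome, which is the API wart described above.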

Describe the solution you'd like

Consider refactoring to something like:

pub fn percentile_cont(expr: Expr, percentile: Expr, asc: bool) -> Expr
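As a rough illustration of why an expression, a percentile, and an `asc` flag are enough, the sketch below evaluates a continuous percentile over plain values. `percentile_cont_values` is a hypothetical helper written for this example, operating on `Option<f64>` rather than on DataFusion expressions, and it uses simple linear interpolation; it is not the library's implementation.

```rust
/// Hypothetical helper: linearly interpolated percentile over the non-null
/// values, ordered ascending or descending according to `asc`.
fn percentile_cont_values(values: &[Option<f64>], percentile: f64, asc: bool) -> Option<f64> {
    // Nulls are dropped up front, so there is no null ordering to configure.
    let mut sorted: Vec<f64> = values.iter().flatten().copied().collect();
    if sorted.is_empty() || !(0.0..=1.0).contains(&percentile) {
        return None;
    }
    sorted.sort_by(|a, b| a.total_cmp(b));
    if !asc {
        sorted.reverse();
    }
    // Fractional position of the requested percentile within the ordered values.
    let rank = percentile * (sorted.len() - 1) as f64;
    let lower = rank.floor() as usize;
    let upper = rank.ceil() as usize;
    let fraction = rank - lower as f64;
    Some(sorted[lower] + (sorted[upper] - sorted[lower]) * fraction)
}
```

With the input `[Some(1.0), None, Some(3.0), Some(2.0), None, Some(4.0)]` and a percentile of `0.25`, this returns `Some(1.75)` when `asc` is `true` and `Some(3.25)` when it is `false`. The direction changes the answer, while the position of the nulls cannot.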

Describe alternatives you've considered

Don't do this

Additional context

See: #17805 (comment)

Labels

enhancement: New feature or request
