Skip to content

fix: use input precision for avg decimal return type#1027

Draft
kadinrabo wants to merge 1 commit intosubstrait-io:mainfrom
kadinrabo:fix/avg-decimal-return-type
Draft

fix: use input precision for avg decimal return type#1027
kadinrabo wants to merge 1 commit intosubstrait-io:mainfrom
kadinrabo:fix/avg-decimal-return-type

Conversation

@kadinrabo
Copy link
Copy Markdown
Contributor

@kadinrabo kadinrabo commented Mar 31, 2026

Summary

Change the return type of avg for DECIMAL<P,S> from DECIMAL<38,S> to DECIMAL<P,S> in functions_arithmetic_decimal.yaml.

The current spec widens the output precision of AVG(DECIMAL<P,S>) to the maximum (38), matching what SUM does. However, unlike SUM, AVG is mathematically bounded by the range of the input values — the result of an average can never exceed the min/max of the input set. Precision widening is unnecessary for AVG and causes type incompatibilities in downstream query optimizers.

Problem

Consumers that use Substrait as an interchange format between a SQL frontend and an optimizer (e.g., Apache Calcite) encounter type validation errors because the optimizer's internal type inference for AVG returns DECIMAL<P,S> (same as input, per the SQL standard), while the Substrait plan declares DECIMAL<38,S>.

Specifically, Calcite's AggregateReduceFunctionsRule rewrites AVG(x) -> SUM(x) / COUNT(x). During this rewrite, it re-derives the intermediate types from the input column type. When the input is DECIMAL<10,2> but the AVG's declared output is DECIMAL<38,2>, the rule detects a type mismatch and throws:

Error while applying rule AggregateReduceFunctionsRule

This only affects DECIMAL — other numeric types (INT, FLOAT) don't have parameterized precision, so no mismatch occurs.

Why AVG is different from SUM

SUM legitimately needs the widened precision. Summing many DECIMAL(10,2) values can overflow 10 digits of precision — widening to 38 prevents this. The current spec is correct for SUM:

# SUM — widening is necessary (overflow is possible)
return: "DECIMAL?<38,S>"

AVG does not have this problem. The average of any set of DECIMAL(P,S) values is bounded by [min(input), max(input)], which by definition fits in DECIMAL(P,S). The only precision concern is in the fractional part from the division, which is a scale issue — and the scale S is already preserved.

Note that the intermediate type correctly uses DECIMAL<38,S> for the SUM component of the decomposed AVG (SUM + COUNT), which is appropriate. Only the final return type should match the input.

What other systems do

System AVG(DECIMAL(P,S)) returns
SQL:2016 standard DECIMAL(P,S) — same as input
Apache Calcite DECIMAL(P,S) — same as input
PostgreSQL NUMERIC (unbounded, effectively same precision)
Apache Spark DECIMAL(min(P+4, 38), min(S+4, 18)) — modest widening
Trino DECIMAL(38, max(S, 6)) — widens precision
Substrait (current) DECIMAL(38, S) — max widening
Substrait (proposed) DECIMAL(P, S) — match input

The proposed change aligns Substrait with the SQL standard and the majority of query engines.

Change

  - name: "avg"
    description: Average a set of values.
    impls:
      - args:
          - name: x
            value: "DECIMAL<P,S>"
        options:
          overflow:
            values: [ SILENT, SATURATE, ERROR ]
        nullability: DECLARED_OUTPUT
        decomposable: MANY
        intermediate: "STRUCT<DECIMAL<38,S>,i64>"
-       return: "DECIMAL<38,S>"
+       return: "DECIMAL<P,S>"

Note: the intermediate type remains STRUCT<DECIMAL<38,S>,i64> — the SUM component of the decomposed AVG still needs widened precision to prevent overflow during accumulation.

Impact

  • substrait-go: The resolveVariant function in expr/functions.go will automatically resolve AVG(DECIMAL<P,S>) to DECIMAL<P,S> instead of DECIMAL<38,S> once the YAML is updated.
  • substrait-java: Similarly uses the YAML for type resolution; will pick up the change automatically.
  • Consumers that depended on the widened type: Would need to handle the narrower return type. In practice, this is unlikely to cause issues since the result values were already within DECIMAL<P,S> range.

This change is Reviewable

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant