fix: use input precision for avg decimal return type by kadinrabo · Pull Request #1027 · substrait-io/substrait

kadinrabo · 2026-03-31T13:52:07Z

Summary

Change the return type of avg for DECIMAL<P,S> from DECIMAL<38,S> to DECIMAL<P,S> in functions_arithmetic_decimal.yaml.

The current spec widens the output precision of AVG(DECIMAL<P,S>) to the maximum (38), matching what SUM does. However, unlike SUM, AVG is mathematically bounded by the range of the input values — the result of an average can never exceed the min/max of the input set. Precision widening is unnecessary for AVG and causes type incompatibilities in downstream query optimizers.

Problem

Consumers that use Substrait as an interchange format between a SQL frontend and an optimizer (e.g., Apache Calcite) encounter type validation errors because the optimizer's internal type inference for AVG returns DECIMAL<P,S> (same as input, per the SQL standard), while the Substrait plan declares DECIMAL<38,S>.

Specifically, Calcite's AggregateReduceFunctionsRule rewrites AVG(x) -> SUM(x) / COUNT(x). During this rewrite, it re-derives the intermediate types from the input column type. When the input is DECIMAL<10,2> but the AVG's declared output is DECIMAL<38,2>, the rule detects a type mismatch and throws:

Error while applying rule AggregateReduceFunctionsRule

This only affects DECIMAL — other numeric types (INT, FLOAT) don't have parameterized precision, so no mismatch occurs.

Why AVG is different from SUM

SUM legitimately needs the widened precision. Summing many DECIMAL(10,2) values can overflow 10 digits of precision — widening to 38 prevents this. The current spec is correct for SUM:

# SUM — widening is necessary (overflow is possible)
return: "DECIMAL?<38,S>"

AVG does not have this problem. The average of any set of DECIMAL(P,S) values is bounded by [min(input), max(input)], which by definition fits in DECIMAL(P,S). The only precision concern is in the fractional part from the division, which is a scale issue — and the scale S is already preserved.

Note that the intermediate type correctly uses DECIMAL<38,S> for the SUM component of the decomposed AVG (SUM + COUNT), which is appropriate. Only the final return type should match the input.

What other systems do

System	AVG(DECIMAL(P,S)) returns
SQL:2016 standard	DECIMAL(P,S) — same as input
Apache Calcite	DECIMAL(P,S) — same as input
PostgreSQL	NUMERIC (unbounded, effectively same precision)
Apache Spark	DECIMAL(min(P+4, 38), min(S+4, 18)) — modest widening
Trino	DECIMAL(38, max(S, 6)) — widens precision
Substrait (current)	DECIMAL(38, S) — max widening
Substrait (proposed)	DECIMAL(P, S) — match input

The proposed change aligns Substrait with the SQL standard and the majority of query engines.

Change

  - name: "avg"
    description: Average a set of values.
    impls:
      - args:
          - name: x
            value: "DECIMAL<P,S>"
        options:
          overflow:
            values: [ SILENT, SATURATE, ERROR ]
        nullability: DECLARED_OUTPUT
        decomposable: MANY
        intermediate: "STRUCT<DECIMAL<38,S>,i64>"
-       return: "DECIMAL<38,S>"
+       return: "DECIMAL<P,S>"

Note: the intermediate type remains STRUCT<DECIMAL<38,S>,i64> — the SUM component of the decomposed AVG still needs widened precision to prevent overflow during accumulation.

Impact

substrait-go: The resolveVariant function in expr/functions.go will automatically resolve AVG(DECIMAL<P,S>) to DECIMAL<P,S> instead of DECIMAL<38,S> once the YAML is updated.
substrait-java: Similarly uses the YAML for type resolution; will pick up the change automatically.
Consumers that depended on the widened type: Would need to handle the narrower return type. In practice, this is unlikely to cause issues since the result values were already within DECIMAL<P,S> range.

This change is

fix: use input precision for avg decimal return type

a4203ea

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: use input precision for avg decimal return type#1027

fix: use input precision for avg decimal return type#1027
kadinrabo wants to merge 1 commit intosubstrait-io:mainfrom
kadinrabo:fix/avg-decimal-return-type

kadinrabo commented Mar 31, 2026 •

edited by jacques-n

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

kadinrabo commented Mar 31, 2026 • edited by jacques-n Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Why AVG is different from SUM

What other systems do

Change

Impact

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

kadinrabo commented Mar 31, 2026 •

edited by jacques-n

Loading