[Variant] Decimal unshredding support #8540
Conversation
@alamb -- I don't know what to make of this parquet decimal widening thing. Do you know [somebody who knows] what might be going on here?
Thanks @scovich -- the unshredding part looks good to me and I left a comment about the parquet changes
Ok(DataType::Decimal256(precision, scale))
// Dispatch based on precision thresholds using DecimalType trait constants
if precision <= Decimal32Type::MAX_PRECISION {
Is this change needed for this PR?
Decimal32/Decimal64 were just added recently and they are not widely supported by all the kernels or downstream systems (e.g. DataFusion).
I think this change means that some parquet files that are currently read as Decimal128 would come back as Decimal32/Decimal64, which is likely to be a pretty big change for some consumers.
I would suggest backing out this part of the change, if possible, and filing a separate ticket to discuss changing how decimal data is read from existing parquet files.
If we want to proceed, we would probably need to add additional tests that show what happens when decimals are read from parquet.
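For illustration, a minimal sketch of the kind of round-trip test this might involve, assuming the standard `parquet::arrow` writer/reader APIs; the asserted Decimal128 type reflects the current reader behavior described above (not a settled decision), and the test name and column are made up:

```rust
use std::sync::Arc;

use arrow::array::Decimal128Array;
use arrow::datatypes::{DataType, Field, Schema};
use arrow::record_batch::RecordBatch;
use bytes::Bytes;
use parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder;
use parquet::arrow::ArrowWriter;

#[test]
fn decimal_precision_9_roundtrips_as_decimal128() {
    // Precision 9 would fit a 32-bit decimal, but today arrow-parquet hands it
    // back as Decimal128; this test pins down (and documents) that behavior.
    let schema = Arc::new(Schema::new(vec![Field::new(
        "d",
        DataType::Decimal128(9, 2),
        false,
    )]));
    let array = Decimal128Array::from(vec![12345_i128])
        .with_precision_and_scale(9, 2)
        .unwrap();
    let batch = RecordBatch::try_new(schema.clone(), vec![Arc::new(array)]).unwrap();

    // Write a single-column parquet file into an in-memory buffer.
    let mut buf: Vec<u8> = Vec::new();
    let mut writer = ArrowWriter::try_new(&mut buf, schema, None).unwrap();
    writer.write(&batch).unwrap();
    writer.close().unwrap();

    // Re-open the buffer and check which arrow type the reader infers for the column.
    let builder = ParquetRecordBatchReaderBuilder::try_new(Bytes::from(buf)).unwrap();
    assert_eq!(
        builder.schema().field(0).data_type(),
        &DataType::Decimal128(9, 2)
    );
}
```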
> Is this change needed for this PR?
Without it, almost all variant decimal integration tests fail because they expect `Variant::Decimal4` or `Variant::Decimal8` and they get `Variant::Decimal16` instead. I don't know any good way to "fix" the test expectations, and anyway the expectations are arguably correct and should not change.
That said...
I suspect the current version is too aggressive -- it uses precision as the ultimate lower bound regardless of the physical encoding; I'm working on a revised version that uses precision as an upper bound, with the physical encoding as lower bound.
Still TBD whether that reduces the test carnage at all, let alone whether it's actually the right approach.
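For concreteness, a rough sketch of what that selection could look like, assuming the recently added Decimal32/Decimal64 types and the `DecimalType` precision constants; `physical_byte_width` is a hypothetical stand-in for whatever the reader knows about the parquet physical encoding, not an actual parameter in the code:

```rust
use arrow::datatypes::{DataType, Decimal32Type, Decimal64Type, DecimalType};

/// Pick the narrowest arrow decimal type that satisfies both the declared
/// precision (upper bound) and the parquet physical encoding (lower bound).
fn narrowest_decimal(precision: u8, scale: i8, physical_byte_width: usize) -> DataType {
    if precision <= Decimal32Type::MAX_PRECISION && physical_byte_width <= 4 {
        DataType::Decimal32(precision, scale)
    } else if precision <= Decimal64Type::MAX_PRECISION && physical_byte_width <= 8 {
        DataType::Decimal64(precision, scale)
    } else {
        DataType::Decimal128(precision, scale)
    }
}
```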
> I don't know any good way to "fix" the test expectations and anyway the expectations are arguably correct and should not change.
I recommend special-casing the test harness to cast the Decimal types, with a comment pointing to a ticket that tracks adding the support for real.
I don't think we need to solve Parquet --> Decimal32/64 support in this PR.
What if we added the corrective cast to the `VariantArray` constructor, which already fixes up the value and metadata columns? The variant spec arguably requires using the narrowest possible decimal type for a given precision:

| Decimal Precision | Decimal value type | Variant Physical Type |
|---|---|---|
| 1 <= precision <= 9 | int32 | decimal4 |
| 10 <= precision <= 18 | int64 | decimal8 |
| 19 <= precision <= 38 | int128 | decimal16 |
| > 38 | Not supported | |
The casting fix worked! Even better, it's < 20 LoC and isolated completely to variant code.
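A minimal sketch of what such a corrective cast could look like (not the PR's exact code); it assumes arrow's `cast` kernel accepts Decimal32/Decimal64 targets, and `narrow_decimal` is a hypothetical helper name:

```rust
use arrow::array::ArrayRef;
use arrow::compute::cast;
use arrow::datatypes::{DataType, Decimal32Type, Decimal64Type, DecimalType};
use arrow::error::ArrowError;

/// Narrow a shredded decimal `typed_value` column that arrow-parquet widened to
/// Decimal128 back to the narrowest type the variant spec allows for its precision.
fn narrow_decimal(typed_value: ArrayRef) -> Result<ArrayRef, ArrowError> {
    let target = match typed_value.data_type() {
        DataType::Decimal128(p, s) if *p <= Decimal32Type::MAX_PRECISION => {
            DataType::Decimal32(*p, *s)
        }
        DataType::Decimal128(p, s) if *p <= Decimal64Type::MAX_PRECISION => {
            DataType::Decimal64(*p, *s)
        }
        // Already the narrowest representation (or not a decimal at all): leave it alone.
        _ => return Ok(typed_value),
    };
    cast(typed_value.as_ref(), &target)
}
```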
Downside is... the cast.
So we should still try to figure out a way to make the parquet reader pull the correct type:
- either globally (some variation of the fix I first attempted here)
- or perhaps something more targeted in the parquet reader, which notices the variant type extension and massages the footer schema accordingly
Filed as #8549
Love it!
Decimal64(p, s) if is_valid_variant_decimal(p, s, 18) => borrow!(),
Decimal128(p, s) if is_valid_variant_decimal(p, s, 38) => borrow!(),
//
// NOTE: arrow-parquet reads widen 32- and 64-bit decimals to 128-bit, but the variant spec
👍
@alamb -- conflicts resolved, should be ready for merge now
Thank you @scovich
Which issue does this PR close?
- `Decimal128`: #8332

Rationale for this change
Missing feature
What changes are included in this PR?
Add decimal unshredding support, which should have been straightforward except:
- A `DecimalType` trait for the `VariantDecimalXX` classes to implement.
- The variant integration tests got `Variant::Decimal16` values when they expected `Variant::Decimal4` or `Variant::Decimal8` (the actual values are correct). Rather than directly tackle the bug in arrow-parquet itself (which has a large blast radius), I updated the `VariantArray` constructor to cast such columns back to the correct type as needed.

Are these changes tested?
Yes. The variant decimal integration tests now pass where they used to fail.
Are there any user-facing changes?
No.