sdf-jkl
Contributor

@sdf-jkl sdf-jkl commented Oct 13, 2025

Which issue does this PR close?

Rationale for this change

Add support for Utf8, LargeUtf8, and Utf8View. This requires a new builder, VariantToStringArrowRowBuilder, because LargeUtf8 and Utf8View are not ArrowPrimitiveTypes.

What changes are included in this PR?

  • Added support for Utf8, LargeUtf8, and Utf8View: a new enum and builder handle Utf8 and LargeUtf8, and Utf8View was added to the primitive builder.
  • Added a new data_capacity parameter to make_string_variant_to_arrow_row_builder to support string types.
  • Updated the call to make_string_variant_to_arrow_row_builder in variant_get to pass the new parameter.
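
To illustrate why a string builder takes a data_capacity in addition to an item count, here is a minimal std-only sketch. It is not the arrow-rs StringBuilder; the SketchStringBuilder type and its methods are hypothetical stand-ins that only mirror the two-argument `with_capacity(item_capacity, data_capacity)` shape used in this PR. The point is that a variable-length string column keeps one buffer sized by row count (offsets) and another sized by total bytes (data), so a single capacity is not enough.

```rust
/// Simplified stand-in for a Utf8 array builder (hypothetical, NOT the arrow-rs type).
/// A variable-length string column needs two buffers with independent sizes.
#[derive(Debug)]
pub struct SketchStringBuilder {
    /// offsets[i]..offsets[i + 1] is the byte range of row i inside `data`.
    /// Utf8 uses 32-bit offsets; LargeUtf8 would use i64 here instead.
    offsets: Vec<i32>,
    /// All string bytes, stored back to back.
    data: String,
    /// One flag per row; `false` marks a null row.
    validity: Vec<bool>,
}

impl SketchStringBuilder {
    /// `item_capacity` sizes the per-row buffers; `data_capacity` sizes the byte buffer.
    pub fn with_capacity(item_capacity: usize, data_capacity: usize) -> Self {
        let mut offsets = Vec::with_capacity(item_capacity + 1);
        offsets.push(0); // the first row always starts at byte 0
        Self {
            offsets,
            data: String::with_capacity(data_capacity),
            validity: Vec::with_capacity(item_capacity),
        }
    }

    /// Appends a non-null string row.
    pub fn append_value(&mut self, value: &str) {
        self.data.push_str(value);
        self.offsets.push(self.data.len() as i32);
        self.validity.push(true);
    }

    /// Appends a null row: it occupies a slot but no bytes, so the offset repeats.
    pub fn append_null(&mut self) {
        self.offsets.push(self.data.len() as i32);
        self.validity.push(false);
    }

    /// Number of rows appended so far.
    pub fn len(&self) -> usize {
        self.validity.len()
    }

    /// Returns the string at row `i`, or `None` if that row is null.
    pub fn value(&self, i: usize) -> Option<&str> {
        if !self.validity[i] {
            return None;
        }
        let start = self.offsets[i] as usize;
        let end = self.offsets[i + 1] as usize;
        Some(&self.data[start..end])
    }
}
```

With this sketch, `SketchStringBuilder::with_capacity(3, 16)` reserves room for three rows and sixteen bytes of string data; a fixed-width primitive builder needs only the row count, which is why the existing primitive constructor signature had to grow an extra argument for string types.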

Are these changes tested?

Added a variant_get test for the Utf8 type and two separate tests for LargeUtf8 and Utf8View, because these types can't be shredded.

Are there any user-facing changes?

No

@github-actions github-actions bot added the parquet-variant parquet-variant* crates label Oct 13, 2025
@sdf-jkl
Contributor Author

sdf-jkl commented Oct 14, 2025

@alamb @scovich Please review when you can, thank you!


perfectly_shredded_to_arrow_primitive_test!(
get_variant_perfectly_shredded_utf8_as_utf8,
DataType::Utf8,
Member

Do we need to add tests for the other types (LargeUtf8/Utf8View) here?

Is the test here meant to cover the variant_get logic, while the tests added in variant_to_arrow.rs cover the builder logic?

TimestampNano(VariantToTimestampArrowRowBuilder<'a, datatypes::TimestampNanosecondType>),
TimestampNanoNtz(VariantToTimestampNtzArrowRowBuilder<'a, datatypes::TimestampNanosecondType>),
Date(VariantToPrimitiveArrowRowBuilder<'a, datatypes::Date32Type>),
StringView(VariantToUtf8ViewArrowBuilder<'a>),
Member

We added StringView to the PrimitiveVariantToArrowRowBuilder and the other two to StringVariantToArrowRowBuilder. Is there a particular reason for this?


define_variant_to_primitive_builder!(
struct VariantToUtf8ArrowRowBuilder<'a>
|item_capacity, data_capacity: usize| -> StringBuilder {StringBuilder::with_capacity(item_capacity, data_capacity)},
Member

Suggested change
|item_capacity, data_capacity: usize| -> StringBuilder {StringBuilder::with_capacity(item_capacity, data_capacity)},
|item_capacity, data_capacity: usize| -> StringBuilder { StringBuilder::with_capacity(item_capacity, data_capacity) },

Development

Successfully merging this pull request may close these issues.

[Variant] Add variant_to_arrow Utf-8, LargeUtf8, Utf8View types support

2 participants