[Variant] easier way to construct a shredded schema #8922

@alamb

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Filing this for a friend (@XiangpengHao !)

arrow-rs contains a kernel to "shred" a variant value: shred_variant

However, to call this method you need to create the entire shredded schema, which is complicated. For example, to shred the time and hostname fields you would need to provide something like:

{
  metadata: BINARY,
  value: BINARY,
  typed_value: {
    time: {
      value: BINARY,
      typed_value: Timestamp,
    },
    hostname: {
      value: BINARY,
      typed_value: String,
    },
  }
}

Describe the solution you'd like
I would like it to be easier to construct shredded variants for common cases

Describe alternatives you've considered

@XiangpengHao suggested to me in person that an API similar to variant_get would be natural: you would explicitly specify which path should be shredded as which type.

For example, it would be nice to create the schema above by specifying only time and hostname. Something like:

let schema = VariantSchemaBuilder::default()
  .with_path("time", &DataType::Timestamp(TimeUnit::Nanosecond, None))
  .with_path("hostname", &DataType::Utf8)
  .build();

// Pass in the shredded schema
let shredded_array = shred_variant(&input, &schema)?;
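
To make the idea concrete, here is a minimal, dependency-free sketch of how such a builder could expand a list of paths into the nested value/typed_value layout shown earlier. Everything in it is an assumption for illustration: `LeafType` is a hypothetical stand-in for Arrow's `DataType`, `ShreddedField` and `TypedValue` are made-up types, and the dot-separated path syntax is a guess. It is neither the arrow-rs API nor the liquid-cache implementation.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for Arrow's `DataType`, so the sketch needs no crates.
#[derive(Debug, Clone, PartialEq)]
enum LeafType {
    Utf8,
    TimestampNanos,
}

/// The `typed_value` of a shredded field: a primitive or a nested object.
#[derive(Debug, Clone, PartialEq)]
enum TypedValue {
    Leaf(LeafType),
    Object(BTreeMap<String, ShreddedField>),
}

/// One shredded field. The binary `value` fallback column is implied for
/// every node, so only the optional `typed_value` is stored here.
#[derive(Debug, Clone, PartialEq, Default)]
struct ShreddedField {
    typed_value: Option<TypedValue>,
}

/// Hypothetical builder: callers name only the paths they want shredded.
#[derive(Default)]
struct VariantSchemaBuilder {
    root: ShreddedField,
}

impl VariantSchemaBuilder {
    /// Request that `path` (dot-separated, e.g. "a.b") be shredded as `ty`.
    fn with_path(mut self, path: &str, ty: LeafType) -> Self {
        let segments: Vec<&str> = path.split('.').collect();
        insert(&mut self.root, &segments, ty);
        self
    }

    /// Return the root of the shredded schema tree.
    fn build(self) -> ShreddedField {
        self.root
    }
}

/// Walk down the tree, creating intermediate object nodes as needed,
/// and set the leaf type once the path is exhausted.
fn insert(node: &mut ShreddedField, segments: &[&str], ty: LeafType) {
    let Some((head, rest)) = segments.split_first() else {
        node.typed_value = Some(TypedValue::Leaf(ty));
        return;
    };
    // A later path overrides an earlier leaf at the same node (a sketch-level choice).
    if !matches!(node.typed_value, Some(TypedValue::Object(_))) {
        node.typed_value = Some(TypedValue::Object(BTreeMap::new()));
    }
    if let Some(TypedValue::Object(children)) = &mut node.typed_value {
        insert(children.entry(head.to_string()).or_default(), rest, ty);
    }
}
```

With this sketch, `VariantSchemaBuilder::default().with_path("time", LeafType::TimestampNanos).with_path("hostname", LeafType::Utf8).build()` yields a root whose `typed_value` is an object with two children, mirroring the hand-written schema. A real implementation would presumably emit Arrow `Field`/`DataType::Struct` values and would need to decide how to report conflicting paths.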

It turns out (naturally) that he implemented just such an API in liquid-cache, which I think is worth considering upstream:

https://github.com/XiangpengHao/liquid-cache/blob/33dbaaaec3de5207885e778e57d14df2fb69071f/src/storage/src/utils/variant_schema.rs#L14

Here is an example of it working: https://github.com/XiangpengHao/liquid-cache/blob/33dbaaaec3de5207885e778e57d14df2fb69071f/src/storage/src/utils/variant_schema.rs#L162-L178

Additional context
