Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Filing this for a friend (@XiangpengHao !)
arrow-rs contains a kernel to "shred" a variant value: shred_variant
However, to call this kernel you need to construct the entire shredded schema yourself, which is complicated. For example, to shred a time field and a hostname field you would need to provide something like:
{
  metadata: BINARY,
  value: BINARY,
  typed_value: {
    time: {
      value: BINARY,
      typed_value: Timestamp,
    },
    hostname: {
      value: BINARY,
      typed_value: String,
    },
  }
}
Describe the solution you'd like
I would like it to be easier to construct shredded variants for common cases
Describe alternatives you've considered
@XiangpengHao suggested to me in person that an API similar to variant_get would be natural: you would explicitly specify which path should be shredded as which type.
For example, it would be nice to create the schema above by specifying only time and hostname. Something like:
let schema = VariantSchemaBuilder::default()
    .with_path("time", &DataType::Timestamp(TimeUnit::Nanosecond, None))
    .with_path("hostname", &DataType::Utf8)
    .build();
// Pass in the shredded schema
let shredded_array = shred_variant(&input, &schema)?;
It turns out (naturally) that he implemented just such an API in liquid-cache, which I think is worth considering upstream:
Here is an example of it working: https://github.com/XiangpengHao/liquid-cache/blob/33dbaaaec3de5207885e778e57d14df2fb69071f/src/storage/src/utils/variant_schema.rs#L162-L178
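To make the idea concrete, here is a minimal, dependency-free sketch of how such a builder could expand a list of paths into the nested layout shown above. Everything in it is an assumption for illustration: ShredType is a toy stand-in for arrow's DataType, TypedValue and render_schema are hypothetical helpers, and treating "a.b" as a nested path is my guess at reasonable semantics. It is not the liquid-cache implementation and not an existing arrow-rs API.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for arrow's `DataType` (assumption for this sketch only).
#[derive(Debug, Clone, PartialEq)]
enum ShredType {
    Timestamp,
    Utf8,
}

/// One node of the shredded layout: a typed leaf or an object with named fields.
#[derive(Debug, Clone, PartialEq)]
enum TypedValue {
    Leaf(ShredType),
    Object(BTreeMap<String, TypedValue>),
}

/// Hypothetical builder: collects (path, type) pairs into a tree.
#[derive(Debug, Default)]
struct VariantSchemaBuilder {
    root: BTreeMap<String, TypedValue>,
}

impl VariantSchemaBuilder {
    /// Registers `path` (dot-separated, an assumption) to be shredded as `ty`.
    fn with_path(mut self, path: &str, ty: ShredType) -> Self {
        let parts: Vec<&str> = path.split('.').collect();
        insert(&mut self.root, &parts, ty);
        self
    }

    fn build(self) -> TypedValue {
        TypedValue::Object(self.root)
    }
}

/// Walks `parts`, creating intermediate objects as needed; later calls win.
fn insert(fields: &mut BTreeMap<String, TypedValue>, parts: &[&str], ty: ShredType) {
    match parts {
        [] => {}
        [leaf] => {
            fields.insert(leaf.to_string(), TypedValue::Leaf(ty));
        }
        [head, rest @ ..] => {
            let entry = fields
                .entry(head.to_string())
                .or_insert_with(|| TypedValue::Object(BTreeMap::new()));
            // A leaf that gains children is replaced by an object.
            if matches!(*entry, TypedValue::Leaf(_)) {
                *entry = TypedValue::Object(BTreeMap::new());
            }
            if let TypedValue::Object(children) = entry {
                insert(children, rest, ty);
            }
        }
    }
}

/// Renders a node; every object field gets the `value` / `typed_value` pair.
fn render(node: &TypedValue, indent: usize, out: &mut String) {
    let pad = "  ".repeat(indent);
    match node {
        TypedValue::Leaf(ty) => out.push_str(&format!("{:?}", ty)),
        TypedValue::Object(fields) => {
            out.push_str("{\n");
            for (name, child) in fields {
                out.push_str(&format!(
                    "{pad}  {name}: {{\n{pad}    value: BINARY,\n{pad}    typed_value: "
                ));
                render(child, indent + 2, out);
                out.push_str(&format!(",\n{pad}  }},\n"));
            }
            out.push_str(&format!("{pad}}}"));
        }
    }
}

/// Renders the full top-level layout, including `metadata` and `value`.
fn render_schema(schema: &TypedValue) -> String {
    let mut out = String::from("{\n  metadata: BINARY,\n  value: BINARY,\n  typed_value: ");
    render(schema, 1, &mut out);
    out.push_str(",\n}");
    out
}
```

Calling VariantSchemaBuilder::default().with_path("time", ShredType::Timestamp).with_path("hostname", ShredType::Utf8).build() and passing the result to render_schema produces a layout of the same shape as the hand-written schema at the top of this issue (fields appear in sorted order because the sketch uses a BTreeMap). A real implementation would emit arrow Fields and DataTypes rather than text.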
Additional context