Implement StringViewArray and BinaryViewArray reading/writing in parquet #5530

Closed
@alamb

Description

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

This is part of the larger project to implement StringViewArray -- see #5374

In #5481 we added support for StringViewArray and BinaryViewArray.

The parquet crate has a reader and a writer for converting between Parquet data and Arrow arrays.

Describe the solution you'd like

I would like to be able to read and write StringViewArray and BinaryViewArray directly with the parquet reader and writer, without copying the raw byte values.

  1. Add functionality
  2. Add tests
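
To illustrate why view arrays make a copy-free read plausible, here is a minimal sketch of the view layout: values of at most 12 bytes are stored inline in the 16-byte view, while longer values keep a 4-byte prefix plus a buffer index and offset into a shared data buffer. The `View` enum and the `make_view` and `value` helpers are hypothetical stand-ins written for this example; they are not the arrow or parquet crate APIs, and a real reader would point views at decoded page buffers rather than a `Vec<u8>`.

```rust
/// Simplified stand-in for one 16-byte Arrow view (hypothetical, not the arrow crate's type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum View {
    /// Values of at most 12 bytes live inside the view itself.
    Inline { len: u8, data: [u8; 12] },
    /// Longer values keep a 4-byte prefix and point into a shared buffer.
    Ref { len: u32, prefix: [u8; 4], buffer: u32, offset: u32 },
}

/// Build a view over `buffers[buffer][offset..offset + len]`.
/// Long values are only referenced; their bytes are not copied.
fn make_view(buffers: &[Vec<u8>], buffer: u32, offset: u32, len: u32) -> View {
    let start = offset as usize;
    let bytes = &buffers[buffer as usize][start..start + len as usize];
    if len <= 12 {
        let mut data = [0u8; 12];
        data[..bytes.len()].copy_from_slice(bytes);
        View::Inline { len: len as u8, data }
    } else {
        let mut prefix = [0u8; 4];
        prefix.copy_from_slice(&bytes[..4]);
        View::Ref { len, prefix, buffer, offset }
    }
}

/// Resolve a view back to its bytes; long values borrow from the shared buffer.
fn value<'a>(view: &'a View, buffers: &'a [Vec<u8>]) -> &'a [u8] {
    match view {
        View::Inline { len, data } => &data[..*len as usize],
        View::Ref { len, buffer, offset, .. } => {
            let start = *offset as usize;
            &buffers[*buffer as usize][start..start + *len as usize]
        }
    }
}
```

Under this layout, a reader that already holds a decoded buffer only needs to emit one small view per value, which is what would allow the raw bytes to stay where they are.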

Describe alternatives you've considered

For example, I think we need to add support to the writer here:

    ArrowDataType::Dictionary(_, value_type) => match value_type.as_ref() {
        ArrowDataType::Utf8
        | ArrowDataType::LargeUtf8
        | ArrowDataType::Binary
        | ArrowDataType::LargeBinary => {
            out.push(bytes(leaves.next().unwrap()))
        }
        _ => {
            out.push(col(leaves.next().unwrap()))
        }
    }
    _ => return Err(ParquetError::NYI(
        format!(
            "Attempting to write an Arrow type {data_type:?} to parquet that is not yet implemented"
        )
    ))
}
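
As a rough sketch of what the change might look like, the view types could be routed to the byte-array writer alongside the existing string and binary types. Everything below is a self-contained stand-in: the `ArrowDataType` enum is a reduced copy for illustration, and `WriterKind` and `writer_kind` are hypothetical names that do not exist in the parquet crate. Where exactly the `Utf8View` and `BinaryView` arms belong in the real writer is an assumption.

```rust
/// Reduced stand-in for arrow's `DataType` (illustration only).
#[derive(Debug, Clone, PartialEq)]
enum ArrowDataType {
    Int32,
    Utf8,
    LargeUtf8,
    Binary,
    LargeBinary,
    Utf8View,
    BinaryView,
    Dictionary(Box<ArrowDataType>, Box<ArrowDataType>),
    Unsupported,
}

/// Hypothetical tag for which column writer a leaf would use.
#[derive(Debug, PartialEq)]
enum WriterKind {
    Bytes,
    Column,
}

/// Pick a writer for `data_type`, treating view types like other byte arrays.
fn writer_kind(data_type: &ArrowDataType) -> Result<WriterKind, String> {
    match data_type {
        ArrowDataType::Int32 => Ok(WriterKind::Column),
        ArrowDataType::Utf8
        | ArrowDataType::LargeUtf8
        | ArrowDataType::Binary
        | ArrowDataType::LargeBinary
        // Assumed new arms for the view types:
        | ArrowDataType::Utf8View
        | ArrowDataType::BinaryView => Ok(WriterKind::Bytes),
        ArrowDataType::Dictionary(_, value_type) => match value_type.as_ref() {
            ArrowDataType::Utf8
            | ArrowDataType::LargeUtf8
            | ArrowDataType::Binary
            | ArrowDataType::LargeBinary => Ok(WriterKind::Bytes),
            _ => Ok(WriterKind::Column),
        },
        // Mirrors the not-yet-implemented error in the excerpt above.
        _ => Err(format!(
            "Attempting to write an Arrow type {data_type:?} to parquet that is not yet implemented"
        )),
    }
}
```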

Additional context

The reader and writer already handle DictionaryArrays, which I think could serve as a model for the view arrays.

@ariesdevil reports in a comment on #5374 that they are working on this feature.

Labels: parquet (Changes to the parquet crate)
