[Bug] Null Character Suffix in Iceberg Manifest Files Due to toByteBuffer Invocation #5007
Closed
Labels: bug
Search before asking
Paimon version
1.1-SNAPSHOT
Compute Engine
Minimal reproduce step
Create a Paimon table with Iceberg compatibility enabled and partitioned by a string field. Querying the Iceberg table with a predicate on the partition field does not match any data, although the same predicate works when querying the Paimon table directly.
For example, with a table partitioned by event_date, where values such as 2024-12-30 are stored in the partition, the following query matches no data:
SELECT * FROM iceberg_table WHERE event_date = '2024-12-30';
What doesn't meet your expectations?
The expectation is that the Iceberg table accurately reflects the partitions defined in the underlying Paimon table, with no alteration of the values during serialization. The null character suffix in the manifest files prevents client applications (Spark, Flink, Athena) from querying the table by partition.
Anything else?
Concretely, when we inspect the Avro manifest files, we see that the column stats and partition summary values have a \u0000 suffix, e.g.
Snapshot metadata file:
Manifest file column stats:
As a result, when a client (e.g. Spark/Athena) performs a scan of the Iceberg table, it'll skip all the data files after failing to find any manifests that match the predicate given in the query.
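To illustrate the failure mode, here is a minimal Java sketch of one way a \u0000 suffix can arise and why it defeats pruning. It is not taken from the Paimon or Iceberg code base: the class NullSuffixDemo and all of its methods are hypothetical. It assumes the suffix comes from reading the whole backing array of a ByteBuffer whose capacity is larger than its content, and it models manifest pruning as a plain lexicographic bounds check.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class NullSuffixDemo {

    // Hypothetical stand-in for an encoder that over-allocates its output:
    // the buffer's capacity is one byte larger than the encoded content.
    static ByteBuffer encodeWithSpareCapacity(String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buffer = ByteBuffer.allocate(utf8.length + 1);
        buffer.put(utf8);
        buffer.flip(); // position = 0, limit = utf8.length
        return buffer;
    }

    // Problematic: array() exposes the whole backing array,
    // including the unused zero byte past the limit.
    static byte[] wholeBackingArray(ByteBuffer buffer) {
        return buffer.array();
    }

    // Safe: copy only the bytes between position and limit.
    static byte[] remainingBytes(ByteBuffer buffer) {
        byte[] out = new byte[buffer.remaining()];
        buffer.duplicate().get(out); // duplicate() leaves the caller's position untouched
        return out;
    }

    // Simplified model of bounds-based pruning: a manifest is scanned
    // only if the predicate value lies within [lower, upper].
    static boolean mightContain(String lower, String upper, String value) {
        return lower.compareTo(value) <= 0 && value.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        ByteBuffer encoded = encodeWithSpareCapacity("2024-12-30");

        String padded = new String(wholeBackingArray(encoded), StandardCharsets.UTF_8);
        String exact = new String(remainingBytes(encoded), StandardCharsets.UTF_8);

        System.out.println(padded.length()); // 11: ten characters plus a trailing \u0000
        System.out.println(exact.length());  // 10

        // With padded bounds the predicate value sorts before the lower bound,
        // so the manifest is skipped.
        System.out.println(mightContain(padded, padded, "2024-12-30")); // false
        System.out.println(mightContain(exact, exact, "2024-12-30"));   // true
    }
}
```

Under these assumptions, the padded bound "2024-12-30\u0000" is strictly greater than the predicate value "2024-12-30", so an equality predicate falls outside the recorded range and every manifest is pruned. Copying only the bytes between the buffer's position and limit avoids the suffix.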
Are you willing to submit a PR?