Closed
Labels
bug (Something isn't working)
Description
Search before asking
- I searched in the issues and found nothing similar.
Paimon version
Compute Engine
Verified the issue with Flink 1.18.
Minimal reproduce step
Create a table with Iceberg compatibility enabled and check the type of the partition
field in an Avro manifest file.
Querying the table with Amazon Redshift Spectrum then fails with an error.
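For reference, this is roughly how such a table can be created from Flink's Table API. A minimal sketch, assuming the Paimon Flink connector is on the classpath; the catalog name, warehouse path, table name, and the 'metadata.iceberg.storage' value are illustrative (the option and its value are an assumption based on the Paimon Iceberg-compatibility docs), not taken from the original report.

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class IcebergCompatRepro {
    public static void main(String[] args) throws Exception {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.newInstance().inBatchMode().build());

        // Register a Paimon catalog; the warehouse path is a placeholder.
        tEnv.executeSql(
                "CREATE CATALOG paimon WITH ("
                        + " 'type' = 'paimon',"
                        + " 'warehouse' = 's3://my-bucket/warehouse'"
                        + ")");
        tEnv.useCatalog("paimon");

        // Create a partitioned table with Iceberg compatibility enabled, so Paimon
        // also writes Iceberg metadata (including Avro manifest files).
        // 'metadata.iceberg.storage' = 'hadoop-catalog' is assumed here.
        tEnv.executeSql(
                "CREATE TABLE t ("
                        + " id BIGINT,"
                        + " __event_date STRING"
                        + ") PARTITIONED BY (__event_date) WITH ("
                        + " 'metadata.iceberg.storage' = 'hadoop-catalog'"
                        + ")");

        // Write one row so that a manifest file is produced; the avro-tools
        // command below can then be pointed at that manifest file.
        tEnv.executeSql("INSERT INTO t VALUES (1, '2024-01-01')").await();
    }
}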
What doesn't meet your expectations?
$ java -jar avro-tools-1.12.0.jar getschema $avrofile | jq '.fields[] | select(.name == "data_file") | .type.fields[] | select(.name == "partition")'
{
"name": "partition",
"type": [
"null",
{
"type": "record",
"name": "r102",
"fields": [
{
"name": "__event_date",
"type": [
"null",
"string"
],
"default": null
}
]
}
],
"default": null
}
We expect it to be
{
"name": "partition",
"type": {
"type": "record",
"name": "r102",
"fields": [
{
"name": "__event_date",
"type": [
"null",
"string"
],
"default": null
}
]
}
}
Anything else?
There's a slight difference in how the Avro schema for the manifest files is written by native Iceberg and by Paimon's Iceberg-compatible tables. Native Iceberg tables (e.g. created by Flink SQL) correctly follow the Iceberg manifest file specification, which says the partition field should be a required struct. Paimon, on the other hand, writes the partition field in the Iceberg manifest as a nullable struct.
This leads to Redshift Spectrum queries failing with ERROR: Wrong type in Avro file.
... context: Field: partition.
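To make the difference concrete, here is a small sketch using Avro's Java SchemaBuilder that builds both shapes of the partition field. The record name r102 and the __event_date field mirror the schema dumps above; this is illustrative only, not Paimon's or Iceberg's actual schema-construction code.

import java.util.Arrays;

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;

public class PartitionSchemaSketch {
    public static void main(String[] args) {
        // The inner partition record (r102) with an optional __event_date field,
        // i.e. a ["null", "string"] union with default null.
        Schema partitionRecord = SchemaBuilder.record("r102")
                .fields()
                .optionalString("__event_date")
                .endRecord();

        // What Paimon currently writes: the partition field wrapped in a
        // ["null", r102] union, i.e. a nullable struct.
        Schema nullablePartition =
                Schema.createUnion(Arrays.asList(Schema.create(Schema.Type.NULL), partitionRecord));

        // What the Iceberg manifest spec expects: the partition field as a
        // required struct, i.e. the record schema itself.
        Schema requiredPartition = partitionRecord;

        System.out.println(nullablePartition.toString(true));
        System.out.println(requiredPartition.toString(true));
    }
}

The two printed schemas correspond to the "current" and "expected" JSON shown above; a reader that validates the manifest against the Iceberg spec will reject the first shape, which is consistent with the Redshift Spectrum error.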
Are you willing to submit a PR?
- I'm willing to submit a PR!