[Bug] Incorrect Iceberg partition type #5384

@siadat

Description

Search before asking

  • I searched in the issues and found nothing similar.

Paimon version

a4effef

Compute Engine

Verified working with Flink 1.18

Minimal reproduce step

Create a table with Iceberg compatibility enabled, then inspect the type of the partition field in the schema of one of the generated Iceberg manifest (Avro) files.

Querying the table with Amazon Redshift Spectrum fails with an error.

What doesn't meet your expectations?

The manifest schema declares the partition field as a nullable union of null and a record:

$ java -jar avro-tools-1.12.0.jar getschema $avrofile | jq '.fields[] | select(.name == "data_file") | .type.fields[] | select(.name == "partition")'
{
  "name": "partition",
  "type": [
    "null",
    {
      "type": "record",
      "name": "r102",
      "fields": [
        {
          "name": "__event_date",
          "type": [
            "null",
            "string"
          ],
          "default": null
        }
      ]
    }
  ],
  "default": null
}

We expect it to be a required (non-nullable) record:

{
  "name": "partition",
  "type": {
    "type": "record",
    "name": "r102",
    "fields": [
      {
        "name": "__event_date",
        "type": [
          "null",
          "string"
        ],
        "default": null
      }
    ]
  }
}

Anything else?

The Avro schema for manifest files differs slightly between native Iceberg tables and Paimon's Iceberg-compatible tables. Native Iceberg tables (e.g. those created with Flink SQL) follow the Iceberg manifest file specification, which requires the partition field to be a required struct. Paimon, by contrast, writes the partition field in the Iceberg manifest as a nullable struct.

This leads to Redshift Spectrum queries failing with ERROR: Wrong type in Avro file. ... context: Field: partition.
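
For anyone who wants to check a manifest without jq, here is a minimal Python sketch of the same inspection. It assumes the output of avro-tools getschema has been saved as JSON. The helper names (load_schema, partition_is_required, wrap_partition) and the outer record names used in the sample are illustrative only and are not part of Paimon or Iceberg.

```python
import json


def load_schema(path: str) -> dict:
    """Load an Avro schema saved as JSON (e.g. the output of avro-tools getschema)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def partition_is_required(manifest_schema: dict) -> bool:
    """Return True if data_file.partition is a plain record, False if it is a union.

    Mirrors the jq filter from the report:
    .fields[] | select(.name == "data_file") | .type.fields[] | select(.name == "partition")
    """
    data_file = next(f for f in manifest_schema["fields"] if f["name"] == "data_file")
    partition = next(f for f in data_file["type"]["fields"] if f["name"] == "partition")
    partition_type = partition["type"]
    # A nullable field shows up as a list such as ["null", {...}]; a required
    # struct shows up directly as a record object.
    return isinstance(partition_type, dict) and partition_type.get("type") == "record"


# The partition record exactly as it appears in both schemas in this report.
PARTITION_RECORD = {
    "type": "record",
    "name": "r102",
    "fields": [
        {"name": "__event_date", "type": ["null", "string"], "default": None}
    ],
}


def wrap_partition(partition_type) -> dict:
    """Build a stripped-down manifest schema around a partition type (illustrative only)."""
    return {
        "type": "record",
        "name": "manifest_entry",  # placeholder name for this sketch
        "fields": [
            {
                "name": "data_file",
                "type": {
                    "type": "record",
                    "name": "data_file_record",  # placeholder name for this sketch
                    "fields": [{"name": "partition", "type": partition_type}],
                },
            }
        ],
    }


# Shape currently written by Paimon (nullable union) versus the expected shape.
paimon_schema = wrap_partition(["null", PARTITION_RECORD])
expected_schema = wrap_partition(PARTITION_RECORD)
```

With a real manifest, save the getschema output to a file (for example schema.json, a name chosen here for illustration) and call partition_is_required(load_schema("schema.json")). For the two shapes shown in this report, the function returns False for the nullable union and True for the plain record.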

Are you willing to submit a PR?

  • I'm willing to submit a PR!
