[bug] Schema validation should reject field names that are invalid Avro identifiers #2123

@nvartolomei

Apache Iceberg version

None

Please describe the bug 🐞

Example schema:

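from pyiceberg.partitioning import PartitionField, PartitionSpec
from pyiceberg.schema import Schema
from pyiceberg.transforms import IdentityTransform
from pyiceberg.types import NestedField, StringType

# The field name is a single emoji, which is not a valid Avro identifier.
schema = Schema(
    NestedField(id=1, name="😎", field_type=StringType(), required=False),
)

# Identity-partition on that field, reusing the same name.
partition_spec = PartitionSpec(
    PartitionField(
        source_id=1,
        field_id=1001,
        transform=IdentityTransform(),
        name="😎",
    )
)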

Write some data, then try to read the table with DuckDB, or simply inspect the generated manifest file:

avrocat /home/nv/src/pyiceberg-example/warehouse/default.db/nested_table/metadata/afc5e55c-6dd2-4875-841c-410108fccf8e-m0.avro | jq .
Error opening /home/nv/src/pyiceberg-example/warehouse/default.db/nested_table/metadata/afc5e55c-6dd2-4875-841c-410108fccf8e-m0.avro:
  Cannot parse file header: Invalid Avro identifier
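
For reference, the Avro specification requires names to start with [A-Za-z_] and to contain only [A-Za-z0-9_] after that, which "😎" does not satisfy. Below is a minimal sketch of the kind of check schema validation could apply. The helper name is_valid_avro_name is hypothetical and is not part of the PyIceberg API; rejecting the name outright is only one option, and sanitizing it when writing Avro files would be another.

```python
import re

# Avro name rule from the Avro specification:
# first character [A-Za-z_], remaining characters [A-Za-z0-9_].
_AVRO_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_avro_name(name: str) -> bool:
    """Return True if `name` is a legal Avro identifier.

    Hypothetical helper for illustration only; not a PyIceberg function.
    """
    return _AVRO_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["user_id", "😎", "1abc"]:
        print(repr(candidate), is_valid_avro_name(candidate))
```

A check like this could run on every field name in the schema and every partition field name before any manifest is written.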

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
