@7hong 7hong commented Oct 21, 2025

Description

Fixes #23563

Adds support for specifying the identifier_fields property when creating an Iceberg table. The property can also be modified or removed with an ALTER TABLE statement.

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

## Section
* Add support for the `identifier_fields` table property in the Iceberg connector. ({issue}`issuenumber`)

Summary by Sourcery

Add support for the identifier_fields table property in Iceberg tables, allowing users to specify, modify, and remove identifier fields via CREATE TABLE WITH and ALTER TABLE, enforcing validation for field existence and non-nullability, and propagating identifier_fields through metadata operations, schema creation (including materialized views), and SHOW CREATE TABLE.

New Features:

  • Allow specifying the identifier_fields property in Iceberg CREATE TABLE and ALTER TABLE statements
  • Support clearing identifier_fields via an empty array in ALTER TABLE
  • Reflect identifier_fields in SHOW CREATE TABLE output
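
As a rough illustration of the statements these features enable, the sketch below renders the SQL text a user might run. The `ALTER TABLE ... SET PROPERTIES identifier_fields = ARRAY[...]` form is taken from the tests in this PR; the `CREATE TABLE ... WITH (...)` form assumes Trino's usual table-property syntax. The class `IdentifierFieldsSql`, its methods, and the `orders` table are hypothetical and not part of the PR.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the PR: builds SQL text for identifier_fields.
public final class IdentifierFieldsSql
{
    private IdentifierFieldsSql() {}

    // Renders column names as an ARRAY literal, e.g. ARRAY['id', 'name'].
    // An empty list renders as ARRAY[], which the PR uses to clear the property.
    public static String arrayLiteral(List<String> fields)
    {
        return fields.stream()
                .map(field -> "'" + field.replace("'", "''") + "'")
                .collect(Collectors.joining(", ", "ARRAY[", "]"));
    }

    // Assumes the standard Trino WITH (...) clause for table properties.
    public static String createTable(String table, String columns, List<String> identifierFields)
    {
        return "CREATE TABLE " + table + " (" + columns + ") WITH (identifier_fields = "
                + arrayLiteral(identifierFields) + ")";
    }

    public static String alterIdentifierFields(String table, List<String> identifierFields)
    {
        return "ALTER TABLE " + table + " SET PROPERTIES identifier_fields = "
                + arrayLiteral(identifierFields);
    }

    public static void main(String[] args)
    {
        // Identifier columns are declared NOT NULL because the PR rejects optional fields.
        System.out.println(createTable("orders", "id INTEGER NOT NULL, name VARCHAR NOT NULL", List.of("id")));
        System.out.println(alterIdentifierFields("orders", List.of("id", "name")));
        // Clearing the property with an empty array.
        System.out.println(alterIdentifierFields("orders", List.of()));
    }
}
```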

Enhancements:

  • Extend schemaFromMetadata, IcebergMetadata, and IcebergUtil to handle identifier_fields
  • Validate identifier_fields for existence and non-nullability before applying
  • Include identifier_fields support in materialized view creation path
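
The existence and non-nullability validation listed above might look roughly like the following. This is a minimal standalone sketch, not the PR's `updateIdentifierFields` code: the class `IdentifierFieldValidation`, the method `findProblem`, and the map from column name to nullability are all assumptions made for illustration, and the error messages are invented.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch, not the PR's implementation.
public final class IdentifierFieldValidation
{
    private IdentifierFieldValidation() {}

    // nullableByColumn maps each column name to true when the column is optional (nullable).
    // Returns a description of the first problem found, or empty when all fields are valid.
    public static Optional<String> findProblem(Map<String, Boolean> nullableByColumn, List<String> identifierFields)
    {
        for (String field : identifierFields) {
            Boolean nullable = nullableByColumn.get(field);
            if (nullable == null) {
                return Optional.of("Identifier field does not exist: " + field);
            }
            if (nullable) {
                return Optional.of("Identifier field must be NOT NULL: " + field);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args)
    {
        Map<String, Boolean> columns = Map.of("id", false, "name", true);
        System.out.println(findProblem(columns, List.of("id")));
        System.out.println(findProblem(columns, List.of("name")));
        System.out.println(findProblem(columns, List.of("missing")));
    }
}
```

Returning a description rather than throwing keeps the sketch easy to exercise; real connector code would more likely raise an exception at the point of failure.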

Documentation:

  • Document the identifier_fields property in the Iceberg connector documentation

Tests:

  • Add unit tests for creating tables with single and multiple identifier_fields and updating them
  • Add error-case tests for invalid and optional identifier fields
  • Extend connector smoke tests to cover identifier_fields

@cla-bot cla-bot bot added the cla-signed label Oct 21, 2025

sourcery-ai bot commented Oct 21, 2025

Reviewer's Guide

This PR introduces full support for the Iceberg table property identifier_fields by defining the property, wiring it through schema creation and commit paths, validating values, exposing it in SHOW CREATE TABLE, and adding end-to-end tests and documentation.

Entity relationship diagram for Iceberg table properties including identifier_fields

erDiagram
    ICEBERG_TABLE ||--o| COLUMN : has
    ICEBERG_TABLE ||--o| IDENTIFIER_FIELD : has
    ICEBERG_TABLE {
      string name
      list identifier_fields
    }
    IDENTIFIER_FIELD {
      string name
      int field_id
    }
    COLUMN {
      string name
      string type
      bool is_optional
      int field_id
    }

Class diagram for Iceberg table property handling with identifier_fields

classDiagram
    class IcebergTableProperties {
      +IDENTIFIER_FIELDS_PROPERTY : String
      +getIdentifierFields(Map<String, Object>) : List<String>
    }
    class IcebergMetadata {
      +setTableProperties(...)
      +updateIdentifierFields(Table, Transaction, List<String>)
    }
    class IcebergUtil {
      +schemaFromMetadata(List<ColumnMetadata>, List<String>) : Schema
    }
    IcebergTableProperties <.. IcebergMetadata : uses
    IcebergTableProperties <.. IcebergUtil : uses
    IcebergMetadata <.. IcebergUtil : uses
    IcebergMetadata o-- IcebergTableProperties : has IDENTIFIER_FIELDS_PROPERTY
    IcebergMetadata o-- IcebergMetadata : has updateIdentifierFields()
    IcebergUtil o-- IcebergUtil : has schemaFromMetadata()

File-Level Changes

| Change | Details | Files |
| --- | --- | --- |
| Define and expose identifier_fields as a table property | Add IDENTIFIER_FIELDS_PROPERTY constant; include property in SUPPORTED_PROPERTIES; register property metadata for identifier_fields | plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergTableProperties.java |
| Integrate identifier_fields in metadata and alter workflows | Include property in new table layout parsing; handle identifier_fields in setTableProperties with updateIdentifierFields; implement updateIdentifierFields with existence and optional checks; pass identifier_fields into materialized view creation | plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java, plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/AbstractTrinoCatalog.java |
| Extend schema builder and property extraction for identifier_fields | Overload schemaFromMetadata to accept and validate identifierFields; populate identifier_fields in IcebergUtil properties map | plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergUtil.java |
| Add end-to-end tests for identifier_fields behavior | Test table creation with single and multiple identifier_fields; test altering, clearing, and invalid identifier_fields scenarios; smoke-test support in BaseIcebergConnectorSmokeTest | plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV2.java, plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/BaseIcebergConnectorSmokeTest.java |
| Document identifier_fields usage | Describe identifier_fields property and syntax in the Iceberg connector guide | docs/src/main/sphinx/connector/iceberg.md |

@github-actions github-actions bot added docs iceberg Iceberg connector labels Oct 21, 2025

@sourcery-ai sourcery-ai bot left a comment

Hey there - I've reviewed your changes - here's some feedback:

  • Rename the test method testCreateTableWithMutilIdentifierFields to correct the typo (Mutil → Multi).
  • In testCreateTableWithMultiIdentifierFields, the post-ALTER assertion still checks for the old 'id' field—update it to assert only the newly specified identifier fields.
  • Consider refactoring the field‐existence and optionality checks in updateIdentifierFields to reuse or share logic with existing property validation routines and reduce duplication.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Rename the test method `testCreateTableWithMutilIdentifierFields` to correct the typo (`Mutil` → `Multi`).
- In `testCreateTableWithMultiIdentifierFields`, the post-ALTER assertion still checks for the old 'id' field—update it to assert only the newly specified identifier fields.
- Consider refactoring the field‐existence and optionality checks in `updateIdentifierFields` to reuse or share logic with existing property validation routines and reduce duplication.

## Individual Comments

### Comment 1
<location> `plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java:2497-2502` </location>
<code_context>
             }
         }

+        if (properties.containsKey(IDENTIFIER_FIELDS_PROPERTY)) {
+            @SuppressWarnings("unchecked")
+            List<String> identifierFields = (List<String>) properties.get(IDENTIFIER_FIELDS_PROPERTY)
+                    .orElse(Collections.emptyList());
+            updateIdentifierFields(icebergTable, transaction, identifierFields);
+        }
+
</code_context>

<issue_to_address>
**suggestion:** Consider validating identifier fields for duplicates before updating.

The code does not verify that identifierFields contains unique field names. Adding this check will help prevent bugs from duplicate entries.

```suggestion
        if (properties.containsKey(IDENTIFIER_FIELDS_PROPERTY)) {
            @SuppressWarnings("unchecked")
            List<String> identifierFields = (List<String>) properties.get(IDENTIFIER_FIELDS_PROPERTY)
                    .orElse(Collections.emptyList());
            // Validate identifierFields for duplicates
            Set<String> uniqueFields = new HashSet<>(identifierFields);
            if (uniqueFields.size() != identifierFields.size()) {
                throw new IllegalArgumentException("Duplicate identifier fields detected: " + identifierFields);
            }
            updateIdentifierFields(icebergTable, transaction, identifierFields);
        }
```
</issue_to_address>

### Comment 2
<location> `plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV2.java:1510` </location>
<code_context>
+    }
+
+    @Test
+    public void testCreateTableWithMutilIdentifierFields()
+    {
+        try (TestTable table = newTrinoTable("test_identifier_fields",
</code_context>

<issue_to_address>
**nitpick (typo):** Typo in test name: 'Mutil' should be 'Multi'.

Rename the test to 'testCreateTableWithMultiIdentifierFields' for consistency.

```suggestion
    public void testCreateTableWithMultiIdentifierFields()
```
</issue_to_address>

### Comment 3
<location> `plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV2.java:1520-1501` </location>
<code_context>
+            Set<String> identifierFieldNames = icebergTable.schema().identifierFieldNames();
+            assertThat(identifierFieldNames).containsExactlyInAnyOrder("id", "name");
+
+            assertUpdate("ALTER TABLE " + table.getName() + " SET PROPERTIES identifier_fields = ARRAY['name','area']");
+            icebergTable = loadTable(table.getName());
+            assertThat((String) computeScalar("SHOW CREATE TABLE " + table.getName()))
+                    .containsAnyOf("identifier_fields = ARRAY['name','area']", "identifier_fields = ARRAY['area', 'name']");
</code_context>

<issue_to_address>
**issue (testing):** Test assertion expects 'id' and 'area' as identifier fields after update, but the update sets 'name' and 'area'.

Please confirm which fields should be identifiers after the update and update the assertion to match.
</issue_to_address>


Comment on lines +2497 to +2502
        if (properties.containsKey(IDENTIFIER_FIELDS_PROPERTY)) {
            @SuppressWarnings("unchecked")
            List<String> identifierFields = (List<String>) properties.get(IDENTIFIER_FIELDS_PROPERTY)
                    .orElse(Collections.emptyList());
            updateIdentifierFields(icebergTable, transaction, identifierFields);
        }

suggestion: Consider validating identifier fields for duplicates before updating.

The code does not verify that identifierFields contains unique field names. Adding this check will help prevent bugs from duplicate entries.

Suggested change
-        if (properties.containsKey(IDENTIFIER_FIELDS_PROPERTY)) {
-            @SuppressWarnings("unchecked")
-            List<String> identifierFields = (List<String>) properties.get(IDENTIFIER_FIELDS_PROPERTY)
-                    .orElse(Collections.emptyList());
-            updateIdentifierFields(icebergTable, transaction, identifierFields);
-        }
+        if (properties.containsKey(IDENTIFIER_FIELDS_PROPERTY)) {
+            @SuppressWarnings("unchecked")
+            List<String> identifierFields = (List<String>) properties.get(IDENTIFIER_FIELDS_PROPERTY)
+                    .orElse(Collections.emptyList());
+            // Validate identifierFields for duplicates
+            Set<String> uniqueFields = new HashSet<>(identifierFields);
+            if (uniqueFields.size() != identifierFields.size()) {
+                throw new IllegalArgumentException("Duplicate identifier fields detected: " + identifierFields);
+            }
+            updateIdentifierFields(icebergTable, transaction, identifierFields);
+        }

}

@Test
public void testCreateTableWithMutilIdentifierFields()

nitpick (typo): Typo in test name: 'Mutil' should be 'Multi'.

Rename the test to 'testCreateTableWithMultiIdentifierFields' for consistency.

Suggested change
-    public void testCreateTableWithMutilIdentifierFields()
+    public void testCreateTableWithMultiIdentifierFields()

assertThat(identifierFieldNames).containsExactlyInAnyOrder("id");

assertUpdate("ALTER TABLE " + table.getName() + " SET PROPERTIES identifier_fields = ARRAY['name']");
icebergTable = loadTable(table.getName());

issue (testing): Test assertion expects 'id' and 'area' as identifier fields after update, but the update sets 'name' and 'area'.

Please confirm which fields should be identifiers after the update and update the assertion to match.
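
One way to make such an assertion independent of element order and spacing, rather than matching alternative `ARRAY[...]` strings as the test does, is to parse the property out of the SHOW CREATE TABLE text and compare it as a set. The sketch below is only an illustration: `IdentifierFieldsParser` and `parse` are hypothetical names, and the regular expression assumes simple column names without commas, quotes, or brackets.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical test helper, not part of the PR.
public final class IdentifierFieldsParser
{
    // Captures the text between the brackets of identifier_fields = ARRAY[...].
    private static final Pattern PROPERTY =
            Pattern.compile("identifier_fields\\s*=\\s*ARRAY\\[([^\\]]*)\\]");

    private IdentifierFieldsParser() {}

    // Returns the identifier field names found in the statement, or an empty set
    // when the property is absent or the array is empty.
    public static Set<String> parse(String showCreateTable)
    {
        Set<String> fields = new LinkedHashSet<>();
        Matcher matcher = PROPERTY.matcher(showCreateTable);
        if (!matcher.find()) {
            return fields;
        }
        for (String item : matcher.group(1).split(",")) {
            String name = item.trim();
            // Keep only single-quoted entries and strip the surrounding quotes.
            if (name.length() >= 2 && name.startsWith("'") && name.endsWith("'")) {
                fields.add(name.substring(1, name.length() - 1));
            }
        }
        return fields;
    }

    public static void main(String[] args)
    {
        System.out.println(parse("WITH (identifier_fields = ARRAY['name','area'])"));
    }
}
```

A test could then compare `parse(...)` against `Set.of("name", "area")`, which holds for either ordering of the array elements.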

Member

@ebyhr ebyhr left a comment


I don't think we want to support this field in Iceberg connector because uniqueness isn't enforced. This is intentionally unsupported if I remember correctly.

What's the benefit of supporting this property?

Contributor Author

7hong commented Oct 21, 2025

> I don't think we want to support this field in Iceberg connector because uniqueness isn't enforced. This is intentionally unsupported if I remember correctly.
>
> What's the benefit of supporting this property?

Indeed, uniqueness does not need to be enforced in Trino, and the Iceberg specification does not require it either. In my scenario, I use Trino to create and manage Iceberg tables, and other engines such as Flink write to the same tables in streaming mode. In that case, identifier-field-ids is necessary.

Successfully merging this pull request may close these issues.

Support Iceberg Identifier fields

2 participants