-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Support Iceberg Identifier fields #27026
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Reviewer's GuideThis PR introduces full support for the Iceberg table property identifier_fields by defining the property, wiring it through schema creation and commit paths, validating values, exposing it in SHOW CREATE TABLE, and adding end-to-end tests and documentation. Entity relationship diagram for Iceberg table properties including identifier_fieldserDiagram
ICEBERG_TABLE ||--o| COLUMN : has
ICEBERG_TABLE ||--o| IDENTIFIER_FIELD : has
ICEBERG_TABLE {
string name
list identifier_fields
}
IDENTIFIER_FIELD {
string name
int field_id
}
COLUMN {
string name
string type
bool is_optional
int field_id
}
Class diagram for Iceberg table property handling with identifier_fieldsclassDiagram
class IcebergTableProperties {
+IDENTIFIER_FIELDS_PROPERTY : String
+getIdentifierFields(Map<String, Object>) : List<String>
}
class IcebergMetadata {
+setTableProperties(...)
+updateIdentifierFields(Table, Transaction, List<String>)
}
class IcebergUtil {
+schemaFromMetadata(List<ColumnMetadata>, List<String>) : Schema
}
IcebergTableProperties <.. IcebergMetadata : uses
IcebergTableProperties <.. IcebergUtil : uses
IcebergMetadata <.. IcebergUtil : uses
IcebergMetadata o-- IcebergTableProperties : has IDENTIFIER_FIELDS_PROPERTY
IcebergMetadata o-- IcebergMetadata : has updateIdentifierFields()
IcebergUtil o-- IcebergUtil : has schemaFromMetadata()
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes - here's some feedback:
- Rename the test method
testCreateTableWithMutilIdentifierFields
to correct the typo (Mutil
→Multi
). - In
testCreateTableWithMultiIdentifierFields
, the post-ALTER assertion still checks for the old 'id' field—update it to assert only the newly specified identifier fields. - Consider refactoring the field‐existence and optionality checks in
updateIdentifierFields
to reuse or share logic with existing property validation routines and reduce duplication.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Rename the test method `testCreateTableWithMutilIdentifierFields` to correct the typo (`Mutil` → `Multi`).
- In `testCreateTableWithMultiIdentifierFields`, the post-ALTER assertion still checks for the old 'id' field—update it to assert only the newly specified identifier fields.
- Consider refactoring the field‐existence and optionality checks in `updateIdentifierFields` to reuse or share logic with existing property validation routines and reduce duplication.
## Individual Comments
### Comment 1
<location> `plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java:2497-2502` </location>
<code_context>
}
}
+ if (properties.containsKey(IDENTIFIER_FIELDS_PROPERTY)) {
+ @SuppressWarnings("unchecked")
+ List<String> identifierFields = (List<String>) properties.get(IDENTIFIER_FIELDS_PROPERTY)
+ .orElse(Collections.emptyList());
+ updateIdentifierFields(icebergTable, transaction, identifierFields);
+ }
+
</code_context>
<issue_to_address>
**suggestion:** Consider validating identifier fields for duplicates before updating.
The code does not verify that identifierFields contains unique field names. Adding this check will help prevent bugs from duplicate entries.
```suggestion
if (properties.containsKey(IDENTIFIER_FIELDS_PROPERTY)) {
@SuppressWarnings("unchecked")
List<String> identifierFields = (List<String>) properties.get(IDENTIFIER_FIELDS_PROPERTY)
.orElse(Collections.emptyList());
// Validate identifierFields for duplicates
Set<String> uniqueFields = new HashSet<>(identifierFields);
if (uniqueFields.size() != identifierFields.size()) {
throw new IllegalArgumentException("Duplicate identifier fields detected: " + identifierFields);
}
updateIdentifierFields(icebergTable, transaction, identifierFields);
}
```
</issue_to_address>
### Comment 2
<location> `plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV2.java:1510` </location>
<code_context>
+ }
+
+ @Test
+ public void testCreateTableWithMutilIdentifierFields()
+ {
+ try (TestTable table = newTrinoTable("test_identifier_fields",
</code_context>
<issue_to_address>
**nitpick (typo):** Typo in test name: 'Mutil' should be 'Multi'.
Rename the test to 'testCreateTableWithMultiIdentifierFields' for consistency.
```suggestion
public void testCreateTableWithMultiIdentifierFields()
```
</issue_to_address>
### Comment 3
<location> `plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV2.java:1520-1501` </location>
<code_context>
+ Set<String> identifierFieldNames = icebergTable.schema().identifierFieldNames();
+ assertThat(identifierFieldNames).containsExactlyInAnyOrder("id", "name");
+
+ assertUpdate("ALTER TABLE " + table.getName() + " SET PROPERTIES identifier_fields = ARRAY['name','area']");
+ icebergTable = loadTable(table.getName());
+ assertThat((String) computeScalar("SHOW CREATE TABLE " + table.getName()))
+ .containsAnyOf("identifier_fields = ARRAY['name','area']", "identifier_fields = ARRAY['area', 'name']");
</code_context>
<issue_to_address>
**issue (testing):** Test assertion expects 'id' and 'area' as identifier fields after update, but the update sets 'name' and 'area'.
Please confirm which fields should be identifiers after the update and update the assertion to match.
</issue_to_address>
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
if (properties.containsKey(IDENTIFIER_FIELDS_PROPERTY)) { | ||
@SuppressWarnings("unchecked") | ||
List<String> identifierFields = (List<String>) properties.get(IDENTIFIER_FIELDS_PROPERTY) | ||
.orElse(Collections.emptyList()); | ||
updateIdentifierFields(icebergTable, transaction, identifierFields); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Consider validating identifier fields for duplicates before updating.
The code does not verify that identifierFields contains unique field names. Adding this check will help prevent bugs from duplicate entries.
if (properties.containsKey(IDENTIFIER_FIELDS_PROPERTY)) { | |
@SuppressWarnings("unchecked") | |
List<String> identifierFields = (List<String>) properties.get(IDENTIFIER_FIELDS_PROPERTY) | |
.orElse(Collections.emptyList()); | |
updateIdentifierFields(icebergTable, transaction, identifierFields); | |
} | |
if (properties.containsKey(IDENTIFIER_FIELDS_PROPERTY)) { | |
@SuppressWarnings("unchecked") | |
List<String> identifierFields = (List<String>) properties.get(IDENTIFIER_FIELDS_PROPERTY) | |
.orElse(Collections.emptyList()); | |
// Validate identifierFields for duplicates | |
Set<String> uniqueFields = new HashSet<>(identifierFields); | |
if (uniqueFields.size() != identifierFields.size()) { | |
throw new IllegalArgumentException("Duplicate identifier fields detected: " + identifierFields); | |
} | |
updateIdentifierFields(icebergTable, transaction, identifierFields); | |
} |
} | ||
|
||
@Test | ||
public void testCreateTableWithMutilIdentifierFields() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick (typo): Typo in test name: 'Mutil' should be 'Multi'.
Rename the test to 'testCreateTableWithMultiIdentifierFields' for consistency.
public void testCreateTableWithMutilIdentifierFields() | |
public void testCreateTableWithMultiIdentifierFields() |
assertThat(identifierFieldNames).containsExactlyInAnyOrder("id"); | ||
|
||
assertUpdate("ALTER TABLE " + table.getName() + " SET PROPERTIES identifier_fields = ARRAY['name']"); | ||
icebergTable = loadTable(table.getName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (testing): Test assertion expects 'id' and 'area' as identifier fields after update, but the update sets 'name' and 'area'.
Please confirm which fields should be identifiers after the update and update the assertion to match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want to support this field in Iceberg connector because uniqueness isn't enforced. This is intentionally unsupported if I remember correctly.
What's the benefit of supporting this property?
Indeed, there is no need to enforce uniqueness in Trino, nor in the Iceberg specification. In my scenario, I use trino to create and manage the iceberg table. In addition to trino, we also use streaming writing such as flink. In this case, identifier-field-ids is necessary. |
Description
fixed #23563
Supports specifying the
identifier_fields
attribute when creating an iceberg table, and can also modify or remove theidentifier_fields
attribute in the Alter table statement.Additional context and related issues
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( x) Release notes are required, with the following suggested text:
Summary by Sourcery
Add support for the identifier_fields table property in Iceberg tables, allowing users to specify, modify, and remove identifier fields via CREATE TABLE WITH and ALTER TABLE, enforcing validation for field existence and non-nullability, and propagating identifier_fields through metadata operations, schema creation (including materialized views), and SHOW CREATE TABLE.
New Features:
Enhancements:
Documentation:
Tests: