@dttung2905
Contributor

As outlined in the issue, the IndexNameByID method currently does not check for duplicate field IDs. As a result, when multiple fields in the schema share the same ID, the last one processed silently wins.

iceberg-go/schema.go

Lines 788 to 800 in f886a24

func IndexNameByID(schema *Schema) (map[int]string, error) {
	indexer := &indexByName{
		index:           make(map[string]int),
		shortNameId:     make(map[string]int),
		fieldNames:      make([]string, 0),
		shortFieldNames: make([]string, 0),
	}
	if _, err := Visit(schema, indexer); err != nil {
		return nil, err
	}
	return indexer.ByID(), nil
}

iceberg-go/schema.go

Lines 810 to 817 in f886a24

func (i *indexByName) ByID() map[int]string {
	idToName := make(map[int]string)
	for k, v := range i.index {
		idToName[v] = k
	}
	return idToName
}
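To make the failure mode concrete, here is a minimal, self-contained sketch of the same inversion pattern. The `invertIndex` helper and the sample data are illustrative assumptions, not code from iceberg-go. It shows that two names sharing an ID collapse into a single entry, and since Go randomizes map iteration order, which name survives is not even deterministic.

```go
package main

import "fmt"

// invertIndex is a hypothetical helper that mirrors the shape of ByID():
// it flips a name->ID map into an ID->name map without checking for collisions.
func invertIndex(index map[string]int) map[int]string {
	idToName := make(map[int]string, len(index))
	for name, id := range index {
		// A later name with the same ID silently overwrites the earlier one.
		idToName[id] = name
	}
	return idToName
}

func main() {
	index := map[string]int{
		"total_amount":  17,
		"VendorID":      19,
		"vendor_id_alt": 19, // duplicate ID
	}
	byID := invertIndex(index)

	// Three names went in, but only two IDs come out. Which name is kept
	// for ID 19 depends on map iteration order and may vary between runs.
	fmt.Println(len(index), len(byID))
	fmt.Println(byID[19])
}
```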

I added a quick validation in ByID() that panics if duplicates are found, and put this schema validation in init(), as suggested in the issue.

Fixes #593

Comment on lines 91 to 106
if _, err := IndexNameByID(s); err != nil {
	panic(err)
}
Contributor

I think we should not have runtime panics that can be caused by user error. We should make the schema init fallible instead and return an error.
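One way to picture the fallible-init suggestion is a constructor that validates its input and returns an error instead of panicking. The sketch below is an assumption-laden illustration: `field`, `schema`, `newSchema`, and `errInvalidSchema` are hypothetical stand-ins, not the real iceberg-go API, and only top-level fields are checked (nested types are ignored).

```go
package main

import (
	"errors"
	"fmt"
)

// errInvalidSchema is a stand-in for the ErrInvalidSchema sentinel used in the diff.
var errInvalidSchema = errors.New("invalid schema")

// field and schema are simplified, hypothetical types for this sketch only.
type field struct {
	ID   int
	Name string
}

type schema struct {
	fields []field
}

// newSchema validates the fields up front and reports duplicate IDs as an
// error, so a malformed user-supplied schema never triggers a panic.
func newSchema(fields ...field) (*schema, error) {
	seen := make(map[int]string, len(fields))
	for _, f := range fields {
		if prev, ok := seen[f.ID]; ok {
			return nil, fmt.Errorf("%w: multiple fields for id %d: %s and %s",
				errInvalidSchema, f.ID, prev, f.Name)
		}
		seen[f.ID] = f.Name
	}
	return &schema{fields: fields}, nil
}

func main() {
	_, err := newSchema(
		field{ID: 19, Name: "VendorID"},
		field{ID: 19, Name: "vendor_id_alt"},
	)
	if errors.Is(err, errInvalidSchema) {
		// The caller decides how to handle the invalid schema.
		fmt.Println("schema rejected:", err)
	}
}
```

Wrapping the sentinel with `%w` lets callers match the failure with `errors.Is` while still getting a message that names the conflicting fields.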

NestedField{ID: 17, Name: "total_amount", Type: PrimitiveTypes.Float64, Required: true},
NestedField{ID: 18, Name: "congestion_surcharge", Type: PrimitiveTypes.Float64, Required: false},
NestedField{ID: 19, Name: "VendorID", Type: PrimitiveTypes.Int32, Required: false},
NestedField{ID: 19, Name: "vendor_id_alt", Type: PrimitiveTypes.Int32, Required: false},
Member

why change the field name? Isn't this intended to test the case of multiple fields with the same name but different IDs?

Comment on lines 817 to 842
for name, id := range i.index {
	if existingName, ok := idToName[id]; ok && existingName != name {
		panic(fmt.Errorf("%w: multiple fields for id %d: %s and %s",
			ErrInvalidSchema, id, existingName, name))
	}
	idToName[id] = name
}
Member

I prefer that we forbid duplicate IDs in the first place so that we don't need to have this check at all

Contributor Author

Let me find another way around this. Are you saying we should check it in the addField() method? 🤔

Member

that would be a good spot yea
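A rough sketch of what rejecting duplicates at insertion time could look like. The `nameIndexer` type and the `addField(name, id) error` signature are assumptions made for illustration; the real `indexByName.addField` in iceberg-go may look different. The idea is to keep an ID-to-name map in sync with the name index so a conflict is caught as soon as the second field is added, and ByID() no longer needs its own check.

```go
package main

import "fmt"

// nameIndexer is a simplified, hypothetical stand-in for indexByName.
type nameIndexer struct {
	index    map[string]int // full field name -> field ID
	idToName map[int]string // field ID -> full field name, kept in sync
}

func newNameIndexer() *nameIndexer {
	return &nameIndexer{
		index:    make(map[string]int),
		idToName: make(map[int]string),
	}
}

// addField registers name under id. It returns an error if id is already
// registered under a different name, leaving the indexer unchanged.
func (n *nameIndexer) addField(name string, id int) error {
	if existing, ok := n.idToName[id]; ok && existing != name {
		return fmt.Errorf("multiple fields for id %d: %s and %s", id, existing, name)
	}
	n.index[name] = id
	n.idToName[id] = name
	return nil
}

func main() {
	idx := newNameIndexer()
	fields := []struct {
		name string
		id   int
	}{
		{"total_amount", 17},
		{"VendorID", 19},
		{"vendor_id_alt", 19}, // duplicate ID, rejected below
	}
	for _, f := range fields {
		if err := idx.addField(f.name, f.id); err != nil {
			fmt.Println("rejected:", err)
		}
	}
	fmt.Println(len(idx.idToName))
}
```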

@dttung2905 force-pushed the add-validation-for-duplicated-field-id branch from 93f025f to f3616e4 on January 7, 2026 22:55
Signed-off-by: dttung2905 <[email protected]>
@dttung2905
Contributor Author

This is ballooning into something more complicated than I initially thought. Let me find some time this weekend to look into it thoroughly.

Development

Successfully merging this pull request may close these issues.

bug: schema does not reject multiple field ids