-
Notifications
You must be signed in to change notification settings - Fork 139
fix(schema): Add duplicate field ID checking mechanism #658
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix(schema): Add duplicate field ID checking mechanism #658
Conversation
| if _, err := IndexNameByID(s); err != nil { | ||
| panic(err) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should not have runtime panics that can be caused by user error, we should make the schema init fallible instead and return an err
| NestedField{ID: 17, Name: "total_amount", Type: PrimitiveTypes.Float64, Required: true}, | ||
| NestedField{ID: 18, Name: "congestion_surcharge", Type: PrimitiveTypes.Float64, Required: false}, | ||
| NestedField{ID: 19, Name: "VendorID", Type: PrimitiveTypes.Int32, Required: false}, | ||
| NestedField{ID: 19, Name: "vendor_id_alt", Type: PrimitiveTypes.Int32, Required: false}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why change the field name? Isn't this intended to test the case of multiple fields with the same name but different IDs?
| for name, id := range i.index { | ||
| if existingName, ok := idToName[id]; ok && existingName != name { | ||
| panic(fmt.Errorf("%w: multiple fields for id %d: %s and %s", | ||
| ErrInvalidSchema, id, existingName, name)) | ||
| } | ||
| idToName[id] = name | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer that we forbid duplicate IDs in the first place so that we don't need to have this check at all
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me find another way to go around this . Are you saying we test it in the addField() method? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that would be a good spot yea
Signed-off-by: dttung2905 <[email protected]>
Signed-off-by: dttung2905 <[email protected]>
93f025f to
f3616e4
Compare
Signed-off-by: dttung2905 <[email protected]>
|
This is ballooning into something more complicated than I thought initally. Let me find some time this weekend to look thoroughly into this |
As outlined in the issue, currently we do not perform field id duplication check in this
IndexNameByIDmethod. This leads to the issue of last one wins if multiple field ID are presented in the schemaiceberg-go/schema.go
Lines 788 to 800 in f886a24
iceberg-go/schema.go
Lines 810 to 817 in f886a24
I added a quick validation under
ByID()and panic if duplicates are found and put this schema validation underinit()as suggested in the issueFixes #593