-
Notifications
You must be signed in to change notification settings - Fork 918
Relax dict_id
equality in field merging
#6356
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
#5981 is possibly related. As it stands I am not sure we can ignore the dict ID when merging, as unlike with nulls, two arrays with different dictionary IDs can't be "merged", as there is no covering relationship between two different dict IDs. |
Isn't that just an optimization though? I understand that if two dict IDs are identical one can just append the index arrays to each other safely, and if not then the dicts needs to be unified and indexes added transposed (using the terminology that the Go arrow package uses not sure it's universal it's just the library I have the most experience with). |
Unfortunately that isn't how the IPC implementation treats them, in particular IPC files use dictionary IDs as unique identifiers to immutable dictionaries. It is all a bit confused, and it is peculiar that they're part of the schema definition at all, but that is what #5981 concerns. |
If I understand this right then, it should be possible to remove |
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Trying to merge two schemas that are compatible, other than having fields whose
dict_id
s are different.Describe the solution you'd like
Relax
dict_id
handling during merging of fields, such that if they are different thedict_id
is unset, and only if they are equal are they kept.Describe alternatives you've considered
Before opening this, I tried implementing my own merge function, but would have to change it quite significantly compared to the existing code, since I can't access internal fields. I could still make this work, but wanted to first check whether the project would be open to relaxing this, since this feels more aligned with how merging works otherwise (eg. making a field nullable).
Additional context
All of this is under the assumption that I understand
dict_id
correctly, which is that it's primarily used for IPC's ability to duplicate dicts.The text was updated successfully, but these errors were encountered: