Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Schema Deserialization Ignores Field initial-default and write-default Values #1431

Closed
2 of 3 tasks
paulcichonski opened this issue Dec 14, 2024 · 5 comments · Fixed by #1432
Closed
2 of 3 tasks

Schema Deserialization Ignores Field initial-default and write-default Values #1431

paulcichonski opened this issue Dec 14, 2024 · 5 comments · Fixed by #1432

Comments

@paulcichonski
Copy link
Contributor

Apache Iceberg version

0.8.1 (latest release)

Please describe the bug 🐞

I have a simple Iceberg schema that looks like:

{
  "type": "struct",
  "fields": [
    {
      "id": 1,
      "name": "foo",
      "type": "string",
      "required": false,
      "initial-default": "foo-initial-default",
      "write-default": "foo-write-default"
    }
  ],
  "schema-id": 1
}

If I attempt to deserialize this into a pyiceberg.schema.Schema instance, it ignores the initial-default and write-default attributes:

from pyiceberg.schema import Schema

with open('/tmp/simple_schema.json', 'r') as f:
    s = Schema.model_validate_json(f.read())

print(s.fields)  # prints (NestedField(field_id=1, name='foo', field_type=StringType(), required=False),)
print(s.fields[0].initial_default)  # prints None
print(s.fields[0].write_default)  # prints None

Perhaps this is expected, but looking at the code, it seems like it should work.

If this is indeed a bug (and not expected or user error), I can work on trying to figure out why it is happening and submit a fix.

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
@paulcichonski
Copy link
Contributor Author

This might be a simple fix, but I"m not exactly sure why the constructor is structured like it is, so not positive if it breaks some other usage pattern for NestedField:

diff --git a/pyiceberg/types.py b/pyiceberg/types.py
index 8fa7453..0a8c5e6 100644
--- a/pyiceberg/types.py
+++ b/pyiceberg/types.py
@@ -328,8 +328,8 @@ class NestedField(IcebergType):
         data["type"] = data["type"] if "type" in data else field_type
         data["required"] = required
         data["doc"] = doc
-        data["initial-default"] = initial_default
-        data["write-default"] = write_default
+        data["initial-default"] = data['initial-default'] if 'initial-default' in data else initial_default
+        data["write-default"] = data['write-default'] if 'write-default' in data else write_default
         super().__init__(**data)
 
     def __str__(self) -> str:

But if that seems good, let me know and I can PR it with tests.

@kevinjqliu
Copy link
Contributor

Hi @paulcichonski, thanks for reporting this issue! It does indeed seem like a bug, and your proposed fix looks reasonable. Would you be interested in contributing the fix and adding a test to verify the behavior?

@paulcichonski
Copy link
Contributor Author

@kevinjqliu sure thing, gave it a shot in #1432 .

@Fokko
Copy link
Contributor

Fokko commented Dec 15, 2024

Thanks for noticing and picking this up right away @paulcichonski 🙌 There is more context around this I'd like to share. Officially this is supported only in V3 and onward (that is not officially released, and not yet supported by PyIceberg). But having the initial values makes it much easier to parse the Iceberg metadata (eg. if a sequence number is missing, we can set an initial default to 0). That said, we have to open up this at some point, we just have to make sure that it is properly tested 👍

@paulcichonski
Copy link
Contributor Author

Makes sense, I was wondering if it was a v3 thing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants