-
Notifications
You must be signed in to change notification settings - Fork 211
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor Arrow schema conversion #117
Conversation
We wrapped a schema in a schema.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nits
tests/io/test_pyarrow.py
Outdated
@@ -708,15 +709,17 @@ def _write_table_to_file(filepath: str, schema: pa.Schema, table: pa.Table) -> s | |||
|
|||
@pytest.fixture | |||
def file_int(schema_int: Schema, tmpdir: str) -> str: | |||
pyarrow_schema = pa.schema(schema_to_pyarrow(schema_int), metadata={"iceberg.schema": schema_int.model_dump_json()}) | |||
pyarrow_schema = schema_to_pyarrow(schema_int, metadata={ICEBERG_SCHEMA: bytes(schema_int.model_dump_json(), 'utf-8')}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this string be using a constant in a lib somewhere? Or at we could least create an encodings class that centralizes all the schema stuff (e.g. create a constant for 'utf-8'
, hides ICEBERG_SCHEMA
and expose some cleaner methods that hides the bytes conversion, etc...
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, I've introduced a utf8 constant 👍
def schema_to_pyarrow(schema: Union[Schema, IcebergType]) -> pa.schema: | ||
return visit(schema, _ConvertToArrowSchema()) | ||
def schema_to_pyarrow(schema: Union[Schema, IcebergType], metadata: Dict[bytes, bytes] = EMPTY_DICT) -> pa.schema: | ||
return visit(schema, _ConvertToArrowSchema(metadata)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the visit()
behavior with an empty dict?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sometime we use the visitor to convert types, then we don't need to set any metadata so then a default with an empty dict makes things easier and less verbose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Thanks @bitsondatadev and @amogh-jahagirdar for the review 🥳 |
We wrapped a schema in a schema.