[🐞] Collection of a few bugs #864
Comments
@kevinjqliu, I've just created a PR for the second item on the list.
@kevinjqliu created a PR to fix the 3rd item on the list.
Hi @kevinjqliu. Could you please elaborate on this? I am curious about the corresponding use cases that are affected by the default values of these fields. I remember encountering similar difficulties when working on #498, and I ended up with a workaround to let
@HonahX Ok, so I've found a way to reproduce this. It requires a bit of setup.
Setup
Have pyiceberg's integration test docker running
Pull the
It requires an on-disk sqlite db located in
Reset the
Run Spark test
Have the fastapi server running in one terminal.
In another terminal, run the spark test
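The spark test configures a session against the local REST server along these lines (a sketch; the catalog name, port, and runtime versions here are assumptions, not the exact test code):

```python
from pyspark.sql import SparkSession

# Point Spark's Iceberg catalog at the locally running REST server.
spark = (
    SparkSession.builder.appName("rest-catalog-test")
    # Assumed Iceberg Spark runtime; match your Spark/Scala versions.
    .config(
        "spark.jars.packages",
        "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0",
    )
    .config("spark.sql.catalog.rest", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.rest.type", "rest")
    # Assumed port for the local fastapi server.
    .config("spark.sql.catalog.rest.uri", "http://localhost:8000")
    .getOrCreate()
)
```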
The fastapi server will error, returning an HTTP 500 response. Here's the stack trace
From here, you can add breakpoints to the codebase in
Here are the fixes that I added to make the spark test work
I don't think the REST server API is at fault here, since this request corresponds with the
And since I assume Spark is sending the right request, it has to be an issue downstream of
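For reference, a table-commit request under the Iceberg REST spec has roughly this shape (a hand-written example with abbreviated values, not a captured payload):

```python
# Rough shape of a commitTable request body per the Iceberg REST spec.
commit_request = {
    "identifier": {"namespace": ["default"], "name": "t"},
    "requirements": [
        {"type": "assert-create"},  # table must not already exist
    ],
    "updates": [
        {"action": "assign-uuid", "uuid": "..."},
        {
            "action": "add-schema",
            "schema": {"type": "struct", "schema-id": 0, "fields": []},
            "last-column-id": 0,
        },
        # schema-id -1 means "the schema added in this same commit"
        {"action": "set-current-schema", "schema-id": -1},
    ],
}
```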
Hi @kevinjqliu - I just added a test case similar to yours and verified that it works with the tabular rest catalog image in the test suite. Could we double-check whether this is an issue in the REST catalog implementation? Branch with the test: https://github.com/syun64/iceberg-python/tree/rest-integration
The test is:
I would appreciate you double-checking my work as well to make sure it correctly replicates your test!
Thanks for taking a look at this! I reran the steps above; the issue is actually with the
I verified that all Pyspark statements work with the tabular REST catalog: https://github.com/kevinjqliu/iceberg-rest-catalog/blob/7c5548133ae266d4fac215b063911c35f08461d9/tests/pyspark_test.py#L75-L116
To reproduce, add this test to
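The reproduction boils down to statements of roughly this shape (placeholder namespace and table names; the exact statements are in the pyspark_test.py file linked above):

```python
# Sketch of the failing flow against the local REST catalog,
# given a configured `spark` session fixture.
def test_create_and_write(spark):
    spark.sql("CREATE NAMESPACE IF NOT EXISTS rest.default")
    spark.sql(
        "CREATE TABLE rest.default.t (id BIGINT, data STRING) USING iceberg"
    )
    spark.sql("INSERT INTO rest.default.t VALUES (1, 'a')")
    assert spark.sql("SELECT * FROM rest.default.t").count() == 1
```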
The issue is with this specific REST catalog implementation. However, the implementation uses the Pyiceberg library to process catalog requests. I believe the issue is with the
I think this is related to the
As @kevinjqliu mentioned in the issue description:
This causes a problem in
pyiceberg/catalog/__init__.py, lines 913 to 924 (at commit 846fc08)
and makes a special note about some of
pyiceberg/table/__init__.py, lines 723 to 725 (at commit fd2af56)
such that
However, this won't work in the case of the RestCatalog server implementation, where the
So, I agree with @kevinjqliu that there is some room for improvement for
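For readers following along, the table-update application being discussed is a visitor-style dispatch over update types; here is a simplified sketch of that pattern (the names below are illustrative stand-ins, not PyIceberg's exact API):

```python
from dataclasses import dataclass, replace
from functools import singledispatch

@dataclass(frozen=True)
class Metadata:
    schemas: tuple
    current_schema_id: int

@dataclass(frozen=True)
class AddSchemaUpdate:
    schema: str  # simplified; a real schema object in PyIceberg

@dataclass(frozen=True)
class SetCurrentSchemaUpdate:
    schema_id: int

@singledispatch
def apply_update(update, base: Metadata) -> Metadata:
    raise ValueError(f"Unsupported update: {update!r}")

@apply_update.register
def _(update: AddSchemaUpdate, base: Metadata) -> Metadata:
    # Append the new schema; real code also tracks last-column-id etc.
    return replace(base, schemas=base.schemas + (update.schema,))

@apply_update.register
def _(update: SetCurrentSchemaUpdate, base: Metadata) -> Metadata:
    return replace(base, current_schema_id=update.schema_id)

def update_table_metadata(base: Metadata, updates) -> Metadata:
    # Fold each update into a fresh, immutable metadata object.
    for u in updates:
        base = apply_update(u, base)
    return base
```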
Thanks for the pointer! Looks like this is from the interactions between
Example
I was able to capture the request objects that Spark sends to the REST catalog server. For
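One way to capture those payloads (a sketch assuming a FastAPI server; the route follows the REST spec path, but the handler body is hypothetical):

```python
import json

from fastapi import FastAPI, Request

app = FastAPI()

# Log the raw commitTable body Spark sends before processing it.
@app.post("/v1/namespaces/{namespace}/tables/{table}")
async def commit_table(namespace: str, table: str, request: Request):
    payload = await request.json()
    print(json.dumps(payload, indent=2))  # inspect updates/requirements
    return {"status": "logged"}  # placeholder response for the sketch
```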
For context, this is the
Also, this is what
Interestingly, it's initialized with |
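The pitfall with default-initialized Pydantic fields looks roughly like this (hypothetical stand-in models, not PyIceberg's actual classes):

```python
from typing import List
from pydantic import BaseModel, Field

class Schema(BaseModel):
    schema_id: int = 0
    fields: List[str] = Field(default_factory=list)

class TableMetadata(BaseModel):
    # The default silently injects an empty Schema with schema_id=0,
    # which later update logic can mistake for a real schema.
    schemas: List[Schema] = Field(default_factory=lambda: [Schema()])
    current_schema_id: int = 0

meta = TableMetadata()
print(meta.schemas)  # [Schema(schema_id=0, fields=[])] -- not actually empty
```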
Opened #950, I was able to remove the
I think we need more robust testing for the table update visitor functions. I also noticed that
I think we should look at
Apache Iceberg version
None
Please describe the bug 🐞
While working on a Python REST server implementation (https://github.com/kevinjqliu/iceberg-rest-catalog), I noticed a few Pyiceberg bugs while integrating with PySpark. The REST server implementation is using SqlCatalog as the underlying catalog.
Here's the collection of bugs that I would like to upstream (main...kevinjqliu:iceberg-python:kevinjqliu/iceberg-rest-catalog)
To summarize:
- rest.py's drop_namespace function is missing handling of the 409 status code. (Handle error when trying to drop a non-empty namespace on rest.py #868)
- sql.py's drop_namespace function does not raise NoSuchNamespaceError when the namespace is missing. (Raise error on nonexistent namespace on sql.py #865)
- sql.py's update_namespace_properties function only updates the first property and drops the rest; a sketch of this bug pattern follows the list. (fix: update all namespace properties in update_namespace_properties #873)
- TableMetadata is initialized with the default Pydantic object for schema, partition_spec, and sort_order, which does not play well with table updates.

I'll be happy to work on these, but I think they are "good first issues". We can break them up into separate PRs with the relevant tests.
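As a rough illustration of the third item, a "first property only" bug typically looks like an early return inside the update loop (this is a sketch of the pattern, not the actual sql.py code):

```python
# Sketch of the "only the first property is updated" bug pattern.
def update_properties_buggy(props: dict, updates: dict) -> dict:
    for key, value in updates.items():
        props[key] = value
        return props  # bug: returns after the first item, dropping the rest

def update_properties_fixed(props: dict, updates: dict) -> dict:
    for key, value in updates.items():
        props[key] = value
    return props  # all updates applied before returning
```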