Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

table_exists unit/integration test for NoSuchTableError #678

Merged
merged 5 commits into from
May 4, 2024

Conversation

MehulBatra
Copy link
Contributor

@MehulBatra MehulBatra commented Apr 30, 2024

  • Implementation of table_exists method at Initialization level already exists bypassing it.
  • Unit Test for glue, sqlcatalog and dynamodb.
  • Integration test for glue and dynamodb.

Copy link
Contributor

@kevinjqliu kevinjqliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! thanks for adding tests for different catalogs. I left a few minor comments

@@ -394,6 +394,11 @@ def table_exists(self, identifier: Union[str, Identifier]) -> bool:
Returns:
bool: True if the table exists, False otherwise.
"""
try:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we get rid of the other table_exists function in the same file?

def table_exists(self, identifier: Union[str, Identifier]) -> bool:
try:
self.load_table(identifier)
return True
except NoSuchTableError:
return False

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you think about changing the REST catalog's table_exists implementation to be similar to this?
https://github.com/apache/iceberg-python/pull/512/files#diff-3bda7391ebd8aa3dcfd6703d8d2764830b9d9c35fa854188a37d69611274bd3dR722-R727

maybe calls super().table_exists()?

Copy link
Contributor

@HonahX HonahX May 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better for RestCatalog to maintain a separate implementation to make a head request to /v1/{prefix}/namespaces/{namespace}/tables/{table}, ref: #512 (comment), #507 (comment)

The try-catch implementation is for other non-rest catalogs and thus it is put in the MetastoreCatalog instead of the Catalog interface.

A little bit more context regarding the MetastoreCatalog: while working on #498, we found that many helper functions as well as some implementations are for non-rest catalogs only. So we decide to move those to another layer, MetastoreCatalog to make the inheritance look better: #498 (comment).

Do these sound reasonable to you?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense, thanks for the detailed explanation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pushed the changes, as per the discussion, thank you, guys, for the feedback!

Copy link
Contributor

@HonahX HonahX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! I think we've had the implementation of table_exists.

For non-rest catalog it is

def table_exists(self, identifier: Union[str, Identifier]) -> bool:
try:
self.load_table(identifier)
return True
except NoSuchTableError:
return False

as pointed out by @kevinjqliu

For rest catalog it is

def table_exists(self, identifier: Union[str, Identifier]) -> bool:
"""Check if a table exists.
Args:
identifier (str | Identifier): Table identifier.
Returns:
bool: True if the table exists, False otherwise.
"""
identifier_tuple = self.identifier_to_tuple_without_catalog(identifier)
response = self._session.head(
self.url(Endpoints.load_table, prefixed=True, **self._split_identifier_for_path(identifier_tuple))
)
return response.status_code == 200

But we do miss relevant tests for glue, sql, dynamodb, ...

pyiceberg/catalog/__init__.py Outdated Show resolved Hide resolved
tests/catalog/integration_test_dynamodb.py Outdated Show resolved Hide resolved
Copy link
Contributor

@HonahX HonahX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM! The lint issue can be resolved by executing make lint

@MehulBatra
Copy link
Contributor Author

Overall LGTM! The lint issue can be resolved by executing make lint

Thank you @HonahX and @kevinjqliu for all the feedback!

Copy link
Contributor

@HonahX HonahX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work! This will be good to go when the CI pass.

@kevinjqliu
Copy link
Contributor

Thank you for the PR! Do you mind changing the PR title and description to match?

@MehulBatra MehulBatra changed the title Implement table_exists method with try-catch for NoSuchTableError table_exists unit/integration test for NoSuchTableError May 3, 2024
@HonahX HonahX merged commit 7bd5d9e into apache:main May 4, 2024
7 checks passed
@HonahX
Copy link
Contributor

HonahX commented May 4, 2024

Thanks everyone! Merged!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants