Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1191:Added docstrings to the pyiceberg/table/inspect.py file #1533

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

gayatrikate04
Copy link

@gayatrikate04 gayatrikate04 commented Jan 17, 2025

Closes #1191

Added detailed docstrings to the pyiceberg/table/inspect.py file to improve documentation and code clarity. The updates enhance readability and help developers understand the functionality of the InspectTable class and its methods.

Comment on lines 11 to 13
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's avoid changing the LICENSE, the linebreak is now also a bit awkward

Copy link
Contributor

@kevinjqliu kevinjqliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a few comments here, thanks for the PR

pyiceberg/table/inspect.py Show resolved Hide resolved
Comment on lines 271 to 273
"""shot ID,
and optional configuration parameters.
Retrieve references from the Iceberg table metadata as a PyArrow Table.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is cut off

Retrieve partition information from the Iceberg table as a PyArrow Table.

Args:
snapshot_id (Optional[int]): The snapshot ID to filter partitions. If not provided, all partitions are included.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If not provided, all partitions are included.

I dont think this is right, the snapshot_id for all of these functions are used to time travel to a specific snapshot. otherwise the current snapshot will be used

Comment on lines 417 to 424
import pyarrow as pa
from typing import Optional, List, Dict, Any, Set
from datetime import datetime, timezone
from pyiceberg.table.snapshots import MetadataLogEntry
from pyiceberg.io.pyarrow import schema_to_pyarrow
from pyiceberg.table import Snapshot, ManifestContent, DataFileContent, PartitionSpec
from pyiceberg.utils import from_bytes
from executor_factory import ExecutorFactory
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imports should be on top of the file

from pyiceberg.utils import from_bytes
from executor_factory import ExecutorFactory

class IcebergTableUtils:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class IcebergTableUtils:

i dont think we need a new class here

@gayatrikate04
Copy link
Author

"Thank you for the detailed feedback! I'll make the following updates:
Fix the cutoff docstring to provide a complete explanation.
Update the description of snapshot_id to accurately reflect its role in time traveling to specific snapshots.
Move the imports to the top of the file to align with standard practices.
Remove the unnecessary class or refactor it if required.
I'll address these issues and push the changes .

@gayatrikate04
Copy link
Author

I have pushed the latest changes, including refinements to the docstrings and updates to related files. Please review the updates.

Copy link
Contributor

@kevinjqliu kevinjqliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! I've added some comments, i think theres a linter error, could you run make lint locally?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is part of pyenv's local config, should not be checked into the repo

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it! I will remove .python-version from the PR.

@@ -30,7 +30,8 @@
- [Verify a release](verify-release.md)
- [How to release](how-to-release.md)
- [Release Notes](https://github.com/apache/iceberg-python/releases)
- [Code Reference](reference/)
- [Code Reference](reference/pyiceberg/index.md)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why was this changed? i dont think this is necessary

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove the change from the SUMMARY.md file if it's not necessary. Thanks for pointing it out!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you rebase this PR against main to get the latest change from #1538?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the suggestion! I’m not very familiar with rebasing yet, but I’m eager to learn. Could you guide me on how to properly rebase this PR against the main branch, or would merging the latest changes from #1538 be a better approach in this case? I want to ensure I’m following the best practices.

Comment on lines +34 to +35
paths:
- pyiceberg
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same for this, what is this change for?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for pointing this out! I made this change while addressing errors in inspect.py. However, I’ll review it again to ensure it’s necessary and aligned with the overall structure. Please let me know if you have additional context or suggestions.

@@ -28,12 +29,24 @@
from pyiceberg.utils.singleton import _convert_to_hashable_type

if TYPE_CHECKING:
import pyarrow as pa
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we still need this here

@@ -57,7 +87,21 @@ def _get_snapshot(self, snapshot_id: Optional[int] = None) -> Snapshot:
raise ValueError("Cannot get a snapshot as the table does not have any.")

def snapshots(self) -> "pa.Table":
import pyarrow as pa
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we actually need this here, in case someone imports this function directly

from pyiceberg.table.inspect import snapshots

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand. I will keep the import for pyarrow as suggested, in case the function is used directly.

Comment on lines +147 to +149
Args:
snapshot_id (Optional[int]): The ID of the snapshot to retrieve entries for.
If None, entries for the current snapshot are returned.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i like this description of how snapshot_id is used, can we apply this for all similar functions
perhaps something more generic

Suggested change
Args:
snapshot_id (Optional[int]): The ID of the snapshot to retrieve entries for.
If None, entries for the current snapshot are returned.
Args:
snapshot_id (Optional[int]): The ID of the snapshot to retrieve. If None, the current snapshot is used.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the feedback! I’m glad you liked the description. I’ll review other similar functions and update their docstrings to use a more generic and consistent phrasing as suggested. Let me know if there are any additional improvements you'd like to see

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Docstrings to pyiceberg/table/inspect.py
3 participants