-
Notifications
You must be signed in to change notification settings - Fork 52
Design doc for refactoring python API namespaces #1179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Design doc for refactoring python API namespaces #1179
Conversation
| - `icechunk.Repository` (also available in `icechunk.repository.Repository`) | ||
| - `icechunk.IcechunkStore` (also available in `icechunk.store.IcechunkStore`) | ||
| - `icechunk.spec_version` | ||
| - `icechunk.print_debug_info` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
set_logs_filter should probably be at the top level
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should initialize_logs also go in the top-level?
| - `conflict.VersionSelection` | ||
| - `exceptions` | ||
| - `exceptions.ConflictError` | ||
| - `exceptions.IcechunkError` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe IcechunkError at the top level
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. I expect ConflictError to be used more regularly so it should be top-level
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think either all exceptions should be top-level or none of them
| - `logging` | ||
| - `logging.initialize_logs`, | ||
| - `logging.set_logs_filter`, | ||
| - `manifests` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of these should probably go in the config module
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean the manifest splitting classes? They also seem like they should be grouped together though... They could actually go even deeper, i.e. icechunk.config.manifests.ManifestConfig
| - If we only kept `Storage` we could get rid of a _lot_ of functions in favour of methods... | ||
| - Alternatively should base classes that aren't really meant to be called by the user (e.g. `Storage`) be public API at all? | ||
| - What's the difference between `storage.local_filesystem_storage` and `storage.local_filesystem_store`? | ||
| - Similarly how can we better disambiguate `storage.s3_storage` and `storage.s3_store`? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we probably need better names. Storage is what you pass to create/open repos. s3_store is a way to indicate the target of a VirtualChunkContainer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay yeah that's really confusing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I sketched an alternative proposal in #1182
| - Alternatively should base classes that aren't really meant to be called by the user (e.g. `Storage`) be public API at all? | ||
| - What's the difference between `storage.local_filesystem_storage` and `storage.local_filesystem_store`? | ||
| - Similarly how can we better disambiguate `storage.s3_storage` and `storage.s3_store`? | ||
| - Is there a similar question for credentials? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Credentials are currently a mess
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These deserve their own separate issue. The class hierarchy is complicated so it's not immediately obvious to me how to simplify the API.
| - Similarly should `S3Options` really be `S3Config`? | ||
| - Should summary dataclasses such as `Diff` and `GCSummary` live in the top-level? | ||
| - Is `distributed` meant to be a public module at all? Why is it's API defined in a different place? Should it be merged with `dask`? | ||
| - Union types like `AnyCredential` are re-exported as top-level API for importing, but it seems like they are only useful for internal type-hinting purposes? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only for typing, but not sure if they are needed publicly or only privately
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If they are only needed for typing, and you don't expect users to make great use of them for typing, I think they should be private.
|
|
||
| 2. **Move everything into desired locations whilst keeping all top-level imports** | ||
|
|
||
| We should add tests that both import paths work. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes please, particularly we need to add tests that verify we are not breaking user code
| User experience should look like this: | ||
|
|
||
| ```python | ||
| # works, but emits a warning |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should use Icechunk 2 as the moment to make the change and start warning people about deprecation.
|
@paraseba there are still a few unresolved question in this design doc, when you have a moment. |
design-docs/012-python-namespaces
Outdated
| - `distributed.extract_sessions` | ||
| - `distributed.merge_sessions` | ||
| - `garbage` | ||
| - `garbage.GCSummary` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no such module garbage in icechunk!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes you're right. Let's make one then. I'll edit this design doc to move garbage into another section.
| Existing namespaces, with new members added: | ||
|
|
||
| - `credentials` | ||
| - `credentials.Credentials` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per this doc, the classes and functions from the existing sub-modules would be exported from the module itself using __all__ variable instead of the top-level icechunk package?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, we actually want both: exported from the module itself, and then re-imported from the sub-module into the top-level __init__.py. That way your initial PR doesn't change the user experience, but does define a clearer underlying structure of sub-modules. So your current PR is for step 2 as described at the end of this doc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So for now only add the new namespaces to icechunk?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We currently have this:
# __init__.py
from icechunk._icechunk_python import Credentials
__all__ = [
"Credentials",
]# icechunk.icechunk_python.pyi
class Credentials
...Which only allows you to import via from icechunk import Credentials.
What we want is this:
# __init__.py
from icechunk.credentials import Credentials
__all__ = [
"Credentials",
]# icechunk.credentials.py
from icechunk._icechunk_python import Credentials
__all__ = [
"Credentials",
]# icechunk_python.pyi
class Credentials
...This would enable importing via from icechunk.credentials import Credentials (i.e. the end state we want) whilst for now still allowing importing via from icechunk import Credentials (i.e. backwards compatibility), and requires no changes to the part of the python code that interacts with rust (i.e. separates API from implementation).
Towards #1172