-
Notifications
You must be signed in to change notification settings - Fork 163
feat: dynamically create sf-hamilton-core package
#1376
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So does this copy everything in sf-hamilton? Including the LICENSE file and NOTICE files? We need a few things to always be there for Apache purposes. If so, we're good I think. If not you'll need to add them.
| ### `pandas` and `numpy` dependencies | ||
| Hamilton was initially created for workflows that used `pandas` and `numpy` heavily. For this reason, `numpy` and `pandas` are imported at the top-level of module `hamilton.base`. Because of the package structure, as a Hamilton user, you're importing `pandas` and `numpy` every time you import `hamilton`. | ||
|
|
||
| A reasonable change would be to move `numpy` and `pandas` to a "lazy" location. Then, dependencies would only be imported when features requiring them are used and they could be removed from `pyproject.toml`. Unfortunately, plugin autoloading defaults make this solution a significant breaking change and insatisfactory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| A reasonable change would be to move `numpy` and `pandas` to a "lazy" location. Then, dependencies would only be imported when features requiring them are used and they could be removed from `pyproject.toml`. Unfortunately, plugin autoloading defaults make this solution a significant breaking change and insatisfactory. | |
| A reasonable change would be to move `numpy` and `pandas` to a "lazy" location. Then, dependencies would only be imported when features requiring them are used and they could be removed from `pyproject.toml`. Unfortunately, plugin autoloading defaults make this solution a significant breaking change and unsatisfactory. |
| from . import htypes, node | ||
| except ImportError: | ||
| import node | ||
| if TYPE_CHECKING: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment as to importance of this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes:
- I moved the imports
pandas,numpyandpandas.core.indexes.extensionfrom the top-level to the code path that actually use these dependencies. There should be no behavior change, but it allows toimport hamilton.basewithout loading pandas each time. - if TYPE_CHECKING is the standard Python approach to import package that are only relevant for annotating signatures. For example,
mypywill importpandaswhen doing type checking, but doingfrom hamilton import basewon't importpandas pandasandnumpyare in the type checking block because they are used in some function signatures.pandas.core.indexes.extensionis not because it isn't used in type annotations.- moved
hamilton.nodetoTYPE_CHECKINGblock since it's only used for annotations - moved
htypesto top-level import; it should not have been in the atry/exceptin the first place because a code path ofSimplePythonDataFrameGraphAdapterdepends on it and will fail error ifhtypesisn't imported
I don't have the "why" for this code:
try:
from . import htypes, node
except ImportError:
import node- The
try/exceptwas introduced in 2022, but no clear indications why. - Looking at the source code of the file at the time, it was probably a brute force solution to avoid circular imports.
- The code could have been in a
TYPE_CHECKINGblock (introduced in Python 3.5) since it was only ever used for annotations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah sorry I meant in the code leave a note/comment as to the importance :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E.g. this is for hamilton-core to work...
|
I don't know much about releases to pypi, can I ask if it possible to sign the releases using something like https://www.python.org/downloads/metadata/sigstore/ or something similar? If you sign the pypi release, could you sign it with the same key that you sign the ASF compliant source releases? The signing keys for source releases need to be maintained on secure hardware and generally, most ASF releases are done by running jobs on a private laptop and signed with GPG (or equivalent). |
|
@pjfanning I have little experience with pypi, but I want to highlight that we have multiple options for distribution:
Actually, you can even try I believe "hamilton-core" PR is a temporary solution to something we should fix on a major release. Users that care about this issue (already a few reacted positively on Slack) are probably ok with following a few extra steps for installation. |
|
@zilto the ASF frowns on encouraging users to use latest code in git. We aim to do official releases and have reviews and votes to improve the confidence about the release being stable. I think the pypi releases should be done with the source release and be based on the exact git commit that was accepted for the release. |
Makes sense. Though, introducing |
I'm not too concerned. Tooling should make this simpler. Once we hit 2.0. we have options to stop I think. I'm not too worried -- and if we always set that expectation I think we'll be good. On that note, in the |
Adds the directory
hamilton-core/with mechanism to dynamically patchhamilton/and bundle a library namedsf-hamilton-corewhich could be pushed to pypi.It makes targeted 2 changes:
pandasandnumpyoptional dependencies; and removenetworkxdependency (currently unused).This makes the Hamilton package a much lighter install and solves long library loading time. See the file
hamilton-core/README.mdfor detailsChanges
hamilton-core/directoryhamilton-core/setup.pycopies the code ofhamilton/tohamilton-core/hamilton/_hamiltonhamilton-core/hamilton/__init__.pyis the entry point tosf-hamilton-core, which proxies everything directly to the source code ofhamiltonstored inhamilton-core/hamilton/_hamiltonhamilton/base.pyto lazily importpandasandnumpy. This shouldn't affect users in any wayHow I tested this
sf-hamilton-coreand runs Hamilton's unit testsTODO
hamilton-core/setup.pyand add linting + formattingChecklist