
@zilto
Contributor

@zilto zilto commented Sep 6, 2025

Adds the directory hamilton-core/ with a mechanism to dynamically patch hamilton/ and bundle a library named sf-hamilton-core, which could be pushed to PyPI.

It makes 2 targeted changes:

  • disable plugin autoloading
  • make pandas and numpy optional dependencies, and remove the networkx dependency (currently unused).

This makes the Hamilton package a much lighter install and solves the long library loading time. See hamilton-core/README.md for details.
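
One way to sanity-check the loading-time claim is to import a module in a fresh interpreter and see whether a heavy dependency ended up in `sys.modules`. This is an illustrative sketch, not part of the PR; the helper name `pulls_in` is hypothetical. CPython's `python -X importtime -c "import hamilton"` also prints a per-module timing breakdown.

```python
import subprocess
import sys


def pulls_in(import_stmt: str, module: str) -> bool:
    """Return True if running `import_stmt` in a fresh interpreter loads `module`.

    Hypothetical helper: a new subprocess is used so that modules already
    imported by the current process don't skew the result.
    """
    code = f"import sys; {import_stmt}; print({module!r} in sys.modules)"
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,  # raise if the import itself fails
    )
    return result.stdout.strip() == "True"
```

For example, `pulls_in("import hamilton", "pandas")` would be expected to return True with sf-hamilton and, if the lazy imports in this PR work as described, False with sf-hamilton-core.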

Changes

  • Add hamilton-core/ directory
  • hamilton-core/setup.py copies the code of hamilton/ to hamilton-core/hamilton/_hamilton
  • hamilton-core/hamilton/__init__.py is the entry point to sf-hamilton-core, which proxies everything directly to the source code of hamilton stored in hamilton-core/hamilton/_hamilton
  • modify hamilton/base.py to lazily import pandas and numpy. This shouldn't affect users in any way
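
The copy step could look roughly like the following. It is a sketch assuming paths relative to the repository root; `vendor_hamilton_source` is a hypothetical name, and the actual hamilton-core/setup.py may do this differently (for example inside a build command).

```python
import shutil
from pathlib import Path


def vendor_hamilton_source(repo_root: Path) -> Path:
    """Copy hamilton/ into hamilton-core/hamilton/_hamilton (hypothetical helper).

    hamilton-core/hamilton/__init__.py is left untouched; only the _hamilton
    subpackage it proxies to is (re)created.
    """
    src = repo_root / "hamilton"
    dst = repo_root / "hamilton-core" / "hamilton" / "_hamilton"
    if dst.exists():
        # Start from a clean copy so files deleted from hamilton/ don't linger.
        shutil.rmtree(dst)
    # copytree creates dst (and any missing parents); skip bytecode caches.
    shutil.copytree(src, dst, ignore=shutil.ignore_patterns("__pycache__", "*.pyc"))
    return dst
```

Re-running it refreshes the copy. Note that this sketch only copies the package directory, so files such as LICENSE and NOTICE would need to be handled separately.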

How I tested this

  • successfully ran all core unit tests locally
  • added CI workflow that installs sf-hamilton-core and runs Hamilton's unit tests

TODO

  • remove networkx dependency
  • add info to README about how it works
  • clean up hamilton-core/setup.py and add linting + formatting
  • potentially add docs pages

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

@zilto zilto self-assigned this Sep 6, 2025
@zilto zilto added the core-work Work that is "core". Likely overseen by core team in most cases. label Sep 6, 2025
@zilto zilto requested review from elijahbenizzy and skrawcz and removed request for elijahbenizzy September 6, 2025 01:24
Contributor

@skrawcz skrawcz left a comment

So does this copy everything in sf-hamilton, including the LICENSE and NOTICE files? We need a few things to always be there for Apache purposes. If so, we're good, I think. If not, you'll need to add them.

### `pandas` and `numpy` dependencies
Hamilton was initially created for workflows that used `pandas` and `numpy` heavily. For this reason, `numpy` and `pandas` are imported at the top-level of module `hamilton.base`. Because of the package structure, as a Hamilton user, you're importing `pandas` and `numpy` every time you import `hamilton`.

A reasonable change would be to move `numpy` and `pandas` to a "lazy" location. Then, dependencies would only be imported when features requiring them are used and they could be removed from `pyproject.toml`. Unfortunately, plugin autoloading defaults make this solution a significant breaking change and insatisfactory.
Contributor

Suggested change
```diff
-A reasonable change would be to move `numpy` and `pandas` to a "lazy" location. Then, dependencies would only be imported when features requiring them are used and they could be removed from `pyproject.toml`. Unfortunately, plugin autoloading defaults make this solution a significant breaking change and insatisfactory.
+A reasonable change would be to move `numpy` and `pandas` to a "lazy" location. Then, dependencies would only be imported when features requiring them are used and they could be removed from `pyproject.toml`. Unfortunately, plugin autoloading defaults make this solution a significant breaking change and unsatisfactory.
```

```python
    from . import htypes, node
except ImportError:
    import node
if TYPE_CHECKING:
```
Contributor

Add a comment as to the importance of this.

Contributor Author

@zilto zilto Sep 7, 2025

Changes:

  • I moved the imports of pandas, numpy, and pandas.core.indexes.extension from the top level to the code paths that actually use these dependencies. There should be no behavior change, but it allows importing hamilton.base without loading pandas each time.
  • if TYPE_CHECKING is the standard Python approach to importing packages that are only needed for annotating signatures. For example, mypy will import pandas when type checking, but from hamilton import base won't import pandas at runtime.
  • pandas and numpy are in the type checking block because they are used in some function signatures. pandas.core.indexes.extension is not, because it isn't used in type annotations.
  • moved hamilton.node to the TYPE_CHECKING block since it's only used for annotations
  • moved htypes to a top-level import; it should not have been in a try/except in the first place, because a code path of SimplePythonDataFrameGraphAdapter depends on it and will raise an error if htypes isn't imported
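
The pattern described in these bullets (annotation-only import under TYPE_CHECKING plus a lazy import inside the function that needs the dependency) can be sketched as below. `outputs_to_dataframe` is a hypothetical function, not code from hamilton/base.py; the return annotation is quoted so it is never evaluated at runtime (`from __future__ import annotations` would have the same effect).

```python
from typing import TYPE_CHECKING, Any, Dict

if TYPE_CHECKING:
    # Only executed by static type checkers such as mypy. Keeping pandas out
    # of the runtime imports is what lets hamilton-core import this module
    # without loading pandas.
    import pandas as pd


def outputs_to_dataframe(outputs: Dict[str, Any]) -> "pd.DataFrame":
    """Hypothetical helper that combines node outputs into a DataFrame."""
    # Lazy import: pandas is loaded only when this code path actually runs.
    import pandas as pd

    return pd.DataFrame(outputs)
```

Importing a module written this way costs nothing extra; the pandas import cost is paid on the first call instead, and later calls hit the cached entry in `sys.modules`.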

I don't have the "why" for this code:

```python
try:
    from . import htypes, node
except ImportError:
    import node
```
  • The try/except was introduced in 2022, but there is no clear indication why.
  • Looking at the source code of the file at the time, it was probably a brute-force solution to avoid circular imports.
  • The code could have been in a TYPE_CHECKING block (typing.TYPE_CHECKING was added in Python 3.5.2), since it was only ever used for annotations.
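
The circular-import hypothesis can be checked with a throwaway package. This is a sketch, not Hamilton's real modules: `base.py` needs `Node` only for an annotation, `node.py` needs `base` at runtime, and the package names and helpers are hypothetical. Without the guard, importing `base` fails with the usual "partially initialized module" ImportError; with the guard, both modules import cleanly.

```python
import importlib
import sys
import tempfile
from pathlib import Path

NODE_SRC = (
    "from .base import describe as describe_node  # runtime dependency: node -> base\n"
    "\n"
    "class Node:\n"
    "    def __init__(self, name: str) -> None:\n"
    "        self.name = name\n"
    "\n"
    "    def describe(self) -> str:\n"
    "        return describe_node(self)\n"
)


def base_src(guarded: bool) -> str:
    """Source for base.py, which only needs Node for a type annotation."""
    if guarded:
        node_import = (
            "if TYPE_CHECKING:  # evaluated by mypy, skipped at runtime\n"
            "    from .node import Node\n"
        )
    else:
        node_import = "from .node import Node  # runtime import closes the cycle\n"
    return (
        "from __future__ import annotations\n"
        "from typing import TYPE_CHECKING\n"
        + node_import
        + "\n"
        "def describe(n: Node) -> str:\n"
        "    return 'node ' + n.name\n"
    )


def import_demo(pkg_name: str, guarded: bool) -> str:
    """Write a throwaway package, import base then node, and describe a Node."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / pkg_name
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "base.py").write_text(base_src(guarded))
        (pkg / "node.py").write_text(NODE_SRC)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module(f"{pkg_name}.base")
            node = importlib.import_module(f"{pkg_name}.node")
            return node.Node("a").describe()
        finally:
            sys.path.remove(tmp)
```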

Contributor

Yeah sorry I meant in the code leave a note/comment as to the importance :)

Contributor

@skrawcz skrawcz Sep 7, 2025

E.g. this is for hamilton-core to work...

@pjfanning
Member

pjfanning commented Sep 6, 2025

I don't know much about releases to PyPI; can I ask if it is possible to sign the releases using something like https://www.python.org/downloads/metadata/sigstore/ or something similar?

If you sign the PyPI release, could you sign it with the same key that you use to sign the ASF-compliant source releases?

The signing keys for source releases need to be maintained on secure hardware and generally, most ASF releases are done by running jobs on a private laptop and signed with GPG (or equivalent).
https://infra.apache.org/release-signing.html

@zilto
Contributor Author

zilto commented Sep 7, 2025

@pjfanning I have little experience with PyPI, but I want to highlight that we have multiple options for distribution:

  • pip allows you to install from GitHub: pip install git+https://github.com/apache/hamilton.git
  • @ allows you to specify a tag or branch. We could have: pip install git+https://github.com/apache/hamilton.git@core
  • you can also specify a subdirectory. We could have: pip install git+https://github.com/apache/hamilton.git#subdirectory=hamilton-core
  • the GitHub release workflow could include core versions in the release assets
  • these install instructions are also valid in requirements.txt or pyproject.toml

Actually, you can even try hamilton-core right now from this PR via

pip install git+https://github.com/apache/hamilton.git@feat/hamilton-core#subdirectory=hamilton-core
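
The pieces listed above compose into a single requirement string. The helper below is hypothetical and only illustrates the syntax: `@ref` pins a branch, tag, or commit, `#subdirectory=` points pip at a package outside the repository root, and the `name @ url` form (a PEP 508 direct reference) is what pyproject.toml dependency lists expect; requirements.txt accepts it too.

```python
from typing import Optional


def git_requirement(
    repo_url: str,
    ref: Optional[str] = None,
    subdirectory: Optional[str] = None,
    name: Optional[str] = None,
) -> str:
    """Build a pip VCS requirement string (hypothetical helper)."""
    url = f"git+{repo_url}"
    if ref:
        url += f"@{ref}"  # branch, tag, or commit
    if subdirectory:
        url += f"#subdirectory={subdirectory}"  # package not at the repo root
    # PEP 508 direct reference: required in pyproject.toml, accepted by pip.
    return f"{name} @ {url}" if name else url
```

For example, `git_requirement("https://github.com/apache/hamilton.git", ref="feat/hamilton-core", subdirectory="hamilton-core")` reproduces the install target of the command above.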

I believe the "hamilton-core" PR is a temporary solution to something we should fix in a major release. Users who care about this issue (a few have already reacted positively on Slack) are probably OK with following a few extra steps for installation.

@pjfanning
Member

@zilto the ASF frowns on encouraging users to use the latest code from git. We aim to do official releases and have reviews and votes to improve confidence that the release is stable. I think the PyPI releases should be done with the source release and be based on the exact git commit that was accepted for the release.

@zilto
Contributor Author

zilto commented Sep 7, 2025

Makes sense. However, introducing sf-hamilton-core and having people depend on it means we'll have to maintain that PyPI target "forever". If the issues addressed by sf-hamilton-core are fixed in Hamilton 2.0.0, pip install sf-hamilton and pip install sf-hamilton-core would do exactly the same thing, which could be confusing.

@skrawcz
Contributor

skrawcz commented Sep 8, 2025

we'll have to maintain that pypi target "forever"

I'm not too concerned. Tooling should make this simpler. Once we hit 2.0, we have options to stop, I think. I'm not too worried -- and if we always set that expectation, I think we'll be good.

On that note, in the __init__.py can you log a warning that sf-hamilton-core will go away in 2.0?


Labels

core-work Work that is "core". Likely overseen by core team in most cases.

3 participants