Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dev-fix/Improvements for Local Development Setup and Unit Test Structure #1168

Merged
merged 13 commits into from
Jan 30, 2025

Conversation

mhmotamedi
Copy link
Contributor

@mhmotamedi mhmotamedi commented Jan 25, 2025

PR Description:

Improvements for Local Development Setup and Unit Test Structure

This PR focuses on resolving issues with local environment setup and unit test execution in the dataprofiler library. The changes ensure a more reliable setup process, eliminate dependency conflicts, and enhance the performance and maintainability of unit tests. It also involves fixing Pre-commit Hook Errors.

Changes Introduced

1. Refactored Makefile

  • Dependency Installation Order: Moved requirements-ml.txt earlier in the installation sequence to prevent conflicts with TensorFlow and shared dependencies like NumPy.

  • Fresh Dependency Downloads: Added --no-cache-dir to pip install commands to ensure fresh downloads, avoiding issues caused by stale or corrupted cached files.

  • Virtual Environment Isolation: Updated the test command to explicitly use venv/bin/python, ensuring tests use the correct Python interpreter and dependencies from the virtual environment.

  • Python Version Check: Introduced a check-python target to ensure the specified Python version is 3.9, 3.10, or 3.11. If the Python version does not match, the Makefile exits with a clear error message, improving reliability across environments.

  • Help Section: Added a help target to display all available Makefile commands with descriptions. This makes the development workflow more transparent and easier to understand for new and existing contributors.

2. Refactored Unit Tests

  • Improved Test Isolation: Replaced global variables with setUpClass and tearDownClass, ensuring proper initialization and cleanup of shared resources within each test suite. (dataprofiler/tests/profilers/test_profile_builder.py and dataprofiler/tests/test_data_profiler.py)
  • Faster Test Execution: Centralized resource setup to reduce redundant operations, improving test performance without relying on the --forked flag, which can be slow for local runs.

3. Fix Pre-commit Hook Errors and Finalize Configurations:

  • upgraded mypy version to resolve the following error:

58
/home/runner/.cache/pre-commit/repo5__fa9id/py_env-python3/lib/python3.10/site-packages/dask/dataframe/dask_expr/_backends.py:129: error: Condition can't be inferred, unable to merge overloads
59
/home/runner/.cache/pre-commit/repo5__fa9id/py_env-python3/lib/python3.10/site-packages/dask/dataframe/dask_expr/_backends.py:136: error: Condition can't be inferred, unable to merge overloads

  • refactored dataprofiler/reports/graphs.py module to resolve mypy errors raised after upgrade:

dataprofiler/reports/graphs.py:112: error: Return value expected [return-value]
dataprofiler/reports/graphs.py:183: error: Item "None" of "Optional[Axes]" has no attribute "set" [union-attr]
dataprofiler/reports/graphs.py:186: error: Item "None" of "Optional[Axes]" has no attribute "set_title" [union-attr]
dataprofiler/reports/graphs.py:187: error: Incompatible return value type (got "Optional[Axes]", expected "Axes") [return-value]
dataprofiler/reports/graphs.py:195: error: Function "matplotlib.pyplot.figure" is not valid as a type [valid-type]
dataprofiler/reports/graphs.py:195: note: Perhaps you need "Callable[...]" or a callback protocol?
dataprofiler/reports/graphs.py:219: error: Function "matplotlib.pyplot.figure" is not valid as a type [valid-type]
dataprofiler/reports/graphs.py:219: note: Perhaps you need "Callable[...]" or a callback protocol?
dataprofiler/reports/graphs.py:247: error: Return value expected [return-value]
dataprofiler/reports/graphs.py:266: error: Incompatible types in assignment (expression has type "Union[Figure, SubFigure, None]", variable has type "Figure") [assignment]

  • refactored dataprofiler/labelers/regex_model.py module to resolve mypy errors raised after upgrade:

    dataprofiler/labelers/regex_model.py:168: error: Missing return statement [empty-body]
    dataprofiler/labelers/regex_model.py:168: note: If the method is meant to be abstract, use @abc.abstractmethod
    dataprofiler/labelers/column_name_model.py:177: error: Missing return statement [empty-body]

  • refactored dataprofiler/profilers/profiler_options.py module to resolve mypy errors raised after upgrade:

    dataprofiler/profilers/profile_builder.py:1081: error: Dict entry 0 has incompatible type "str": "Optional[BaseDataLabeler]"; expected "str": "bool" [dict-item]
    dataprofiler/profilers/profile_builder.py:1086: error: Dict entry 0 has incompatible type "str": "Optional[BaseDataLabeler]"; expected "str": "bool" [dict-item]

  • Ran pre-commit hooks (mypy, flake8, check-manifest, etc.) to validate changes across the project files.

  • Force-pushed updates after resolving errors to ensure all checks pass.

Note: This PR has not fixed the override mypy errors yet. This can be fixed with another PR.

> dataprofiler/data_readers/parquet_data.py:69: error: Signature of "file_encoding" incompatible with supertype "BaseData"  [override]
> dataprofiler/profilers/numerical_column_stats.py:368: error: Signature of "profile" incompatible with supertype "BaseColumnProfiler"  [override]
> dataprofiler/profilers/float_column_profile.py:197: error: Signature of "profile" incompatible with supertype "NumericStatsMixin"  [override]
> dataprofiler/profilers/text_column_profile.py:87: error: Signature of "profile" incompatible with supertype "NumericStatsMixin"  [override]
> dataprofiler/profilers/int_column_profile.py:95: error: Signature of "profile" incompatible with supertype "NumericStatsMixin"  [override]

4. Updated version.py

  • Incremented the version in the version.py file to reflect the new changes introduced in this PR.

taylorfturner and others added 12 commits September 29, 2023 15:29
* fix codeowners

* fix codeowners

* dash

* lower case
update forked dataprofiler with its recent release
* refactor: Upgrade the models to use keras 3.0 (capitalone#1138)

* Replace snappy with cramjam (capitalone#1091)

* add downloads tile (capitalone#1085)

* Replace snappy with cramjam

* Delete test_no_snappy

---------

Co-authored-by: Taylor Turner <[email protected]>

* pre-commit fix (capitalone#1122)

* Bug fix for float precision calculation using categorical data with trailing zeros. (capitalone#1125)

* Revert "Bug fix for float precision calculation using categorical data with t…" (capitalone#1133)

This reverts commit d3159bd.

* refactor: move layers outside of class

* refactor: update model to keras 3.0

* fix: manifest

* fix: bugs in compile and train

* fix: bug in load_from_library

* fix: bugs in CharCNN

* refactor: loading tf model labeler

* fix: bug in data_labeler identification

* fix: update model to use proper softmax layer names

* fix: formatting

* fix: remove unused line

* refactor: drop support for 3.8

* fix: comments

* fix: comment

---------

Co-authored-by: Gábor Lipták <[email protected]>
Co-authored-by: Taylor Turner <[email protected]>
Co-authored-by: James Schadt <[email protected]>

* Fix Tox (capitalone#1143)

* tox new

* update

* update

* update

* update

* update

* update

* update

* update tox.ini

* update

* update

* remove docs

* empty retrigger

* update (capitalone#1146)

* bump version

* update 3.11

* remove dist/

---------

Co-authored-by: JGSweets <[email protected]>
Co-authored-by: Gábor Lipták <[email protected]>
Co-authored-by: James Schadt <[email protected]>
* refactor: Upgrade the models to use keras 3.0 (capitalone#1138)

* Replace snappy with cramjam (capitalone#1091)

* add downloads tile (capitalone#1085)

* Replace snappy with cramjam

* Delete test_no_snappy

---------



* pre-commit fix (capitalone#1122)

* Bug fix for float precision calculation using categorical data with trailing zeros. (capitalone#1125)

* Revert "Bug fix for float precision calculation using categorical data with t…" (capitalone#1133)

This reverts commit d3159bd.

* refactor: move layers outside of class

* refactor: update model to keras 3.0

* fix: manifest

* fix: bugs in compile and train

* fix: bug in load_from_library

* fix: bugs in CharCNN

* refactor: loading tf model labeler

* fix: bug in data_labeler identification

* fix: update model to use proper softmax layer names

* fix: formatting

* fix: remove unused line

* refactor: drop support for 3.8

* fix: comments

* fix: comment

---------





* Fix Tox (capitalone#1143)

* tox new

* update

* update

* update

* update

* update

* update

* update

* update tox.ini

* update

* update

* remove docs

* empty retrigger

* update (capitalone#1146)

* Add Python 3.11 to GHA (capitalone#1090)

* add downloads tile (capitalone#1085)

* Add Python 3.11 to GHA

* Replace snappy with cramjam (capitalone#1091)

* add downloads tile (capitalone#1085)

* Replace snappy with cramjam

* Delete test_no_snappy

---------



* Update dask modules

* Install dask dataframe

* Update dask modules in precommit

* Correct copy/paste error

* Try again to clear Unicode

* Rolled back pre-commit dask version

* Add py311 to tox

* Bump dask to 2024.4.1

* Bump python-snappy 0.7.1

* Rewrite labeler test

* Correct isort

* Satisfy black

* And flake8

* Synced with requirements

---------



* [Vuln Fix]: Resolve mend vulnerabilities related to requests. (capitalone#1162)

* resolved check-manifest issue

* updating keras version pin to <=3.4.0

* adding comment in requirements.txt to trigger mend check

---------



---------

Co-authored-by: JGSweets <[email protected]>
Co-authored-by: Gábor Lipták <[email protected]>
Co-authored-by: Taylor Turner <[email protected]>
Co-authored-by: James Schadt <[email protected]>
Co-authored-by: Michael Davis <[email protected]>
@mhmotamedi mhmotamedi requested a review from a team as a code owner January 25, 2025 00:33
Copy link
Contributor

@armaan-dhillon armaan-dhillon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just left one comment to understand the typing changes

@mhmotamedi mhmotamedi merged commit 1639641 into capitalone:dev Jan 30, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants