Newer versions of sklearn require better data validation and extra estimator tags #77

Open
mvdoc opened this issue Mar 10, 2025 · 0 comments · May be fixed by #78
mvdoc (Collaborator) commented Mar 10, 2025

I implemented a similar fix in the voxelwise tutorials (gallantlab/voxelwise_tutorials#34). Here we probably want to keep the fix backward compatible so that versions of sklearn < 1.6 still work. The current test failures look like this:

FAILED himalaya/ridge/tests/test_sklearn_api_ridge.py::test_check_estimator[torch-GroupRidgeCV_()-check_estimator_tags_renamed] - TypeError: Estimator GroupRidgeCV_ has defined either `_more_tags` or `_get_tags`, but not `__sklearn_tags__`. If you're customizing tags, and need to support multiple scikit-learn versions, you can implement both `__sklearn_tags__` and `_more_tags` or `_get_tags`. This change was introduced in scikit-learn=1.6
FAILED himalaya/ridge/tests/test_sklearn_api_ridge.py::test_check_estimator[torch-GroupRidgeCV_()-check_n_features_in_after_fitting] - AssertionError: `GroupRidgeCV_.predict()` does not check for consistency between input number
of features with GroupRidgeCV_.fit(), via the `n_features_in_` attribute.
You might want to use `sklearn.utils.validation.validate_data` instead
of `check_array` in `GroupRidgeCV_.fit()` and GroupRidgeCV_.predict()`. This can be done
like the following:
from sklearn.utils.validation import validate_data
...
class MyEstimator(BaseEstimator):
    ...
    def fit(self, X, y):
        X, y = validate_data(self, X, y, ...)
        ...
        return self
    ...
    def predict(self, X):
        X = validate_data(self, X, ..., reset=False)
        ...
    return X
= 56 failed, 1434 passed, 1612 skipped, 6089 warnings, 112 rerun in 105.74s (0:01:45) =
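
For reference, here is a minimal sketch of how both failures could be addressed while still supporting sklearn < 1.6. It is not himalaya's code and not necessarily what #78 does: `ToyRidge` is a hypothetical stand-in estimator, the fallback `validate_data` shim is an assumption about how older versions might be handled, and the `multi_output` tag is only an illustrative choice of tag to customize.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.utils.validation import check_is_fitted

try:
    # scikit-learn >= 1.6 exposes validate_data as a public function.
    from sklearn.utils.validation import validate_data
except ImportError:
    # Assumed fallback for older versions: delegate to the private
    # BaseEstimator._validate_data method, which plays the same role.
    def validate_data(estimator, X, y=None, **kwargs):
        if y is None:
            return estimator._validate_data(X, **kwargs)
        return estimator._validate_data(X, y, **kwargs)


class ToyRidge(RegressorMixin, BaseEstimator):
    """Hypothetical ridge estimator used only to illustrate the pattern."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        # Validates X and y and records n_features_in_ on the estimator.
        X, y = validate_data(self, X, y, multi_output=True, y_numeric=True)
        gram = X.T @ X + self.alpha * np.eye(X.shape[1])
        self.coef_ = np.linalg.solve(gram, X.T @ y)
        return self

    def predict(self, X):
        check_is_fitted(self)
        # reset=False compares the number of features with n_features_in_.
        X = validate_data(self, X, reset=False)
        return X @ self.coef_

    def _more_tags(self):
        # Read by scikit-learn < 1.6.
        return {"multioutput": True}

    def __sklearn_tags__(self):
        # Read by scikit-learn >= 1.6; never called on older versions.
        tags = super().__sklearn_tags__()
        tags.target_tags.multi_output = True
        return tags
```

Defining both `_more_tags` and `__sklearn_tags__` follows the suggestion in the first error message, and routing `fit` and `predict` through `validate_data` is what the second message asks for. With this in place, calling `predict` on an array whose number of columns differs from the one seen in `fit` should raise a `ValueError`.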
@mvdoc mvdoc self-assigned this Mar 10, 2025
@mvdoc mvdoc linked a pull request Mar 10, 2025 that will close this issue
3 tasks
@mvdoc mvdoc changed the title from "Newer versions of sklearn require better data validation" to "Newer versions of sklearn require better data validation and extra estimator tags" Mar 10, 2025