Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MNT Removed _safe_accumulator_op for first-pass algorithm in _assert_all_finite #23446

Merged
merged 8 commits into from
May 25, 2022

Conversation

Micky774
Copy link
Contributor

@Micky774 Micky774 commented May 23, 2022

Reference Issues/PRs

Follow-up to #23347
Related to #23197
Specifically addresses #23197 (comment)

What does this implement/fix? Explain your changes.

Removes _safe_accumulator_op from _assert_all_finite since it is not needed in the average case, and can be a significant bottleneck. Even when a false-positive is detected in the rare (and yet-untested) case, the second-pass algorithm will determine it explicitly.

Any other comments?

For profiling info refer to: #23197 (comment)

@Micky774 Micky774 changed the title Removed safe_accumulator_op for first-pass algorithm MAINT Removed _safe_accumulator_op for first-pass algorithm in _assert_all_finite May 23, 2022
Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM:

  • on main
In [1]: import numpy as np
   ...: from sklearn.utils.validation import _assert_all_finite
   ...: a = np.random.RandomState(0).randn(int(1e8)).astype(np.float32)
   ...: %timeit _assert_all_finite(a)
54.9 ms ± 383 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
  • on this branch:
In [1]: import numpy as np
   ...: from sklearn.utils.validation import _assert_all_finite
   ...: a = np.random.RandomState(0).randn(int(1e8)).astype(np.float32)
   ...: %timeit _assert_all_finite(a)
28.4 ms ± 39.8 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

sklearn/utils/validation.py Outdated Show resolved Hide resolved
@ogrisel
Copy link
Member

ogrisel commented May 24, 2022

Not sure if we should document this in the changelog for 1.2. Maybe we should have something like "Reduce the overhead of finiteness checks for float32 input data by leveraging numpy's SIMD optimized primitives." or something similar.

@Micky774
Copy link
Contributor Author

Not sure if we should document this in the changelog for 1.2. Maybe we should have something like "Reduce the overhead of finiteness checks for float32 input data by leveraging numpy's SIMD optimized primitives." or something similar.

I figured this was a small enough change pretty separated from what users really interact with that it would be fine to omit a changelog entry. If you/other reviewers think mentioning the performance gain would be worthwhile I will of course add an entry :)

@thomasjpfan
Copy link
Member

If you/other reviewers think mentioning the performance gain would be worthwhile I will of course add an entry :)

I think it's worth adding a changelog entry for performance improvements.

@Micky774
Copy link
Contributor Author

I think it's worth adding a changelog entry for performance improvements.

Added!

doc/whats_new/v1.2.rst Outdated Show resolved Hide resolved
@Micky774 Micky774 changed the title MAINT Removed _safe_accumulator_op for first-pass algorithm in _assert_all_finite MNT Removed _safe_accumulator_op for first-pass algorithm in _assert_all_finite May 25, 2022
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@thomasjpfan thomasjpfan merged commit 38d23c4 into scikit-learn:main May 25, 2022
@Micky774 Micky774 deleted the remove_safe_accumulator branch May 25, 2022 20:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants