
Get test_aoi_and_aoi_projection passing on numpy 1.22.0 #1369

Merged (2 commits) Jan 10, 2022

Conversation

@kandersolar (Member) commented Jan 6, 2022

  • [ ] Closes #xxxx
  • I am familiar with the contributing guidelines
  • [ ] Tests added
  • [ ] Updates entries to docs/sphinx/source/api.rst for API changes.
  • [ ] Adds description and name entries in the appropriate "what's new" file in docs/sphinx/source/whatsnew for all changes. Includes link to the GitHub Issue with :issue:`num` or this Pull Request with :pull:`num`. Includes contributor name and/or GitHub username (link with :ghuser:`user`).
  • [ ] New code is fully documented. Includes numpydoc compliant docstrings, examples, and comments where necessary.
  • Pull request is nearly complete and ready for detailed review.
  • Maintainer: Appropriate GitHub Labels and Milestone are assigned to the Pull Request and linked Issue.

A test recently started failing because a returned value was very slightly different from the expected value (example):

>       assert_allclose(aoi, aoi_expected, atol=1e-6)
E       AssertionError: 
E       Not equal to tolerance rtol=1e-07, atol=1e-06
E       
E       Mismatched elements: 1 / 1 (100%)
E       Max absolute difference: 1.20741827e-06
E       Max relative difference: inf
E        x: array(1.207418e-06)
E        y: array(0)

In #717 (comment) I speculated that this is caused by a change in numpy 1.22.0 that enables a set of SIMD extensions (AVX512), which can improve calculation speed but can also produce very slightly different results. Here are the reasons I think it's the source of this test failure:

  • Numpy currently enables AVX512 only on Linux, and only in 1.22.0. This is consistent with our test failure: only the bare Linux Python 3.8 and 3.9 jobs failed. There are no numpy 1.22.0 wheels for Python 3.7 and below, so those jobs used an older numpy that doesn't have AVX512 enabled. Additionally, the conda Linux jobs are still using an older numpy, so the extension was not enabled for them either.
  • I couldn't reproduce the test failure locally on a CPU that doesn't support AVX512, and I could reproduce it on a CPU that does support AVX512.
  • Setting this magic environment variable resolved the test failure on the CPU with AVX512 enabled: NPY_DISABLE_CPU_FEATURES="AVX512F,AVX512CD,AVX512VL,AVX512BW,AVX512DQ,AVX512_SKX"

So I'm trying that same environment variable in our CI configuration to see if that resolves the CI failure as well. If so I think we can be pretty confident that this numpy change is responsible.

As a permanent fix, I think we should just loosen the tolerance on the failing test a bit.
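For reference, a minimal sketch of that kind of local check (the variable has to be set before numpy is first imported for it to take effect, and the exact output will depend on the CPU and numpy build):

import os

# Must be set before numpy initializes; alternatively, export it in the shell
# before starting Python.
os.environ["NPY_DISABLE_CPU_FEATURES"] = (
    "AVX512F,AVX512CD,AVX512VL,AVX512BW,AVX512DQ,AVX512_SKX"
)

import numpy as np
import pvlib

print(np.__version__)  # the difference only shows up with the 1.22.0 Linux wheels
# The failing parametrization: expected to print (essentially) 0.0 here;
# with AVX512 left enabled it comes out around 1.2e-6 degrees instead.
print(pvlib.irradiance.aoi(30, 180, 30, 180))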

@kandersolar added this to the 0.9.1 milestone Jan 6, 2022
@kandersolar changed the title from "Get test_aoi_and_aoi_projection" to "Get test_aoi_and_aoi_projection passing on numpy 1.22.0" Jan 6, 2022
@kandersolar (Member Author)

All the pytest jobs passed using that environment variable, so I'll switch to loosening the test tolerance.

@wholmgren (Member)

Wow, nice work. Do you see a change in https://pvlib-benchmarker.github.io/pvlib-benchmarks/#irradiance.Irradiance.time_aoi_projection ? That test might not use enough data points to get over the python overhead. Just noticed that 5bdad644 slowed it down by more than I would have expected.

@cwhanse (Member) commented Jan 6, 2022

I think we should just loosen the tolerance on the failing test a bit.

I concur. The failing case is zenith=tilt=30 and solar_azimuth=system_azimuth=180, which should return AOI=0 but returns AOI ≈ 1.2e-6. Close enough to zero for any use case I can think of.
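A plausible back-of-the-envelope explanation for why the error lands at ~1.2e-6 specifically: with tilt = zenith and matching azimuths, the AOI projection cos(tilt)·cos(zenith) + sin(tilt)·sin(zenith)·cos(Δazimuth) is exactly 1 in exact arithmetic, and arccos is extremely sensitive near 1, so a single-ULP rounding difference in the projection already produces an angle of roughly this size:

import numpy as np

# arccos(1 - eps) ~= sqrt(2*eps) for small eps, so a one-ULP drop below 1.0
# in the projection corresponds to an angle of roughly:
eps = np.finfo(float).eps              # ~2.22e-16
print(np.degrees(np.arccos(1 - eps)))  # ~1.2e-6 degrees, close to the observed mismatch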

@kandersolar (Member Author)

AVX512 can apparently give quite a significant speed improvement for numpy-heavy functions. Here is a small comparison for irradiance.aoi, done manually on the pvlib-benchmarker server:

import numpy as np
import pvlib
import timeit

data = []

# Time 5000 calls of irradiance.aoi for array lengths between 1e2 and 1e5.
# Random values in [0, 1) degrees are fine for a pure timing comparison.
for N in np.logspace(2, 5, 30).astype(int):
    surface_tilt = np.random.random(N)
    surface_azimuth = np.random.random(N)
    solar_zenith = np.random.random(N)
    solar_azimuth = np.random.random(N)

    dt = timeit.timeit(lambda: pvlib.irradiance.aoi(surface_tilt, surface_azimuth,
                                                    solar_zenith, solar_azimuth),
                       number=5000)
    data.append({'N': N, 'elapsed': dt})

print(data)

[figure: irradiance.aoi timing vs. array length N]
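The figure itself isn't reproduced here, but something similar can be re-plotted from the data list built by the snippet above, e.g. with matplotlib (a rough sketch):

import matplotlib.pyplot as plt

# `data` is the list built by the timing snippet above.
Ns = [d['N'] for d in data]
elapsed = [d['elapsed'] for d in data]

plt.loglog(Ns, elapsed, marker='o')
plt.xlabel('array length N')
plt.ylabel('total time for 5000 irradiance.aoi calls [s]')
plt.show()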

As for the ASV results: I keep managing to fix a problem on the server without actually fixing it, so the current published timings are a bit out of date. I think none of them are new enough to have used numpy 1.22.0, so I wouldn't expect to see a speed-up in those plots yet.

@kandersolar marked this pull request as ready for review January 7, 2022 18:40
@@ -878,7 +878,7 @@ def test_aoi_and_aoi_projection(surface_tilt, surface_azimuth, solar_zenith,
                                 aoi_proj_expected):
     aoi = irradiance.aoi(surface_tilt, surface_azimuth, solar_zenith,
                          solar_azimuth)
-    assert_allclose(aoi, aoi_expected, atol=1e-6)
+    assert_allclose(aoi, aoi_expected, atol=1e-5)
@kandersolar (Member Author)
@cwhanse any objection to just bumping this tolerance to 1e-5?
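For context, assert_allclose compares |actual - desired| against atol + rtol * |desired|, so with an expected AOI of exactly 0 the rtol term contributes nothing (that's also why the failure output reports an infinite relative difference) and atol is the only knob that matters. A quick illustration:

import numpy as np
from numpy.testing import assert_allclose

# With desired == 0, the allowed error reduces to atol alone.
assert_allclose(np.array(1.2e-6), 0.0, atol=1e-5)    # passes with the loosened tolerance
# assert_allclose(np.array(1.2e-6), 0.0, atol=1e-6)  # would raise: 1.2e-6 > 1e-6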

Member
No objection

@kandersolar (Member Author)

OK, the proposed fix seems uncontroversial, so I'll go ahead and merge. It's nice that such a sweeping change (numpy returning slightly different answers) ended up breaking only one test.

Also I had another thought: the ASV runs are done in conda envs, which means we might not see the new numpy there for some time. No clue what the update cadence is on conda.
