Refactor for boundary lat/lon extraction #546
@mraspaud this PR is ready for review. There are 3 test units failing related the |
I don't think I'm against anything that you've done here, but it makes me nervous to change some of these methods that have been around for so long. We'll need to triple check that Satpy or maybe even pyspectral isn't importing/using some of these methods and functions. I do agree that you've made things cleaner and this probably puts us in a good place to refactor this into a separate helper class in the future.
You asked about the geos boundary lonlats methods and yeah I'm not sure if that makes a ton of sense for geostationary bounds, but if there is an option for the user to get the x/y bounds then yeah it might make sense...oh wait, for non-full disk geostationary areas the lon/lats wouldn't be NaNs. You mentioned get_boundary_lonlats being used in resampling, could you point to where exactly that is?
I don't think the geostationary bounding box functions need to be made private, but if you feel that it saves us headaches in future refactorings then I'm OK with it (as long as they aren't used in other pytroll packages).
Lastly, I just had an idea perhaps just for future interfaces: what if all of these functions that have _lonlats and _proj_coords variants of each other had just a single version of the function that returned a single object and that object had properties (or methods) for getting access to the x/y, lon/lat, or image coordinate versions of some of the information? You'd end up with something like:
my_area.get_boundary(vertices_per_side=20).get_lonlats(chunks=1000)
Or something like that.
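To make the idea above concrete, here is a minimal sketch of a single boundary object exposing both coordinate views. Everything in it is hypothetical: `BoundarySketch`, `to_lonlat`, and `toy_inverse` are illustration-only names, the "projection" is a constant scale factor rather than a real CRS transform, and the `chunks` argument from the one-liner above is left out.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Coords = Tuple[Tuple[float, ...], Tuple[float, ...]]


@dataclass(frozen=True)
class BoundarySketch:
    """Hypothetical container for boundary vertices stored in projection (x/y) space."""

    x: Tuple[float, ...]
    y: Tuple[float, ...]
    # Hypothetical injected converter (x, y) -> (lon, lat); stands in for a CRS transform.
    to_lonlat: Callable[[float, float], Tuple[float, float]]

    def get_proj_coords(self) -> Coords:
        """Return the stored x/y vertices unchanged."""
        return self.x, self.y

    def get_lonlats(self) -> Coords:
        """Convert every vertex with the injected converter."""
        pairs = [self.to_lonlat(x, y) for x, y in zip(self.x, self.y)]
        lons = tuple(pair[0] for pair in pairs)
        lats = tuple(pair[1] for pair in pairs)
        return lons, lats


def toy_inverse(x: float, y: float) -> Tuple[float, float]:
    """Toy stand-in for an inverse projection: a constant scale, not a real CRS."""
    return x / 100_000.0, y / 100_000.0


boundary = BoundarySketch(x=(0.0, 100_000.0), y=(0.0, 200_000.0), to_lonlat=toy_inverse)
lons, lats = boundary.get_lonlats()
```

The point of the sketch is only the shape of the interface: one call returns one object, and the caller decides afterwards which coordinate representation to ask for.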
I'll try to answer all the points under discussion one by one below.
Geostationary x/y sides
To be safe, I will keep the geostationary utility function public.
Future Boundary Interface
Boundary Sides Consistency
get_boundary_lonlats and SimpleBoundary usage
Conclusion
I'm not so sure. I think it is one of the failings of our past understandings that all latitude/longitude degrees are the same. That is just not true. There can be prime meridian shifts or different Earth models/datums used. So there is some idea of "native" CRS and a preferred geographic/geodetic CRS. I could easily see a helper object (as I mentioned in my previous comment) that gets you x or y or lon or lat, but allows you to convert that object to another version with a different preferred geodetic CRS.
Is it the I'm sure there are some use cases where we define polygons or boundaries where this idea of needing to define dask arrays is unnecessary, but I'd want to be careful. In a lot of cases I don't think I'd want to generate a polygon and then not immediately use it. However, that polygon may be generated from large lon/lat arrays (this is the worst case scenario I think of all the scenarios) and in that case there may be a benefit to computing the resulting polygon at the same time as all other computations (if possible) to reuse the compute lon/lat arrays (dask would share the result of the tasks). But that being the worst case maybe we just "deal with it". Lon/lat arrays are already the slowest form of computation we use.
I'm not sure I agree. "top" and "bottom" are relative. Is the top of the swath the last scan of the data or the first? Is it the most northern? What if we have a non-polar orbiting satellite? If the sides are always generated from a polygon maybe that will always be more clear (shapely or other library determines what that means)?
I'd have to check how I used this method in my Polar2Grid project. It is likely that I put it in get_bbox_lonlats because that is used internally in the area boundary class (if I recall correctly). But also, other polygon libraries probably need clockwise coordinates.
I don't think it is that people need the counter-clockwise (CCW) version of the coordinates, it is that the polygon/boundary math requires the clockwise (CW) version. Besides wasted performance, I don't see a reason to ever not return CW.
Yep. Could probably return a namedtuple at the very least. This was just old interfaces before anyone knew what they were doing I think. ...Ah I just read your Conclusion. I think in general I like where you're going, but I still think we're 1-2 steps away from a design that really "wows" me. I think the idea of something like an
For this failing test: pyresample/pyresample/test/test_gradient.py Lines 246 to 254 in c0025ed
Can you print out pyresample/pyresample/gradient/__init__.py Lines 223 to 235 in c0025ed
And then the list passed to That coverage_status is filled here: pyresample/pyresample/gradient/__init__.py Lines 164 to 211 in c0025ed
But I'm just using what I get from reading the code. I didn't code this so I could be completely missing something. |
Just to better clarify a couple of thoughts @djhoese :
I don’t refer to the orientation in terms of geographic space, but with respect to the 2D lon/lat arrays. The [top, right, bottom, left] sides of the coordinate arrays.
Yes the
If one wants to generate Shapely polygons, it requires CCW order. If the area wraps the pole, we must not modify the ordering because the order determines which pole is wrapped.
Oh...so our boundary logic (or spherical geometry stuff) needs CW and shapely uses CCW? Ok so there should be an option. So would/could/should there be a "sides" method on a boundary/polygon or should it be on the area? If there is a "sides" method on the area (and swath) then I think the equivalent of a |
# Conflicts: # pyresample/geometry.py
Ok @ghiggi I've merged this with main which includes my pre-commit changes. There was only one conflicting file on the merge and that was in geometry.py. The biggest conflict was actually just unfortunate where I moved the "edge" method from BaseDefinition to AreaDefinition right under the boundary sides methods and git was confused when it tried to merge them even though they were separate. Otherwise, the area slicing methods/functions are now in Otherwise, my merge just includes a couple things to make pre-commit happy (docstring fixes, indentation/spacing fixes, unused import after it included your changes, etc). Let me know if you have any problems with this or anything looks confusing. I should be online at my usual times tomorrow. |
Very nice job. I re-read your updated PR description and I really like the goal you've set for yourself. It makes a lot of sense and I'm glad someone is trying to tackle the problems and inconsistencies. I think my general overall concern with the changes you've made so far is that I don't like the separation between projections and geographic versions of things. I will admit I may have a misunderstanding of this, but I don't think so. To me, geographic "projections" are not a single thing. They can differ with their model of the Earth and the datum/reference coordinates used for that Earth. As you know we could also shift the prime meridian of a lon/lat CRS and it is still considered geographic. With the above in mind, I think I'd like to look at combining the different boundary classes into a single class. Another thing that leads me to this idea is that you point out that the two classes differ in their I'm also not a fan of the "clockwise" behavior of the boundary classes. Specifically the a. Return a new instance of the boundary with this behavior enforced/applied. It just feels odd to me, but I'm not sure I have any better suggestions at this point. Wild suggestion: Are there enough deprecations and changes here that this should be moved to a single
@djhoese I also don't like the separation between projections and geographic versions of things, but it seems to me to be required when dealing with boundaries. Here we are not speaking of geographic or projected CRS, but of the presence of two types of coordinates: spherical and planar! Maybe we should call the classes For I try to elaborate here below some additional thoughts:
Side note: there was already a method |
Ah the Spherical versus Planar makes this make more sense now.
I don't understand why the type of coordinate retrieved (projection or geographic) would affect clockwise or counterclockwise. Don't we have enough cases of swaths (ascending and descending orbits or other old instrument types) and areas (north-up areas, south-up like FCI, and south-up/east-left like SEVIRI) where any coordinate combination could give you any orientation/ordering?
Besides the cases of what data we started with, what are cases where we need to use spherical boundaries even if we have a projected/planar area? Also, are there cases where we have geographic boundaries but can assume they don't cross the anti-meridian and could therefore be treated as planar coordinates? For example, a mercator or eqc projected dataset is not technically valid if it crosses the anti-meridian (it goes outside the coordinate space of the projection) so there is no need to treat it as a sphere. Similarly, if it does cross the anti-meridian in lon/lat space, but we shift the prime meridian we could (in some cases that don't cross the poles) treat the lon/lat coordinates as planar too.
I think I've realized one of the main reasons I don't like set_clockwise is that it puts state into the object and makes it mutable. It really feels like these objects should be immutable and only be containers for the boundary information and not modifiers of it. I realize that the clockwise wish property is not modifying the data until it is returned, but it causes confusion when using it. I would almost feel better if the boundary object knew the order/orientation of the data (either told by the user as a kwarg or determined by the class when needed) and, besides automatically converting for things like shapely object creation, it could probably have methods for forcing it that return a new copy of the boundary with the modifications done already.
If the orientation/order is the same as the Boundary already then return This also makes me think, could the pyresample spherical stuff be made to detect the closed/not-closed state of the polygon provided to it and adjust as needed? That way there doesn't need to be special handling...eh I guess this is a nice convenience for any user to have that ability to get closed or open boundary coordinates. |
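A minimal sketch of the immutable variant described above, under several assumptions: `PlanarBoundarySketch`, `as_clockwise`, and `as_counterclockwise` are hypothetical names, returning `self` when the order already matches is one possible reading of the truncated sentence above, and the shoelace test only makes sense for planar x/y vertices (on the sphere, and in particular for pole-wrapping areas, orientation cannot be decided this way).

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Vertex = Tuple[float, float]


def signed_area(vertices: Sequence[Vertex]) -> float:
    """Shoelace formula: positive for counter-clockwise order in a planar x/y frame."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return total / 2.0


@dataclass(frozen=True)
class PlanarBoundarySketch:
    """Hypothetical immutable boundary: orientation is derived, never stored as state."""

    vertices: Tuple[Vertex, ...]

    @property
    def is_clockwise(self) -> bool:
        return signed_area(self.vertices) < 0

    def as_clockwise(self) -> "PlanarBoundarySketch":
        """Return self if already clockwise, otherwise a new reversed copy."""
        return self if self.is_clockwise else PlanarBoundarySketch(self.vertices[::-1])

    def as_counterclockwise(self) -> "PlanarBoundarySketch":
        """Return self if already counter-clockwise, otherwise a new reversed copy."""
        return PlanarBoundarySketch(self.vertices[::-1]) if self.is_clockwise else self


ccw = PlanarBoundarySketch(((0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)))
cw = ccw.as_clockwise()
```

Because the dataclass is frozen, `ccw` is left untouched by the call; code that needs a particular order asks for it explicitly instead of toggling a flag on a shared object.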
…endency injections
Thanks for all these discussions @djhoese. They also help me to fully lay out my thoughts.
All
Yes many cases. With this PR, we will be in a good place to perform data reduction, on the sphere or with shapely, based on the type of source and target area.
And we need to stay on the sphere if the CRS bounds of the two areas do not intersect!
I agree with that and I strongly push to deprecate the existing use of
That just requires checking whether the first vertex is equal to the last
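The check mentioned here can be sketched in a few lines. The helper names `is_closed`, `as_closed`, and `as_open` are hypothetical and not part of pyresample; the comparison is exact, so real lon/lat arrays might need a tolerance instead.

```python
def is_closed(vertices):
    """A ring is closed when its first and last vertices are identical."""
    return len(vertices) > 1 and vertices[0] == vertices[-1]


def as_closed(vertices):
    """Return a closed copy, repeating the first vertex only if needed."""
    vertices = list(vertices)
    return vertices if is_closed(vertices) else vertices + vertices[:1]


def as_open(vertices):
    """Return an open copy, dropping the repeated last vertex only if present."""
    vertices = list(vertices)
    return vertices[:-1] if is_closed(vertices) else vertices


ring = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
closed_ring = as_closed(ring)
```

Both helpers are idempotent, so a caller can request the representation it needs without knowing which one it was handed.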
…l planar projections, ...
We mentioned this on slack, but if backwards compatibility is limiting this design at all then I am very in favor of setting up as a
from .future.boundary import Boundary as FutureBoundary

class _OldBoundary:
    ...

class LegacyBoundary(FutureBoundary, _OldBoundary):
    ...
That is, the boundary used by the existing
For some reason I was sure that the spherical stuff was doing stuff in XYZ geocentric coordinate space, but you're right it is all lon/lat. Interestingly the pyresample/pyresample/spherical.py Lines 547 to 552 in 25d0c72
Lastly, about inheritance versus composition/dependency injection: These classes seem very similar and almost not different at all. Unless you have a strong argument against it, it really seems better to require a
class Boundary:
    def __init__(self, ...):
        if self.crs.is_geographic:
            self._polygon_helper = GeographicPolygonHelper(self.vertices, self.crs)
        else:
            self._polygon_helper = PlanarPolygonHelper(self.vertices, self.crs)

    def to_shapely_polygon(self):
        return self._polygon_helper.to_shapely_polygon()
Now of course I haven't done all of the necessary checks about how the code differs, but I wanted to give an example of what I was thinking/talking about when mentioning these topics. I'd also like to rethink this idea of passing the type of polygon class to generate with a string keyword argument. There are a lot of design patterns that exist that could help us with this, but at the moment I don't have time to dive into them. This could easily be one of the last decisions made about this PR so not something we need to do right now.
@@ -250,7 +251,8 @@ def get_lonlats(self, data_slice=None, chunks=None, **kwargs):
# lons/lats are xarray DataArray objects, use numpy/dask array underneath
lons = lons.data
lats = lats.data

# TODO
# --> if data_slice and chunks provided, why here first rechunk all array and then subset?
Maybe we can improve performance a bit here ...
The data_slice case is not really utilized in Satpy so the chunking was implemented separate from that. The idea is/was that the chunking provided by the user is to match some other array (the band data from a Satpy reader for example). I suppose the chunking could be done after...but I'm scared of edge cases and unexpected consequences. If you know of a chunks + data_slice case coming from Satpy please let me know and point it out.
lon_b = np.concatenate((lons.side1, lons.side2, lons.side3, lons.side4))
lat_b = np.concatenate((lats.side1, lats.side2, lats.side3, lats.side4))

vertices_per_side = 3600
This is exaggerated and really slows things down! Why was it set so large?
If it was used to avoid cutting out portions of the GEO area ... I have now fixed the underlying bug!
"""Create an AreaDefinition object for testing."""
area = AreaDefinition(crs=crs, shape=shape, area_extent=area_extent, **kwargs)
return area
It seems that pytest fixtures have a bug with nested calls when passed to pytest.mark.parametrize.
Instead of using the fixture defined in conftest, this makes it possible to test things. Maybe @djhoese you want to check it out ... I spent 3-4 hours ... and did find a way to fix it
Fixtures can't be used inside parametrize. You need to use the third-party lazy_fixtures. You can see its usage in other modules in pyresample (I think) and Satpy. It basically allows you to refer to a fixture in parametrize by its name as a string.
The PR is now ready for review @djhoese @mraspaud. I have again updated the PR description with the latest changes.
I will now wait for your feedback before implementing the last unit tests. I also still plan to add a commit to allow passing a tuple of
Thanks again for putting in all this work. However, this is WAY too big of a pull request. The last I saw of this was extracting some boundary logic and reworking some new boundary classes. Now you've added a visualization subpackage and deprecated everything boundary or "sides" related? Very very scary and a lot of work for one PR.
I made a lot of comments to hopefully guide our discussion of this in the future. I skipped over all the tests except your one question about fixtures in test_area.py. I also ignored a lot of the boundary sides changes in geometry.py as it looked very complex and not necessarily complete or completely cleaned up.
correct ordering of vertices
What does "correct" mean? If the word "correct" is mentioned in docstrings anywhere it should be explained.
I also plan to still add a commit to enable to pass a tuple of vertices_per_side.
I wouldn't worry about it unless it is needed in this PR. Make an issue as a reminder if necessary.
We could use __getattr__ to handle deprecations: https://peps.python.org/pep-0562/
I've mentioned it before and in multiple comments, but the more and more you change the more obvious it is that this work should be moved to the future geometry classes. At the same time, while I'm sure Martin and I are not giving you the rapid type of feedback you'd like, I'd really like more discussion before you spend hours or days working on a complete rewrite of some of this stuff. For example the test_area.py fixture issue, you mentioned you spent 3-4 hours on that. If it was the problem I mention in my comment that should have been a quick question and quick answer on slack. Maybe Martin and I don't have a good enough grasp on your vision for this stuff or on the existing boundary logic to give you the discussion you need or expect, but a PR this size (the second or third of yours like this) needs to consist of more discussion and less large scale rewrites and deprecations.
Some of your more recent changes seemed to stem from trying to get the tests to pass, but besides your own TDD tests that may be leading your development, I'm not sure this is needed until we've all agreed on the design/interfaces/layout. Otherwise you risk wasting your time rewriting these tests multiple times.
My overall request for changes on this are:
- Move changes to future geometry classes wherever possible. Utilize AreaDefinition.__getattr__ if needed to prevent access to the deprecated methods. In a future PR of mine I will remove the "LegacyAreaDefinition" subclass of the future AreaDefinition and only access legacy methods through __getattr__ and warnings/errors.
- Combine some of this logic if at all possible. As mentioned elsewhere, the Boundary classes seem very close to being a single class with smart tricks for handling the special cases or delegating to pyproj's CRS classes for the information they seek. Similarly, I noticed some try/excepts or if/else in various places in this PR that "feel" like they should be an internal decision at a lower-level of the code or a decision at the higher level...some sort of abstraction. I see None used in a couple places where a more concise and intentional solution seems possible (or at least better defined and documented).
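As a rough illustration of the first request, a class-level `__getattr__` can intercept legacy names, warn, and forward to a replacement. This is only a sketch: `AreaSketch`, `boundary_sides`, and the `_DEPRECATED` mapping are hypothetical and do not reflect the real AreaDefinition interface. PEP 562 describes the analogous hook at module level.

```python
import warnings


class AreaSketch:
    """Hypothetical stand-in for a future AreaDefinition."""

    # Maps deprecated attribute names to their (hypothetical) replacements.
    _DEPRECATED = {"get_boundary_lonlats": "boundary_sides"}

    def boundary_sides(self):
        return "sides"

    def __getattr__(self, name):
        # Python calls __getattr__ only after normal attribute lookup has failed.
        new_name = type(self)._DEPRECATED.get(name)
        if new_name is None:
            raise AttributeError(
                f"{type(self).__name__!r} object has no attribute {name!r}")
        warnings.warn(f"'{name}' is deprecated. Please use '{new_name}'.",
                      DeprecationWarning, stacklevel=2)
        return getattr(self, new_name)


area = AreaSketch()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = area.get_boundary_lonlats()
```

Raising an error instead of warning would only require replacing the `warnings.warn` call, which keeps all legacy handling in one place rather than spread over many wrapper methods.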
__all__ = [
    'grid', 'image', 'kd_tree', 'utils', 'plot', 'geo_filter', 'geometry', 'CHUNK_SIZE',
    'load_area', 'create_area_def', 'get_area_def', 'parse_area_file', 'convert_def_to_yaml',
    "_root_path",
Why is this listed in __all__? The __all__ variable is meant to deal with from pyresample import *. No need for _root_path to be there.
My bad. I didn't know the purpose of __all__ was for that :D
On this topic, from pyresample import * has not been recommended in Python for a while; should we just remove __all__?
@@ -56,8 +56,13 @@

from .version import get_versions  # noqa

__all__ = ['grid', 'image', 'kd_tree', 'utils', 'plot', 'geo_filter', 'geometry', 'CHUNK_SIZE',
           'load_area', 'create_area_def', 'get_area_def', 'parse_area_file', 'convert_def_to_yaml']
_root_path = os.path.dirname(os.path.realpath(__file__))
from pathlib import Path
pkg_root_path = Path(__file__).resolve().parent
And is this used outside of the test modules? If not, I'd prefer this go in pyresample/test/__init__.py or maybe in conftest.py but I think it'd have to be a fixture to be useful there and that probably isn't worth it. You'd have to do the right amount of .parent.parent to get the correct directory.
Ok
@@ -40,6 +40,7 @@
    "features": {
        "future_geometries": False,
    },
    "force_boundary_computations": False,
If we keep this it will need documentation, but that can be the last thing in this PR.
except Exception:
    valid_indices = np.ones(source_lons.size, dtype=bool)
I'm guessing this was done to make tests pass? What's going on here?
No @djhoese. This try-except is used to deal with out-of-Earth-disk projections!
Before this PR, the boundary was created using get_boundary_lonlats (see pyresample/pyresample/geometry.py, line 276 in 294e6ea: def get_boundary_lonlats(self):).
So for the geostationary full disk and other projections with out-of-Earth-disk regions at the border, the boundary sides were all Inf. And inside get_valid_index_from_lonlat_boundaries, if there was any invalid side coordinate, it returned an array of ones (no data reduction done!). This means that for full-disk GEO projections we were not doing data reduction!
In this PR, I replaced get_boundary_lonlats with target_geo_def.boundary().sides. So:
- if the area is geostationary, we now finally reduce data (and this should speed things up!)
- if the area boundary has out-of-Earth locations and the flag force_boundary_computations is True, the boundary method will retrieve the actual boundary coordinates inside the area, and we will reduce data ...
- if the area boundary has out-of-Earth locations and the flag force_boundary_computations is False (to keep the old behaviour), the boundary method now fails, because if all sides are Inf it will raise a ValueError within _filter_bbox_nans. If this happens, we return what we returned before: the np.ones array
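The fallback described in these bullets can be sketched with the exception narrowed to `ValueError` rather than a bare `except Exception`. This is a simplified pure-Python stand-in, not the pyresample implementation: `filter_finite_sides` and `valid_source_mask` are hypothetical names, and the bounding-box test is a toy replacement for the real boundary check (it ignores the radius of influence and anti-meridian crossings).

```python
import math


def filter_finite_sides(lons, lats):
    """Drop non-finite boundary vertices; raise ValueError if none remain."""
    kept = [(lon, lat) for lon, lat in zip(lons, lats)
            if math.isfinite(lon) and math.isfinite(lat)]
    if not kept:
        raise ValueError("Boundary has no finite coordinates.")
    return kept


def valid_source_mask(boundary_lons, boundary_lats, source_lons, source_lats):
    """Return one bool per source point; keep everything if the boundary is unusable."""
    try:
        vertices = filter_finite_sides(boundary_lons, boundary_lats)
    except ValueError:
        # Old behaviour: no data reduction when every side is out of the Earth disk.
        return [True] * len(source_lons)
    # Toy reduction: a plain lon/lat bounding box around the finite vertices.
    lon_min = min(v[0] for v in vertices)
    lon_max = max(v[0] for v in vertices)
    lat_min = min(v[1] for v in vertices)
    lat_max = max(v[1] for v in vertices)
    return [lon_min <= lon <= lon_max and lat_min <= lat <= lat_max
            for lon, lat in zip(source_lons, source_lats)]


inf = float("inf")
keep_all = valid_source_mask([inf, inf], [inf, inf], [5.0, 50.0], [5.0, 5.0])
reduced = valid_source_mask([0.0, 10.0, inf], [0.0, 10.0, inf], [5.0, 50.0], [5.0, 5.0])
```

Catching only the expected `ValueError` keeps the "return everything" fallback for all-`Inf` boundaries while letting unrelated errors surface.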
warnings.warn("'get_boundary_lonlats' is deprecated. Please use "
              "'area.boundary().sides'.", DeprecationWarning, stacklevel=2)
This type of deprecation and all others like it seem like something that should be done in a future version of the geometry. This is perhaps too large of a change to request users make while still calling it a 1.x release...or is this method relatively new?
coordinates="geographic")
# Polar Projections, Global Planar Projections (Mollweide, Robinson)
# - Retrieve dummy right and left sides
if config.get("force_boundary_computations", False):
What does this mean? What is being forced? How does this result differ from when it isn't forced/computed?
This config regulates the behaviour when an AreaDefinition has boundaries on out-of-Earth regions.
force_boundary_computations == False keeps the old behaviour of not searching for the actual internal boundary coordinates, but just selecting the coordinates at the image border.
For global projections (e.g. Mollweide, Robinson, ...), polar projections, ...:
- If all image sides are out of Earth, the sides only have Inf values and a ValueError is raised further downstream if all values are Inf.
- If some of the image sides are out of Earth, some side values are Inf; such Inf values are discarded further downstream in the processing. If not all values are Inf, a "partial boundary" is returned; however, it does not represent the true Earth boundary of the area.
With force_boundary_computations == True, we actually search for the actual internal coordinates of the boundary.
To do so, we need to compute in memory one of the coordinate arrays, hence the name force_*_computations.
See my other comment explaining the reason for the try-except logic to understand the rationale of this choice.
Maybe we'll keep this discussion on this thread instead of the 3 other comments I made related to the same topic.
Is another way of seeing this as "produce the best possible boundary possible or do the quick/shortcut method"? Where the quick/shortcut is the old way, right? Sorry if I'm getting confused, but this only applies to generating the spherical/lonlat boundaries right? The x/y for AreaDefinitions should always be valid in terms of the x/y coordinates, right? Should we include swaths in this current discussion or wait to confuse me later?
Can we use information from the PROJ/pyproj CRS class (like axis bounds) if they exist to do an even faster method of this where maybe we clip the lon/lats to the defined limits of the CRS? They don't always exist...hhmm but the result is always Inf isn't it? Not -Inf and Inf...kind of hard to clip then. I could also see us hardcoding some specializations of this for specific projections like we do for geostationary. That would make things a little smarter and faster wouldn't it?
Perhaps a better name for this would be something related to doing the "extra" work to get the best answer rather than phrasing it as "forcing" something. If we can speed up the common case maybe this could default to "on"/True?
Should we include swaths in this current discussion
SwathDefinition objects are not impacted by the force_boundary_computations flag.
force_boundary_computations only impacts AreaDefinitions where Earth coordinates lie only inside an internal region of the coordinates array (i.e. geostationary ... but for geostationary we have a smart ad-hoc trick to retrieve the actual internal Earth boundary coordinates without loading the coordinates array into memory).
I looked deeply into PROJ/pyproj CRS and I didn't find another way to determine the actual valid internal coordinate boundary without the trick I implemented in the PR.
Theoretically one could design ad-hoc methods for each CRS to infer the boundary as we did for geostationary (angle + satellite height) ... but it's a huge, tedious piece of work.
This only applies to generating the spherical/lonlat boundaries right?
- force_boundary_computations=True is required to perform correct spherical polygon operations
- force_boundary_computations=True will likely fix issues in get_area_slices (because of previous bad polygon intersections)
- force_boundary_computations=True speeds up some pykdtree operations when reduce_data=True. Currently resampling geostationary full disc (i.e. with nearest) using reduce_data=True does not reduce the data (because the boundary returned is all Inf ...)
The x/y for AreaDefinitions should always be valid in terms of the x/y coordinates, right?
- force_boundary_computations=True is also used in the area.projection_boundary method and impacts the gradient resampling, where data reduction is (assumed to be) done in planar projection coordinates.
- area.projection_boundary returns the x/y coordinates corresponding to valid Earth coordinates, and so enables the transform of e.g. the actual internal GEO boundary to another CRS --> reduce_data=True with gradient was failing for GEO FD and with this PR we might have solved the issue (not tested ... I didn't want to also refactor gradient logic :P)
If we can speed up the common case maybe this could default to "on"/True
I think that the boundary and projection_boundary methods should return the correct boundaries.
This will also benefit get_area_slices in some cases (remapping to polar projections was causing issues ...)
Then the decision of what should be done in pykdtree and gradient is a choice to be based on the following points:
- Is it fine for some special AreaDefinition projections to load the entire coordinates in memory? We move a bit away from the idea of only loading the required data with dask ... but on the other hand, in such special cases ... we were later loading all coordinates anyway to discover which were valid and which not ...
- In pykdtree, we will just gain in performance and I don't expect any surprises
- In gradient, there are quite a lot of assumptions in how data reduction is done and there might be some surprises with some projections, but on the other hand we move forward to fix several issues ...
Theoretically one could design ad-hoc methods for each CRS to infer the boundary as we did for geostationary (angle + satellite height) ... but it's a huge, tedious, work.
Yeah...I figured. Shoot.
force_boundary_computations=True speeds up some pykdtree operations when reduce_data=True. Currently resampling geostationary full disc (i.e. with nearest) using reduce_data=True does not reduce the data (because the boundary returned is all Inf ...)
With reduce_data=True, this is still get_area_slices functionality, right (so the same as your previous bullet point)? I thought we had the custom geostationary boundary logic so what about it has Inf coordinates? Or are the x/y of the boundary resulting in Inf when converted to lon/lats?
Then the decision of what should be done in pykdtree and gradient it's a choice to be based on the following points:
Hhhmm this is difficult. I haven't looked at your full implementation but I'm wondering if in the dask-case (and numpy I guess), for the default, we could be smart about choosing a limited set of chunks to compute completely. Maybe it wouldn't be as perfect as the full lookup but could be really close.
ll_x, ll_y, ur_x, ur_y = area_extent
bottom = [(x, ll_y) for x in np.linspace(ll_x, ur_x, nb_points + 2)]
right = [(ur_x, y) for y in np.linspace(ll_y, ur_y, nb_points + 2)][1:]
top = [(x, ur_y) for x in np.linspace(ur_x, ll_x, nb_points + 2)][1:]
left = [(ll_x, y) for y in np.linspace(ur_y, ll_y, nb_points + 2)][1:-1]
This seems like something numpy can do in a vectorized way (numpy array operations only)...not the most important part of this PR by far.
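As a rough illustration of that suggestion, the four list comprehensions could be replaced by array operations along these lines. This is only a sketch: the helper name extent_boundary is hypothetical and not part of pyresample, and it assumes the same vertex order as the snippet above (bottom, right, top, left, with no repeated corner).

```python
import numpy as np


def extent_boundary(area_extent, nb_points):
    """Return the boundary vertices of a rectangular extent as an (N, 2) array.

    Hypothetical helper, not pyresample API. The vertex order mirrors the
    list-comprehension version: bottom, right, top, left, starting at the
    lower-left corner, with each corner emitted exactly once.
    """
    ll_x, ll_y, ur_x, ur_y = area_extent
    n = nb_points + 2
    xs = np.linspace(ll_x, ur_x, n)
    ys = np.linspace(ll_y, ur_y, n)
    # Reversing xs/ys replaces the second pair of linspace calls; the values
    # can differ from linspace(ur_x, ll_x, n) by floating-point rounding only.
    bottom = np.column_stack([xs, np.full(n, ll_y)])
    right = np.column_stack([np.full(n, ur_x), ys])[1:]
    top = np.column_stack([xs[::-1], np.full(n, ur_y)])[1:]
    left = np.column_stack([np.full(n, ll_x), ys[::-1]])[1:-1]
    return np.concatenate([bottom, right, top, left])


if __name__ == "__main__":
    print(extent_boundary((0.0, 0.0, 2.0, 4.0), nb_points=1))
```

The result has 4 * nb_points + 4 rows, the same number of vertices as the concatenated lists.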
# - If invalid sides, return np.ones
try:
    sides_lons, sides_lats = target_geo_def.boundary().sides
    # Combine reduced and legal values
    valid_input_index &= \
        data_reduce.get_valid_index_from_lonlat_boundaries(
            sides_lons,
            sides_lats,
            source_lons, source_lats,
            radius_of_influence)
except Exception:
    valid_input_index = np.ones(source_lons.size, dtype=bool)
How did this work before? This feels like it should happen somewhere else, and the except Exception needs to be figured out.
See my comment for the other try-except ;)
"""Create an AreaDefinition object for testing."""
area = AreaDefinition(crs=crs, shape=shape, area_extent=area_extent, **kwargs)
return area
Fixtures can't be used inside parametrize. You need to use the third-party lazy_fixtures. You can see its usage in other modules in pyresample (I think) and Satpy. It basically allows you to refer to a fixture in parametrize by its name as a string.
Thanks for the extensive work!
Unfortunately, I think this PR is too big as it is and needs to be split into multiple PRs, if only for the reviewers' sanity. My suggestion would be to have:
- One PR for TEST_FILES_PATH
- One PR for replacing frequency with vertices_per_side
- One PR for refactoring the visualisation part
- One PR for renaming AreaDefBoundary and SwathDefBoundary to PlanarBoundary and SphericalBoundary (and adding corresponding deprecations)
- One PR for fixing the gradient resampler
- One PR for fixing the kd_tree resampler
- One PR for refactoring the boundary code and adding the new functionality
- One PR for fixing the data reducing
- etc...
Also, a lot of the new code does not seem to be tested. This makes it very difficult to understand the use cases. Tests are not only for testing, but also for illustrating the use cases of the code, and that helps me as reviewer understand the code and the intention.
I cannot accept this PR for now, as it is not possible for me to understand all the cases this covers and all the implications within a reasonable amount of time.
I will however gladly review smaller, incremental PRs to introduce all the work you have done here.
expected_lons = [
    -90.67900085, 79.11000061,  # 81.26400757,
    81.26400757, 29.67200089,  # 10.26000023,
    10.26000023, -5.10700035,  # -21.52500153,
    -21.52500153, -21.56500053,  # -90.67900085,
]
expected_lats = [
    85.23900604, 80.84000397,  # 67.07600403,
    67.07600403, 54.14700317,  # 30.54700089,
    30.54700089, 34.0850029,  # 35.58000183,
    35.58000183, 62.25600433,  # 85.23900604,
]
np.testing.assert_allclose(lons, expected_lons)
np.testing.assert_allclose(lats, expected_lats)
Why are one third of the values commented out?
expected_lons = [
    -45., -90.,  # -135.,
    -135., -180.,  # 135.,
    135., 90.,  # 45.,
    45., 0.,  # -45.
]
expected_lats = [
    80., 80.,  # 80.,
    80., 80.,  # 80.,
    80., 80.,  # 80.,
    80., 80.,  # 80.
]
What are the commented-out values for?
# Swath defs raise AttributeError, and False is returned
get_polygon.side_effect = AttributeError
self.resampler._get_dst_poly('idx2', 0, 10, 0, 10)
assert self.resampler.dst_polys['idx2'] is False
Why has this part of the test been removed? Do we lose functionality with this PR?
"""Test polygon creation."""
from pyresample.gradient import get_polygon
from pyresample.gradient import _get_polygon
Why is this a private (_) function if it needs to be imported?
TEST_FILES_PATH = os.path.join(_root_path, "test", 'test_files')
This is also defined in at least one other file; it should be factorised.
class BaseBoundary:
    """Base class for boundary objects."""
    __slots__ = ["_sides_x", "_sides_y"]
Can you motivate the use of __slots__ here? I don't see any real benefit in this class.
def _compute_boundary_sides(self, area, vertices_per_side):
    """Compute boundary sides."""
    raise NotImplementedError()
This looks like the base class should be abstract, right?
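For reference, one way to make such a base class abstract is sketched below using the standard abc module. The class bodies are illustrative only and are not the PR's actual implementation; ToyBoundary and the dict-based area are hypothetical stand-ins. The hook is also declared static here, since it does not need access to the instance or the class.

```python
from abc import ABC, abstractmethod


class BaseBoundary(ABC):
    """Illustrative abstract base class (not the PR's actual code)."""

    def __init__(self, area, vertices_per_side=None):
        self._area = area
        self._sides_x, self._sides_y = self._compute_boundary_sides(
            area, vertices_per_side)

    @staticmethod
    @abstractmethod
    def _compute_boundary_sides(area, vertices_per_side):
        """Return (sides_x, sides_y); subclasses must implement this."""


class ToyBoundary(BaseBoundary):
    """Hypothetical subclass that reads the sides from a plain dict."""

    @staticmethod
    def _compute_boundary_sides(area, vertices_per_side):
        return area["sides_x"], area["sides_y"]


if __name__ == "__main__":
    boundary = ToyBoundary({"sides_x": [0, 1], "sides_y": [2, 3]})
    print(boundary._sides_x, boundary._sides_y)
    try:
        BaseBoundary(area=None)
    except TypeError as err:
        # Abstract classes cannot be instantiated directly.
        print("TypeError:", err)
```

With this layout, forgetting to implement _compute_boundary_sides in a subclass fails at instantiation time rather than with a NotImplementedError at the first call.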
return not polygon.exterior.is_ccw


class PlanarBoundary(BaseBoundary):
This class is not sufficiently tested.
@classmethod
def _compute_boundary_sides(cls, area, vertices_per_side):
    sides_x, sides_y = area._get_projection_sides(vertices_per_side=vertices_per_side)
    return sides_x, sides_y
This should be a static method.
def __init__(self, area, vertices_per_side=None):
    super().__init__(area=area, vertices_per_side=vertices_per_side)

    self.sides_x = self._sides_x
    self.sides_y = self._sides_y
    self.crs = self._area.crs
    self.cartopy_crs = self._area.to_cartopy_crs()
This should be at the top of the class
This PR introduces a consistent interface for dealing with area boundaries in pyresample.
It simplifies the codebase and will make it possible to significantly simplify the code of geometry.py towards pyresample 2.0.
The PR arises from my understanding that users and developers often need:
- np.array([x,y]) or np.array([lon, lat])
- (x, y), or (lon,lat)
- x, y, lon, lat
- shapely Polygon creation
- pyresample spherical operations
- pyresample spherical operations
- shapely operations
The user needs to be able to retrieve such coordinates in the geographic space (available for all areas) or, for specific cases (AreaDefinition), in the projection coordinates. A simple and easy-to-use interface for that is currently missing.
Until now we had methods which returned a tuple (x,y) or sides with inconsistent behaviour regarding:
With this PR, I aim to make all boundary-related operations consistent across a unique interface.
To this end I introduced two classes: SphericalBoundary and PlanarBoundary:
- SwathDefinition objects only have the SphericalBoundary class
- AreaDefinition objects:
  - SphericalBoundary class
  - PlanarBoundary and SphericalBoundary
To retrieve AreaDefinition boundary projection coordinates, I added the area.projection_boundary() method. This method returns a SphericalBoundary if the CRS is geographic, otherwise a PlanarBoundary.
To retrieve the SphericalBoundary, SwathDefinition and AreaDefinition share the area.boundary() method.
To deal with boundary geographical coordinates, 4 different boundary classes existed: AreaBoundary, AreaDefBoundary, SimpleBoundary, Boundary.
In the PR, I introduced the SphericalBoundary, which inherits methods from AreaBoundary (to guarantee backward compatibility), and I deprecated the use of the other existing boundary classes.
The SphericalBoundary can be retrieved using area.boundary().
Below I document the deprecations and the proposed replacements:
For the SphericalBoundary and PlanarBoundary there are the following common methods:
- set_clockwise()
- set_counterclockwise()
- sides
  - (sides_x, and sides_y) or (sides_lons, sides_lats) tuple depending on the class
  - sides_* is an object of the new class BoundarySides
  - sides_* has the properties: sides.top, sides.bottom, sides.left, sides.right, sides.vertices
- vertices
  - np.array([x,y]) or np.array([lon, lat])
- contour(closed=False)
  - (x, y) or (lon, lat) concatenated boundary sides tuple
  - closed=True, closes the vertices so that they can be passed to shapely Polygon directly
  - closed=False, the vertices are not closed, and they can be passed directly to pyresample SPolygon
- contour(closed=False)
- to_shapely()
- plot()
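To make the relationship between sides, vertices and contour(closed=...) more concrete, here is a minimal sketch of how such an interface could behave. It is not the PR's implementation: the class body, the standalone contour function, and the choice to drop the last point of each side when concatenating (because it repeats the first point of the next side) are all assumptions made for illustration.

```python
import numpy as np


class BoundarySides:
    """Toy stand-in for the PR's BoundarySides class (details are assumed)."""

    def __init__(self, top, right, bottom, left):
        # Stored as a tuple, so individual sides cannot be reassigned.
        self._sides = tuple(np.asarray(s) for s in (top, right, bottom, left))

    @property
    def top(self):
        return self._sides[0]

    @property
    def right(self):
        return self._sides[1]

    @property
    def bottom(self):
        return self._sides[2]

    @property
    def left(self):
        return self._sides[3]

    @property
    def vertices(self):
        """Concatenate the sides, skipping each side's last (shared) point."""
        return np.concatenate([side[:-1] for side in self._sides])

    def __iter__(self):
        # Allows "top, right, bottom, left = sides" like the previous list.
        return iter(self._sides)


def contour(sides_x, sides_y, closed=False):
    """Return concatenated (x, y); repeat the first vertex if closed=True."""
    x, y = sides_x.vertices, sides_y.vertices
    if closed:
        x = np.append(x, x[0])
        y = np.append(y, y[0])
    return x, y


if __name__ == "__main__":
    sides_x = BoundarySides([0, 1, 2], [2, 2, 2], [2, 1, 0], [0, 0, 0])
    sides_y = BoundarySides([2, 2, 2], [2, 1, 0], [0, 0, 0], [0, 1, 2])
    print(contour(sides_x, sides_y))
    print(contour(sides_x, sides_y, closed=True))
```

In this sketch the closed variant only differs by the repeated first vertex, which is the form a closed ring takes; the open variant keeps each vertex exactly once.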
The SphericalBoundary has the unique properties:
- boundary.lons, boundary.lats (returns the concatenated sides)
- boundary.sides_lons, boundary.sides_lats (returns the single coordinate sides)
- polygon
The PlanarBoundary has the unique properties:
- boundary.x, boundary.y (returns the concatenated sides)
- boundary.sides_x, boundary.sides_y (returns the single coordinate sides)
This PR fixes several long-standing issues:
- SwathDefinition and AreaDefinition and therefore fixes the downstream spherical operations performed in get_area_slices
- AreaDefinition whose corners lie outside of the Earth disk (geostationary, polar projections, global projections other than Plate Carrée): it is now possible to retrieve the correct Earth internal boundary (in spherical and planar coordinates), thus allowing data reduction in pykdtree and gradient resampling!
- AreaDefinition: (top, right, bottom, left) that was previously messed up when doing force_clockwise=True in get_bbox_lonlats
This PR introduces the following backward-incompatibilities:
- vertices_per_side, are now different compared to before
- pykdtree and gradient resampling now do data reduction for AreaDefinition whose corners lie outside of the Earth disk
- The boundary method now returns a SphericalBoundary object instead of an AreaBoundary object. The SphericalBoundary deprecates the contour_poly method in favour of the polygon attribute and deprecates the use of the draw method in favour of a plot method using cartopy. Moreover, the boundary sides objects of a coordinate (sides_lons and sides_lats) are now returned as BoundarySides objects instead of a list of 4 ordered arrays [top, right, bottom, left]. However, the BoundarySides has an __iter__ method and can be treated as the previous list, except that it does not allow modifying/assigning list element values.
Ongoing bug in get_geostationary_bounding_box_in_proj_coords(nb_points)
- nb_points defined as argument