Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor for boundary lat/lon extraction #546

Open
wants to merge 40 commits into
base: main
Choose a base branch
from

Conversation

ghiggi
Copy link
Contributor

@ghiggi ghiggi commented Oct 10, 2023

This PR introduces a consistent interface for dealing with area boundaries in pyresample.
It simplify the codebase and will enable to significantly simplify the code of geometry.py towards pyresample 2.0

The PR arises from my understanding that users and developers often needs:

  • the boundary sides (top, right, bottom, left) of the coordinate(s) array(s)
  • the concatenated boundary sides (without duplicates at the corners) in the form of
    • as 2D vertices array: np.array([x,y]) or np.array([lon, lat])
    • as a tuple : (x, y), or (lon,lat)
    • only a specific geographic or projection coordinate: x , y , lon, lat
    • with the last vertex equal to the first for i.e. correct shapely Polygon creation
    • with the last vertex not equal to the first for i.e correct pyresample spherical operations
    • with the order of the vertices to be clockwise for i.e. correct pyresample spherical operations
    • with the order of the vertices to be counterclockwise for i.e correct shapely operations
    • and being able to change the vertices order dynamically

The user need to be able to retrieve such coordinates in the geographic space (available for all areas) or, for specific cases (AreaDefinition), in the projection coordinates. A simple and easy to use interface for that is currently missing.

Currently we had methods which returned tuple (x,y) or sides with unconsistent behaviour regarding:

  • function naming
  • last vertex behaviour,
  • duplicated consecutive coordinates vertices,
  • (correct !!!) vertices order,
  • ...

With this PR, I aim to make all boundary related operations consistent across a unique interface.
To this end I introduced to classes: SphericalBoundary and PlanarBoundary:

  • SwathDefinition objects only have the SphericalBoundary class
  • AreaDefinition objects:
    • if the CRS is geographic, they have only the SphericalBoundary class
    • if the CRS is a planar projection, they have a PlanarBoundary and SphericalBoundary

To retrieve AreaDefinition boundary projection coordinates, I added the area.projection_boundary(). This method returns a SphericalBoundary if the CRS is geographic, otherwise a PlanarBoundary class.
To retrieve the SphericalBoundary, SwathDefinition and AreaDefinition share the area.boundary() method.

To deal with boundary geographical coordinates, 4 different boundary classes existed: AreaBoundary, AreaDefBoundary, SimpleBoundary, Boundary.
In the PR, I introduced the SphericalBoundary which inherits methods from AreaBoundary
(to guarantee backward compatibility), and I deprecated the use of the other existing boundary classes.
The SphericalBoundary can be retrieved using area.boundary().

Here below I document the deprecations and the proposed replacement:

area.get_boundary_lonlats() --> DEPRECATED 
area.get_edge_lonlats()  -->  area.boundary().contour() 
get_geostationary_bounding_box_in_lonlats(area) -->  area.boundary().contour() 

area.get_edge_bbox_in_projection_coordinates() --> self.projection_boundary().contour()
get_geostationary_bounding_box_in_proj_coords(area) --> area.projection_boundary().contour() 

area.get_bbox_lonlats() -->  area.boundary().sides
area._get_geo_boundary_sides() --> area.boundary().sides

<previously missing> -->  area.projection_boundary().sides

For the SphericalBoundary and PlanarBoundary there are the following common methods:

  • set_clockwise()
  • set_counterclockwise()
  • sides
    • Return the (sides_x, and sides_y) or (sides_lons, sides_lats) tuple depending on the class
    • Each of the sides_* is an object of the new class BoundarySides
    • Each of the sides_* has the properties: sides.top, sides.bottom, sides.left, sides.right, sides.vertices
  • vertices
    • Return the concatenated boundary sides 2D vertices array.
    • Depending on the class it will be np.array([x,y]) or np.array([lon, lat])
  • contour(closed=False)
    • return the (x, y) or (lon, lat) concatenated boundary sides tuple
    • if closed=True, closes the vertices so that can be passed to shapely Polygon directly
  • if closed=False, the vertices are not closed, and they can be passed directly to pyresample SPolygon
  • contour(closed=False)
  • to_shapely()
    • returns a shapely polygon with the correct vertices orientation
  • plot()
    • Plot the area in the geographic or projection space respectively

The SphericalBoundary has the unique properties:

  • boundary.x, boundary.y (returns the concatenated sides)
  • boundary.sides_x, boundary.sides_y (returns the single coordinate sides)
  • polygon

The PlanarBoundary has the unique properties:

  • boundary.lons, boundary.lats (returns the concatenated sides)
  • boundary.sides_lons, boundary.sides_lats (returns the single coordinate sides)

This PR fixes several long-standing issues:

  • It now ensure the correct vertices order from SwathDefinition and AreaDefinition and therefore fix the downstream spherical operations performed in get_area_slices
  • For AreaDefinition whose corner lies outside of the Earth disk (geostationary, polar projections, global projections others than PlateeCaree), it now enable to retrieve the correct Earth internal boundary (in spherical and planar coordinates), thus allowing data reduction in pykdtree and gradient resampling !
  • For geostationary AreaDefinition:
    • if the area is entirely within the Earth disk, it now use the classical approach of selecting the coordinate array sides
    • if some of the coordinate arrays coordinates are out of Earth disk, it uses the previously existing ad-hoc approach and
      • it now returns a correctly boundary for partial sides out of Earth disk (like CONUS)
      • it now returns the correct ordering of such geostationary partial areas.
  • It ensure a fixed ordering of boundary sides (top, right, bottom, left) that was previously messed up when doing force_clockwise=True in get_bbox_lonlats

This PR introduces the following backward-incompatibilities:

  • The number of vertices of geostationary area, for a given vertices_per_side, are now different compared to before
  • The pykdtree and gradient resampling now does data reduction for AreaDefinition whose corner lies outside of the Earth disk
  • The boundary method now returns a SphericalBoundary object instead of AreaBoundary object. The SphericalBoundary deprecates the countour_poly method in favour of the polygon attribute and deprecates the use of the draw method in favour of a plot method using cartopy. Moreover, the boundary sides object of a coordinate (sides_lons and sides_lons) are now returned as a BoundarySides objects instead of a list of 4 ordered arrays [top, right, left, bottom]. However the BoundarySides has an __iter__ method and can be treated as the previous list except that it does not allow to modify/assign list element values.

Ongoing bug in get_geostationary_bounding_box_in_proj_coords(nb_points)

  • The function does not return the number of nb_points defined as argument
  • I temporary added a fix to deal with the changing number of vertices returned
  • I provide here below a MRE:
 import xarray as xr
from pyproj import CRS, Proj
from shapely.geometry import Polygon
import pyresample
from pyresample import geo_filter, parse_area_file
from pyresample import AreaDefinition
from pyresample.geometry import _get_geostationary_bounding_box_in_lonlats, get_full_geostationary_bounding_box_in_proj_coords
    
def create_test_area(crs, width, height, area_extent, **kwargs):
    """Create an AreaDefinition object for testing."""
    args = (crs, (height, width), area_extent)
    args = (crs, width, height, area_extent)
    attrs = kwargs.pop("attrs", {})
    area_id = attrs.pop("name", "test_area")
    args = (area_id, "", "") + args
    area = AreaDefinition(*args, **kwargs)
    return area


def truncated_geos_area():
    """Create a truncated geostationary area."""
    projection = {'a': '6378169', 'h': '35785831', 'lon_0': '9.5', 'no_defs': 'None', 'proj': 'geos',
                  'rf': '295.488065897014', 'type': 'crs', 'units': 'm', 'x_0': '0', 'y_0': '0'}
    area_extent = (5567248.0742, 5570248.4773, -5570248.4773, 1393687.2705)
    width = 3712
    height = 1392
    geos_area = create_test_area(projection, width, height, area_extent)
    return geos_area


geos_area = truncated_geos_area()
self = geos_area
vertices_per_side = 20


lons, lats = _get_geostationary_bounding_box_in_lonlats(self, nb_points=vertices_per_side)
lons.shape # 11 points ... not 20 !
lats.shape # 11 points

lons, lats = _get_geostationary_bounding_box_in_lonlats(self, nb_points=50)
lons.shape # 22 points  ... not 50 !
lats.shape # 22 points 

# The problem arise in get_geostationary_bounding_box_in_proj_coords
x, y = _get_geostationary_bounding_box_in_lonlats(self, 50)
x.shape # 22 ... ... not 50 !
geo_p = Polygon(np.vstack((x, y)).T)

# The nb_points is just used to define the nb_points of the full disc boundary coords
x, y = get_full_geostationary_bounding_box_in_proj_coords(self, 50)
x.shape # 50 
geo_full = Polygon(np.vstack((x, y)).T)

# This is what happen within get_geostationary_bounding_box_in_proj_coords
x, y = get_full_geostationary_bounding_box_in_proj_coords(self, 50)
ll_x, ll_y, ur_x, ur_y = geos_area.area_extent

geo_bbox = Polygon(np.vstack((x, y)).T)
area_bbox = Polygon(((ll_x, ll_y), (ll_x, ur_y), (ur_x, ur_y), (ur_x, ll_y)))
intersection = area_bbox.intersection(geo_bbox)
try:
    x, y = intersection.boundary.xy  # here random number of points (not granted that is even !)
except NotImplementedError:
    return [], []
return np.asanyarray(x[:-1]), np.asanyarray(y[:-1])

@ghiggi ghiggi marked this pull request as draft October 10, 2023 21:42
pyresample/geometry.py Outdated Show resolved Hide resolved
@ghiggi ghiggi marked this pull request as ready for review October 11, 2023 11:18
@ghiggi
Copy link
Contributor Author

ghiggi commented Oct 11, 2023

@mraspaud this PR is ready for review. There are 3 test units failing related the gradient.__init__ code ... but I can't manage to find the root cause because of the complexity of the code. Can you start to review it and try to see what is going on?
Or alternatively, can you point me on how you debug such test units interactively? Thanks in advance !

pyresample/geometry.py Outdated Show resolved Hide resolved
@djhoese djhoese added enhancement backwards-incompatibility Causes backwards incompatibility or introduces a deprecation labels Oct 18, 2023
@djhoese djhoese self-assigned this Oct 18, 2023
pyresample/geometry.py Outdated Show resolved Hide resolved
Copy link
Member

@djhoese djhoese left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I'm against anything that you've done here, but it makes me nervous to change some of these methods that have been around for so long. We'll need to triple check that Satpy or maybe even pyspectral isn't importing/using some of these methods and functions. I do agree that you've made things cleaner and this probably puts us in a good place to refactor this into a separate helper class in the future.

You asked about the geos boundary lonlats methods and yeah I'm not sure if that makes a ton of sense for geostationary bounds, but if there is an option for the user to get the x/y bounds then yeah it might make sense...oh wait, for non-full disk geostationary areas the lon/lats wouldn't be NaNs. You mentioned get_boundary_lonlats being used in resampling, could you point to where exactly that is?

I don't think the geostatonary bounding box functions need to be made private, but if you feel that it saves us headaches in future refactorings them I'm OK with it (as long as they aren't used in other pytroll packages).

Lastly, I just had an idea perhaps just for future interfaces: what if all of these functions that have _lonlats and _proj_coords variants of each other had just a single version of the function that returned a single object and that object had properties (or methods) for getting access to the x/y, lon/lat, or image coordinate versions of some of the information? You'd end up with something like:

my_area.get_boundary(vertices_per_side=20).get_lonlats(chunks=1000)

Or something like that.

pyresample/geometry.py Outdated Show resolved Hide resolved
@ghiggi
Copy link
Contributor Author

ghiggi commented Oct 19, 2023

I try to answer to all the elements of discussion point by point here below.

Geostationary x/y sides

  • Sides order: [top, right, bottom, left]
  • Currently, for all geostationary area we have "dummy" right and left sides of shape 2
  • I recognize that for geostationary areas, we should implement a function that check if the image has nan edge values (out of Earth disk pixels). If some edge pixels are out of disk, we retrieve the geo sides with the current geo routine (ending up with dummy right/left sides, otherwise we use the classical approach of selecting the sides !

get_geo_<utility> functions

To be safe, I will keep the geostationary utility function as public.
However, I checked ... and satpy, pyspectral and the pyorbital packages does not use any of these functions.
In satpy, in the satpy/readers/utils.py there a some utility to extract specialized geostationary area information from AreaDefinition objects.
Some functions looks quite old code like get_geostationary_bounding_box, _lonlat_from_geos_angle.
Maybe could be useful, in a future PR, to move all geostationary utilities in a single separate module in pyresample.

Future Boundary Interface

  • If we want to have _lonlats and _proj_coords functions, it means there we would need a boundary class for AreaDefinition and one for SwathDefinition (without _proj_coords) objects.
  • Right now, from the AreaBoundary object (returned by area.boundary) we have the properties vertices, lons, lats, sides_lon, sides_lat.
    We also have the method lons, lats = boundary.contour(). This method could be made a property !
    We could add the property lons, lats = boundary.lonslats but I looks a bit strange to me.
    When I will finalize the new spherical interface, I would like to add the to_line and to_polygon methods.
  • As a personal a taste, I find more readable code without too much _get_<stuffs>. _get_<stuffs> … and I guess most of relevant stuffs can be defined as properties.
  • Regarding the chunks option, are you sure that is needed when retrieving the lon/lat edges? One can always .rechunk or dask.array.from(array, chunks)

Boundary Sides Consistency

  • The order of sides can be non consistent !
    Here we clearly mention that sides are defined from upper left in clockwise order: [top, right, bottom, left].
    However, I noticed that when we extract a boundary, if the original sides are not clockwise, and we force_clockwise=True, then the side position currently returned by get_bbox_lonlats become [left, bottom, right, top] (see here)I think this should be fixed ... as it's prone to introduce bugs (I don't tell you the time wasted to discover this point when implementing the spherical tools for my projects ... ).
  • I believe that the clockwise/anticlockwise reordering should be applied only when returning the contours/vertices/lons/lats in the boundary object and not at the sides list definition in get_bbox_lonlats !
  • We could add to AreaBoundary the method set_clockwise() and set_anticlockwise(), which regulates the order of the returned contours/vertices/lons/lats arrays.
  • For clarity, we could also rename get_bbox_lonlats as get_bbox_sides. Across the repo, I often see lons, lats = self.get_bbox_lonlats which I interpret as the lons / lats arrays, while instead are the lon_sides, lat_sides list !
  • Regarding the boundary sides, it’s not always clear to me when I see such name, what is the inner structure of the object.
    Most often is [side1, side2, side3, side4] with side = (lon, lat), sometimes we have sides_lon and sides_lats (i.e. in AreaBoundary.
  • Maybe we could create a BoundarySides class … so that we can store and extract easily the information.
    We could have for example the properties: sides.top, sides.bottom, sides.left, sides.right, sides.vertices, sides.lons, sides.lats
    Note that in the code we often use the terms side1, side2, side3, side4
    The AreaBoundary methods could extract and modify the BoundarySides instance of the class.
  • Currently the SimpleBoundary object just accept the sides of a single array of coordinates (lat or lon) and is only used by get_boundary_lonlats. The class has no methods and just the properties side1, side2, side3, side4.

get_boundary_lonlats and SimpleBoundary usage

  • Return two objects: the longitude and latitudes SimpleBoundary objects.
  • From each SimpleBoundary object, which we can extract the side1, side2, side3, side4.
  • The function is used in pykdtree.py, bilinear.py and gradient.py. See here the search result
  • In pykdtree.py and bilinear,py is just used to call the get_valid_index_from_lonlat_boundaries method. And if we look at the code of get_valid_index_from_lonlat_boundaries, it just extracts the lon and lat of each side to pass the single arrays to another function. Thus could be easily refactored with the new interface.
  • In gradient.py, the method is just used to retrieve the lon, lat arrays. Could be replaced by lon, lat = area.boundary().contour()
  • Note also, that in data_reduce.py, the function swath_from_lonlat_boundaries expects as input the two SimpleBoundary objects, but it’s not used at all across the codebase (neither in satpy). Should be removed?
  • So I think we could easily remove/deprecate SimpleBoundary and the use of get_boundary_lonlats. SimpleBoundary is just used in the above described cases, and unused in satpy, pyorbital, etc.

Conclusion
Lot of stuffs to digest. I guess any of these points can be addressed in a separate PR 😄
In meanwhile, I will take care of your review comments.
Let me know your opinion for the bug (and MRE) provided in the PR description, and then let’s try to understand the cause of the failing test in gradient.py; I struggle to debug such test units ... and I look forward for your feedbacks 😄

@djhoese
Copy link
Member

djhoese commented Oct 19, 2023

If we want to have _lonlats and _proj_coords functions, it means there we would need a boundary class for AreaDefinition and one for SwathDefinition (without _proj_coords) objects.

I'm not so sure. I think it is one of the failings of our past understandings that all latitude/longitude degrees are the same. That is just not true. There can be prime meridian shifts or different Earth models/datums used. So there is some idea of "native" CRS and a preferred geographic/geodetic CRS. I could easily see a helper object (as I mentioned in my previous comment) that gets you x or y or lon or lat, but allows you to convert that object to another version with a different preferred geodetic CRS.

As a personal a taste, I find more readable code without too much get. get … and I guess most of relevant stuffs can be defined as properties.
Regarding the chunks option, are you sure that is needed when retrieving the lon/lat edges? One can always .rechunk or dask.array.from(array, chunks)

Is it the get that you don't like? Or the many small methods/functions? I think hiding some of this functionality in helper classes makes it irrelevant. I also think in a lot of cases we could get rid of the get, but I'm fearful of turning most things into properties ever since we started playing with dask/xarray. With dask some of these methods end up being able to produce dask arrays and there is a real benefit performance-wise of knowing/defining what that chunk size is from the creation point versus re-chunking after.

I'm sure there are some use cases where we define polygons or boundaries where this idea of needing to define dask arrays is unnecessary, but I'd want to be careful. In a lot of cases I don't think I'd want to generate a polygon and then not immediately use it. However, that polygon may be generated from large lon/lat arrays (this is the worst case scenario I think of all the scenarios) and in that case there may be a benefit to computing the resulting polygon at the same time as all other computations (if possible) to reuse the compute lon/lat arrays (dask would share the result of the tasks). But that being the worst case maybe we just "deal with it". Lon/lat arrays are already the slowest form of computation we use.

However, I noticed that when we extract a boundary, if the original sides are not clockwise, and we force_clockwise=True, then the side position currently returned by get_bbox_lonlats become [left, bottom, right, top] (see here)I think this should be fixed ... as it's prone to introduce bugs (I don't tell you the time wasted to discover this point when implementing the spherical tools for my projects ... ).

I'm not sure I agree. "top" and "bottom" are relative. Is the top of the swath the last scan of the data or the first? Is it the most northern? What if we have a non-polar orbiting satellite? If the sides are always generated from a polygon maybe that will always be more clear (shapely or other library determines what that means)?

I believe that the clockwise/anticlockwise reordering should be applied only when returning the contours/vertices/lons/lats in the boundary object and not at the sides list definition in get_bbox_lonlats !

I'd have to check how I used this method in my Polar2Grid project. It is likely that I put it in get_bbox_lonlats because that is used internally in the area boundary class (if I recall correctly). But also, other polygon libraries probably need clockwise coordinates.

We could add to AreaBoundary the method set_clockwise() and set_anticlockwise(), which regulates the order of the returned contours/vertices/lons/lats arrays.

I don't think it is that people need the counter-clockwise (CCW) version of the coordinates, it is that the polygon/boundary math requires the clockwise (CW) version. Besides wasted performance I don't think I see a reason to never not return CW.

it’s not always clear to me when I see such name, what is the inner structure of the object.

Yep. Could probably return a namedtuple at the very least. This was just old interfaces before anyone knew what they were doing I think.

...Ah I just read your BoundarySides idea. Same thing. If it is a full class instead of a namedtuple then I might suggest we try using __slots__ to keep the "weight" of the class small.

Conclusion

I think in general I like where you're going, but I still think we're 1-2 steps away from a design that really "wows" me. I think the idea of something like an area_def.boundary.left.lonlats is exciting as it opens up a usage pattern that works for other use cases too.

@djhoese
Copy link
Member

djhoese commented Oct 19, 2023

For this failing test:

def test_resample_swath_to_area_2d(self):
"""Resample swath to area, 2d."""
data = xr.DataArray(da.ones(self.src_swath.shape, dtype=np.float64),
dims=['y', 'x'])
with np.errstate(invalid="ignore"): # 'inf' space pixels cause runtime warnings
res = self.swath_resampler.compute(
data, method='bil').compute(scheduler='single-threaded')
assert res.shape == self.dst_area.shape
assert not np.all(np.isnan(res))

Can you print out self.swath_resampler.coverage_status? My guess is this method is filtering out all chunks of the source data:

for i, covers in enumerate(self.coverage_status):
if covers:
if is_src:
y_start, y_end, x_start, x_end = self.src_slices[i]
else:
y_start, y_end, x_start, x_end = self.dst_slices[i]
try:
val = data[:, y_start:y_end, x_start:x_end]
except IndexError:
val = data[y_start:y_end, x_start:x_end]
else:
val = None
data_out.append(val)

And then the list passed to np.max is empty.

That coverage_status is filled here:

def get_chunk_mappings(self):
"""Map source and target chunks together if they overlap."""
src_y_chunks, src_x_chunks = self.src_x.chunks
dst_y_chunks, dst_x_chunks = self.dst_x.chunks
coverage_status = []
src_slices, dst_slices = [], []
dst_mosaic_locations = []
src_x_start = 0
for src_x_step in src_x_chunks:
src_x_end = src_x_start + src_x_step
src_y_start = 0
for src_y_step in src_y_chunks:
src_y_end = src_y_start + src_y_step
# Get source chunk polygon
src_poly = self._get_src_poly(src_y_start, src_y_end,
src_x_start, src_x_end)
dst_x_start = 0
for x_step_number, dst_x_step in enumerate(dst_x_chunks):
dst_x_end = dst_x_start + dst_x_step
dst_y_start = 0
for y_step_number, dst_y_step in enumerate(dst_y_chunks):
dst_y_end = dst_y_start + dst_y_step
# Get destination chunk polygon
dst_poly = self._get_dst_poly((x_step_number, y_step_number),
dst_x_start, dst_x_end,
dst_y_start, dst_y_end)
covers = check_overlap(src_poly, dst_poly)
coverage_status.append(covers)
src_slices.append((src_y_start, src_y_end,
src_x_start, src_x_end))
dst_slices.append((dst_y_start, dst_y_end,
dst_x_start, dst_x_end))
dst_mosaic_locations.append((x_step_number, y_step_number))
dst_y_start = dst_y_end
dst_x_start = dst_x_end
src_y_start = src_y_end
src_x_start = src_x_end
self.src_slices = src_slices
self.dst_slices = dst_slices
self.dst_mosaic_locations = dst_mosaic_locations
self.coverage_status = coverage_status

But I'm just using what I get from reading the code. I didn't code this so I could be completely missing something.

@ghiggi
Copy link
Contributor Author

ghiggi commented Oct 19, 2023

Just to better clarify a couple of thoughts @djhoese :

I'm not sure I agree. "top" and "bottom" are relative. Is the top of the swath the last scan of the data or the first? Is it the most northern? What if we have a non-polar orbiting satellite? If the sides are always generated from a polygon maybe that will always be more clear (shapely or other library determines what that means)?

I don’t refer to the orientation in terms of geographic space, but with respect to the 2D lon/lat arrays. The [ŧop, right,bottom, left] sides of the coordinate arrays.

I'd have to check how I used this method in my Polar2Grid project. It is likely that I put it in get_bbox_lonlats because that is used internally in the area boundary class (if I recall correctly). But also, other polygon libraries probably need clockwise coordinates.

Yes the AreaBoundary class (at least in pyresample) does not correct for clockwise/anticlockwise ordering.
The force_clockwise reording was applied in get_bbox_lonlats because the outputs of this function were concatenated to get directly the lon/lat boundary vertices with the correct orientation.
But if we speak of the sides coordinate of an array, clockwise/anticlockwise order make no sense. Is the vertices derived from the sides that must follow the clockwise/anticlockwise assumption.
Right now, doing force_clockwise=True for i.e. retrograde orbits (going from east to west) scanning forward (or prograde orbits scanning backward), cause the output sides to be the [left, bottom, right, top] sides of the coordinates/image arrays.

I don't think it is that people need the counter-clockwise (CCW) version of the coordinates, it is that the polygon/boundary math requires the clockwise (CW) version. Besides wasted performance I don't think I see a reason to never not return CW.

If one want to generate Shapely polygons, it requires CCW order. If the area wraps the pole, we must not modify the ordering because the order determines which pole is wrapped in.

@djhoese
Copy link
Member

djhoese commented Oct 19, 2023

Oh...so our boundary logic (or spherical geometry stuff) needs CW and shapely uses CCW? Ok so there should be an option.

So would/could/should there be a "sides" method on a boundary/polygon or should it be on the area? If there is a "sides" method on the area (and swath) then I think the equivalent of a force_clockwise needs to be on that method, right? Otherwise what if someone wants to do their own polygon math without shapely?

# Conflicts:
#	pyresample/geometry.py
@djhoese
Copy link
Member

djhoese commented Nov 22, 2023

Ok @ghiggi I've merged this with main which includes my pre-commit changes. There was only one conflicting file on the merge and that was in geometry.py. The biggest conflict was actually just unfortunate where I moved the "edge" method from BaseDefinition to AreaDefinition right under the boundary sides methods and git was confused when it tried to merge them even though they were separate. Otherwise, the area slicing methods/functions are now in _subset.py including _get_area_boundary which you had made a change to and I made sure that was migrated over to that.

Otherwise, my merge just includes a couple things to make pre-commit happy (docstring fixes, indentation/spacing fixes, unused import after it included your changes, etc).

Let me know if you have any problems with this or anything looks confusing. I should be online at my usual times tomorrow.

@djhoese
Copy link
Member

djhoese commented Nov 24, 2023

Very nice job. I re-read your updated PR description and I really like the goal you've set for yourself. It makes a lot of sense and I'm glad someone is trying to tackle the problems and inconsistencies.

I think my general overall concern with the changes you've made so far as that I don't like the separation between projections and geographic versions of things. I will admit I may have a misunderstanding of this, but I don't think so. To me, geographic "projections" are not a single thing. They can differ with their model of the Earth and the datum/reference coordinates used for that Earth. As you know we could also shift the prime meridian of a lon/lat CRS and it is still considered geographic.

With the above in mind, I think I'd like to look at combining the different boundary classes into a single class. Another thing that leads me to this idea is that you point out that the two classes differ in their .x/.lons, .y/.lats, and .sides_X properties. But with my understanding in mind .x == .lons and the same for the other properties. And if the CRS becomes a required parameter maybe that's OK?

I'm also not a fan of the "clockwise" behavior of the boundary classes. Specifically the .set_clockwise() and .set_counterclockwise() methods. These seem like they should either:

a. Return a new instance of the boundary with this behavior enforced/applied.
b. Make it a keyword argument to the class maybe?
c. Leave it as-is maybe?

It just feels odd to me, but I'm not sure I have any better suggestions at this point.

Wild suggestion: Are there enough deprecations and changes here that this should be moved to a single .boundary() method on the future geometry definitions?

@ghiggi
Copy link
Contributor Author

ghiggi commented Nov 24, 2023

@djhoese I also don't like the separation between projections and geographic versions of things, but it seems to me required when dealing with boundaries.

Here we are not speaking of geographical or projected CRS , but the presence of two types of coordinates: spherical and planar ! Maybe we should call the classes SphericalBoundary and PlanarBoundary !

For SwathDefinition, we can only define a SphericalBoundary.
For AreaDefinition with CRS with planar coordinates, we can define SphericalBoundary and PlanarBoundary.
For AreaDefinition with CRS with geographic coordinates, we can only define SphericalBoundary.

I try to elaborate here below some additional thoughts:

  • For (projection) planar coordinates, I can use shapely for geometry operations
  • For (geographic) spherical coordinates, I am on a sphere and I need to use spherical operations
  • For an AreaDefinition, when extracting the boundary sides projection coordinates (i.e. in meter)
    and the equivalent lat/lon coordinates on the sphere I might end up with the case where:
    • the concatenation of the area sides with the projection coordinates leads to clockwise vertices
    • the concatenation of the area sides with the geographic coordinates leads to counterclockwise vertices
      --> From there one of the reason to define 2 separate objects
  • The boundary sides refers just to the side position (top, right, bottom, left) of the coordinate array.
  • When we create the *Boundary class, I check what is the initial order of the concatenated sides.
    set_clockwise()and set_counterclockwise() does not modify any of the side data,
    but set an internal flag that is used to choose if the concatenated sides must be reversed when
    calling x, y, lat, lon, contour or vertices methods.
  • I defined the *Boundary class to have a order argument defaulting to None, which means
    return boundary vertices as the order they are. The user can specify order="clockwise" or order="counterclockwise"
    when creating the class, or by calling set_clockwise() and set_counterclockwise() later on, but this does not modify
    any internal data of the boundary sides and does not trigger any computation !

Side note: there was already a method boundary() I created 2-3 months ago that returned an AreaBoundary (geographic coordinates)
that now I deprecated in favour of geographic_boundary which is returning the GeographicBoundary.
Maybe we can keep the .boundary(), make it return GeographicBoundary (instead of AreaBoundary) and remove the proposed geographic_boundary method if you are willing to accept that a method return a class with a different name (GeographicBoundary has all methods of AreaBoundary)

@djhoese
Copy link
Member

djhoese commented Nov 24, 2023

Ah the Spherical versus Planar makes this make more sense now.

For an AreaDefinition, when extracting the boundary sides projection coordinates (i.e. in meter)
and the equivalent lat/lon coordinates on the sphere I might end up with the case where:
the concatenation of the area sides with the projection coordinates leads to clockwise vertices
the concatenation of the area sides with the geographic coordinates leads to counterclockwise vertices
--> From there one of the reason to define 2 separate objects

I don't understand why the type of coordinate retrieved (projection or geographic) would effect clockwise or counterclockwise. Don't we have enough cases of swaths (ascending and descending orbits or other old instrument types) and areas (north-up areas, south-up like FCI, and south-up/east-left like SEVIRI) where any coordinate combination could give you any orientation/ordering?

Besides the cases of what data we started with, what are cases where we need to use spherical boundaries even if we have a projected/planar area? Also, are there cases where we have geographic boundaries but can assume they don't cross the anti-meridian and could therefore be treated as planar coordinates? For example, a mercator or eqc projected dataset is not technically valid if it crosses the anti-meridian (it goes outside the coordinate space of the projection) so there is no need to treat it as a sphere. Similarly, if it does cross the anti-meridian in lon/lat space, but we shift the prime meridian we could (in some cases that don't cross the poles) treat the lon/lat coordinates as planar too.

I think I've realized one of the main reasons I don't like set_clockwise is that it puts state into the object and makes it mutable. It really feels like these objects should be immutable and only be containers for the boundary information and not modifiers of it. I realize that the clockwise wish property is not modifying the data until it is returned, but it causes confusion when using it. I would almost feel better if the boundary object knew the order/orientation of the data (either told by the user as a kwarg or determined by the class when needed) and besides automatically converting for things like shapely object creation, it could probably methods for forcing it that return a new copy of the boundary with the modifications done already. If the orientation/order is the same as the Boundary already then return self. If not, convert and return a new object.

This also makes me think, could the pyresample spherical stuff be made to detect the closed/not-closed state of the polygon provided to it and adjust as needed? That way there doesn't need to be special handling...eh I guess this is a nice convenience for any user to have that ability to get closed or open boundary coordinates.

@ghiggi
Copy link
Contributor Author

ghiggi commented Nov 24, 2023

Thansk for all these discussions @djhoese. They also help me to completely free out my thoughts.
Answering to your questions:

Besides the cases of what data we started with, what are cases where we need to use spherical boundaries even if we have a projected/planar area?

All pykdtree based stuffs currently exploit spherical boundaries for data reduction (see contour_poly in get_area_slices) !
Only gradient use planar boundaries (and fails then in many cases)

Also, are there cases where we have geographic boundaries but can assume they don't cross the anti-meridian and could therefore be treated as planar coordinates?

Yes many cases. With this PR, we will be in a good place to perform data reduction, on the sphere or with shapely, based on the type of source and target area.
We need to perform operation on the sphere if we have:

  • a swath crossing the antimeridian
  • a swath overpassing a pole,
  • a projection area enclosing the pole
  • a projection area crossing the antimeridian

And we need to stay on the sphere if the two areas CRS bounds does not intersect !

I think I've realized one of the main reasons I don't like set_clockwise is that it puts state into the object and makes it mutable.

I agree with that and I strongly push to deprecate the existing use of force_clockwise.
The order should just be set when creating the shapely Polygon or the spherical Polygon depending on the software requirements.
In this PR, if set_*clockwise is not called, it returns the boundary vertices as they appear in the coordinate array (top, right, bottom, left)

This also makes me think, could the pyresample spherical stuff be made to detect the closed/not-closed state of the polygon provided to it and adjust as needed?

That's just requires to check if the first vertex is equal to the last

@djhoese
Copy link
Member

djhoese commented Nov 25, 2023

We mentioned this on slack, but if backwards compatibility is limiting this design at all then I am very in favor of setting up as a boundary method in the future geometries that use this "perfect" boundary class (or classes if needed). The existing/legacy/old geometry definitions could use a special subclass of these future boundary classes that does something like:

from .future.boundary import Boundary as FutureBoundary

class _OldBoundary:
    ...

class LegacyBoundary(FutureBoundary, _OldBoundary):
    ...

That is, the boundary used by the existing AreaDefinition would be LegacyBoundary which is a combination of the FutureBoundary and the old boundary that you didn't want to hold on to just for backwards compatibility. That way when we switch to Pyresample 2.0 we get the "best" design instead of having to still deal with workarounds and hacks just to deal with backwards compatibility.

All pykdtree based stuffs currently exploit spherical boundaries for data reduction (see contour_poly in get_area_slices) !

For some reason I was sure that the spherical stuff was doing stuff in XYZ geocentric coordinate space, but you're right it is all lon/lat. Interestingly the SphPolygon computes the X/Y/Z coordinates but then doesn't ever use them from what I can tell:

self.cvertices = np.array([np.cos(self.lat) * np.cos(self.lon),
np.cos(self.lat) * np.sin(self.lon),
np.sin(self.lat)]).T * radius
self.x__ = self.cvertices[:, 0]
self.y__ = self.cvertices[:, 1]
self.z__ = self.cvertices[:, 2]

Lastly, about inheritance versus composition/dependency injection: These classes seem very similar and almost not different at all. Unless you have a strong argument against it, it really seems better to require a crs kwarg associated with the provided boundary coordinates/vertices and then do a few if statements to separate out the if self.crs.is_geographic logic when needed. Otherwise, a "composition" style implementation would be something like doing:

class Boundary:
    def __init__(self, ...):
        if self.crs.is_geographic:
            self._polygon_helper = GeographicPolygonHelper(self.vertices, self.crs)
        else:
            self._polygon_helper = PlanarPolygonHelper(self.vertices, self.crs)

    def to_shapely_polygon():
        return self._polygon_helper.to_shapely_polygon()

Now of course I haven't done all of the necessary checks about how the code differs, but I wanted to give an example of what I was thinking/talking about when mentioning these topics.

I'd also like to rethink this idea of passing the type of polygon class to generate with a string keyword argument. There are a lot of design patterns that exist that could help us with this, but at the moment I don't have time to dive into them. This could easily be one of the last decisions made about this PR so not something we need to do right now.

@@ -250,7 +251,8 @@ def get_lonlats(self, data_slice=None, chunks=None, **kwargs):
# lons/lats are xarray DataArray objects, use numpy/dask array underneath
lons = lons.data
lats = lats.data

# TODO
# --> if data_slice and chunks provided, why here first rechunk all array and then subset?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe here we can increase a bit performance ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The data_slice case is not really utilized in Satpy so the chunking was implemented separate from that. The idea is/was that the chunking provided by the user is to match some other array (the band data from a Satpy reader for example). I suppose the chunking could be done after...but I'm scared of edge cases and unexpected consequences. If you know of a chunks + data_slice case coming from Satpy please let me know and point it out.

lon_b = np.concatenate((lons.side1, lons.side2, lons.side3, lons.side4))
lat_b = np.concatenate((lats.side1, lats.side2, lats.side3, lats.side4))

vertices_per_side = 3600
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is exagerated and really slow down stuffs! Why has been set so large ?
If it was used to avoid cutting out portions of GEO area ... now I fixed the underlying bug !!!

"""Create an AreaDefinition object for testing."""
area = AreaDefinition(crs=crs, shape=shape, area_extent=area_extent, **kwargs)
return area

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems that pytest fixture has a bug with nested call when passed to pytest mark parametrize.
Instead of using the fixture defined in conftest, this enable to test stuffs. Maybe @djhoese you want to check it out ... I spend 3-4 hours ... and did find a way to fix it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixtures can't be used inside parametrize. You need to use the third-party lazy_fixtures. You can see it's usage in other modules in pyresample (I think) and Satpy. It basically allows you to refer to a fixture in parametrize by it's name as a string.

@ghiggi ghiggi requested review from mraspaud and djhoese November 27, 2023 23:11
@ghiggi
Copy link
Contributor Author

ghiggi commented Nov 27, 2023

The PR is now ready for review @djhoese @mraspaud. I again updated the PR description with the last changes.
I finished to implement the interfaces and the code now it ensure the correct ordering of vertices for SwathDefinition and AreaDefinition, which in turn enable to correct creation of SphPolygon and shapely Polygon.
I also added code to retrieve boundary of AreaDefinition projections with infinite values at the edges because of out-of-Earth locations (i.e. polar projections, global projections like Robinson, Mollweide, etc).

  • I checked against all cartopy projections and it works perfectly (except from a couple of projections because of a PROJ bug).
  • If we force the computation of such boundaries (requires reading the entire coordinates array in memory), some tests of slicer and gradient fail, because we did not yet implement the logic to deal with such projections. To this end I introduced the force_boundary_computations=False config to make the current workflow running ... and enable to adapt the pipeline in future separate PRs.

I now wait for your feedbacks before implementing the last test units. I also plan to still add a commit to enable to pass a tuple of vertices_per_side.

Copy link
Member

@djhoese djhoese left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for putting in all this work. However, this is WAY too big of a pull request. The last I saw of this was extracting some boundary logic and reworking some new boundary classes. Now you've added a visualization subpackage and deprecated everything boundary or "sides" related? Very very scary and a lot of work for one PR.

I made a lot of comments to hopefully guide our discussion of this in the future. I skipped over all the tests except your one question about fixtures in test_area.py. I also ignored a lot of the boundary sides changes in geometry.py as it looked very complex and not necessarily complete or completely cleaned up.

correct ordering of vertices

What does "correct" mean? If the word "correct" is mentioned in docstrings anywhere it should be explained.

I also plan to still add a commit to enable to pass a tuple of vertices_per_side.

I wouldn't worry about it unless it is needed in this PR. Make an issue as a reminder if necessary.

We could use getattr to handle deprecations: https://peps.python.org/pep-0562/

I've mentioned it before and in multiple comments, but the more and more you change the more obvious it is that this work should be moved to the future geometry classes. At the same time, while I'm sure Martin and I are not giving you the rapid type of feedback you'd like, I'd really like more discussion before you spend hours or days working on a complete rewrite of some of this stuff. For example the test_area.py fixture issue, you mentioned you spent 3-4 hours on that. If it was the problem I mention in my comment that should have been a quick question and quick answer on slack. Maybe Martin and I don't have a good enough grasp on your vision for this stuff or on the existing boundary logic to give you the discussion you need or expect, but a PR this size (the second or third of yours like this) needs to consist of more discussion and less large scale rewrites and deprecations.

Some of your more recent changes seemed to stem from trying to get the tests to pass, but besides your own TDD tests that may be leading your development, I'm not sure this is needed until we've all agreed on the design/interfaces/layout. Otherwise you risk wasting your time rewriting these tests multiple times.

My overall request for changes on this are:

  1. Move changes to future geometry classes where every possible. Utilize AreaDefinition.__getattr__ if needed to prevent access to the deprecated methods. In a future PR of mine I will remove the "LegacyAreaDefinition" subclass of the future AreaDefinition and only access legacy methods through __getattr__ and warnings/errors.
  2. Combine some of this logic if at all possible. As mentioned elsewhere, the Boundary classes seem very close to be a single class with smart tricks for handling the special cases or delegating to pyproj's CRS classes for the information they seek. Similarly, I noticed some try/excepts or if/else in various places in this PR that "feel" like they should be an internal decision at a lower-level of the code or a decision at the higher level...some sort of abstraction. I see None used in a couple places where a more concise and intentional solution seems possible (or at least better defined and documented).

__all__ = [
'grid', 'image', 'kd_tree', 'utils', 'plot', 'geo_filter', 'geometry', 'CHUNK_SIZE',
'load_area', 'create_area_def', 'get_area_def', 'parse_area_file', 'convert_def_to_yaml',
"_root_path",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this listed in __all__. The all variable is meant to deal with from pyresample import *. No need for _root_path to be there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad. I didn't know the purpose of __all__ was for that :D

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On this topic, from pyresample import *is not recommended in python this a while, should we just remove __all__?

@@ -56,8 +56,13 @@

from .version import get_versions # noqa

__all__ = ['grid', 'image', 'kd_tree', 'utils', 'plot', 'geo_filter', 'geometry', 'CHUNK_SIZE',
'load_area', 'create_area_def', 'get_area_def', 'parse_area_file', 'convert_def_to_yaml']
_root_path = os.path.dirname(os.path.realpath(__file__))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from pathlib import Path
pkg_root_path = Path(__file__).resolve().parent

And is this used outside of the test modules? If not, I'd prefer this go in pyresample/test/__init__.py or maybe in conftest.py but I think it'd have to be a fixture to be useful there and that probably isn't worth it. You'd have to do the right amount of .parent.parent to get the correct directory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

@@ -40,6 +40,7 @@
"features": {
"future_geometries": False,
},
"force_boundary_computations": False,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we keep this it will need documentation, but that can be the last thing in this PR.

Comment on lines +607 to +608
except Exception:
valid_indices = np.ones(source_lons.size, dtype=bool)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing this was done to make tests pass? What's going on here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No @djhoese. This try-except is used to deal with out of Earth disk projections !!!
Before this PR, the boundary was created using get_boundary_lonlats (see

def get_boundary_lonlats(self):
). The method was just naively taking the image border lon/latcoordinates.
So for the geostationary full disk and other projections with out-of-earth disk at the border, the boundary sides were all Inf. And inside the .get_valid_index_from_lonlat_boundaries if there was any invalid side coordinate, it was returning an array of ones (no data reduction done !). This means that for full disc GEO projections we were not doing data reduction !!!

In this PR, I replaced get_boundary_lonlats with target_geo_def.boundary().sides. So:

  • if the area is geostationary, now we finally reduce data (and this should speed up stuffs !!!)
  • if the area boundary has out-of-Earth locations and the flag force_boundary_computations is True, the boundary method will retrieve the actual boundary coordinate inside the area, and we will reduce data ...
  • if the area boundary has out-of-Earth locations and the flag force_boundary_computations is False (to keep the old behaviour), the boundary method now fails because if all sides are Inf it will raise a ValueError within _filter_bbox_nans. If this happen, we return what we returned before: the np.ones array

pyresample/boundary/sides.py Show resolved Hide resolved
Comment on lines +285 to +286
warnings.warn("'get_boundary_lonlats' is deprecated. Please use "
"'area.boundary().sides'.", DeprecationWarning, stacklevel=2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type of deprecation and all others like it seem like something that should be done in a future version of the geometry. This is perhaps too large of a change to request users make while still calling it a 1.x release...or is this method relatively new?

coordinates="geographic")
# Polar Projections, Global Planar Projections (Mollweide, Robinson)
# - Retrieve dummy right and left sides
if config.get("force_boundary_computations", False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this mean? What is being forced? How does this result differ from when it isn't forced/computed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This config regulates the behaviour when an AreaDefinition has boundaries on out-of-Earth regions.

force_boundary_computations == False keep the old behaviour of not searching for the actual internal boundary cooordinates, but just select the coordinates at the image border.
For global projections, (i.e. Mollweide, Robinson, ...), polar projections, ... :

  • if all image sides are out of Earth, the sides would only have Inf values and a ValueError is raised further downstream if all values are Inf.
  • If some of the image sides are out of Earth, some sides values would be Inf, such Inf value are discarded further downstream in the processing. If not all values are Inf, a "partial boundary" is returned, however it does not represent the true Earth boundary of the area.

With force_boundary_computations == True, we actually search the actual internal coordinates of the boundary.
To do so, we need to compute in memory one the coordinates array , from here the name force_*_computations.

See my other comment answering to the reason of the try-except logic to understand the rationale of this choice.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we'll keep this discussion on this thread instead of the 3 other comments I made related to the same topic.

Is another way of seeing this as "produce the best possible boundary possible or do the quick/shortcut method"? Where the quick/shortcut is the old way, right? Sorry if I'm getting confused, but this only applies to generating the spherical/lonlat boundaries right? The x/y for AreaDefinitions should always be valid in terms of the x/y coordinates, right? Should we include swaths in this current discussion or wait to confuse me later?

Can we use information from the PROJ/pyproj CRS class (like axis bounds) if they exist to do an even faster method of this where maybe we clip the lon/lats to the defined limits of the CRS? They don't always exist...hhmm but the result is always Inf isn't it? Not -Inf and Inf...kind of hard to clip then. I could also see us hardcoding some specializations of this for specific projections like we do for geostationary. That would make things a little smarter and faster wouldn't it?

Perhaps a better name for this would be something related to doing the "extra" work to get the best answer rather than phrasing it as "forcing" something. If we can speed up the common case maybe this could default to "on"/True?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we include swaths in this current discussion

SwathDefinition are not impacted by force_boundary_computations flag.

force_boundary_computations only impacts AreaDefinition where Earth coordinates only lies inside an internal region of the coordinates array (i.e. geostationary ... but for geostationary we have a smart ad-hoc trick to retrieve the actual internal Earth boundary coordinates without loading the coordinates array into memory).

I looked deeply into PROJ/pyproj CRS and I didn't find another way to determine the actual valid internal coordinate boundary without the trick I implemented in the PR.
Theoretically one could design ad-hoc methods for each CRS to infer the boundary as we did for geostationary (angle + satellite height) ... but it's a huge, tedious, work.

This only applies only applies to generating the spherical/lonlat boundaries right?

  • force_boundary_computations=True is required to perform correct spherical polygon operations
  • force_boundary_computations=True will likely fix issues in get_area_slices (because of previous bad polygon intersections)
  • force_boundary_computations=True speeds up some pykdtree operations when reduce_data=True. Currently resampling geostationary full disc (i.e. with nearest) using reduce_data=True does not reduce the data (because the boundary returned is all Inf ...)

The x/y for AreaDefinitions should always be valid in terms of the x/y coordinates, right ?

  • force_boundary_computations=True is also used in the area.projection_boundary method and impacts the gradient resampling, where data reduction is (assumed to be done) in planar projection coordinates. area.projection_boundary return the x/y coordinates corresponding to Earth valid coordinates, and so enable the transform of i.e. the actual internal GEO boundary to another CRS --> reduce_data=True with gradient was failing for GEO FD and this PR we might have solved the issue (not tested ... I didn't want to also refactor gradient logic :P)

If we can speed up the common case maybe this could default to "on"/True

I think that the boundary and projection_boundary methods should return the correct boundaries.
This will also benefit get_area_slices in some cases (remapping to polar projections was causing issues ...)

Then the decision of what should be done in pykdtree and gradient it's a choice to be based on the following points:

  • Is it fine for some special AreaDefinition projections to load the entire coordinates in memory? We go a bit away from the idea of only load the required data with dask ... but on the other side in such special cases ... we were later on loading all coordinates anyway to discover which was valid and which not ...
  • In pykdtree, we will just gain in performance and I don't expect any surprise
  • In gradient, there is somewhat lot of assumptions in how data reduction is done and it might be there will be some surprise with some projections, but on the other hand we move forward to fix several issues ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theoretically one could design ad-hoc methods for each CRS to infer the boundary as we did for geostationary (angle + satellite height) ... but it's a huge, tedious, work.

Yeah...I figured. Shoot.

force_boundary_computations=True speeds up some pykdtree operations when reduce_data=True. Currently resampling geostationary full disc (i.e. with nearest) using reduce_data=True does not reduce the data (because the boundary returned is all Inf ...)

With reduce_data=True, this is still get_area_slices functionality, right (so the same as your previous bullet point)? I thought we had the custom geostationary boundary logic so what about it has Inf coordinates? Or are the x/y of the boundary resulting in Inf when converted to lon/lats?

Then the decision of what should be done in pykdtree and gradient it's a choice to be based on the following points:

Hhhmm this is difficult. I haven't looked at your full implementation but I'm wondering if in the dask-case (and numpy I guess), for the default, we could be smart about choosing a limited set of chunks to compute completely. Maybe it wouldn't be as perfect as the full lookup but could be really close.

Comment on lines +2890 to +2894
ll_x, ll_y, ur_x, ur_y = area_extent
bottom = [(x, ll_y) for x in np.linspace(ll_x, ur_x, nb_points + 2)]
right = [(ur_x, y) for y in np.linspace(ll_y, ur_y, nb_points + 2)][1:]
top = [(x, ur_y) for x in np.linspace(ur_x, ll_x, nb_points + 2)][1:]
left = [(ll_x, y) for y in np.linspace(ur_y, ll_y, nb_points + 2)][1:-1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like something numpy can do in a vectorized way (numpy array operations only)...not the most important part of this PR by far.

Comment on lines +416 to +427
# - If invalid sides, return np.ones
try:
sides_lons, sides_lats = target_geo_def.boundary().sides
# Combine reduced and legal values
valid_input_index &= \
data_reduce.get_valid_index_from_lonlat_boundaries(
sides_lons,
sides_lats,
source_lons, source_lats,
radius_of_influence)
except Exception:
valid_input_index = np.ones(source_lons.size, dtype=bool)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did this work before? This feels like this should happen somewhere else and the except Exception needs to be figured out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment for the other try-except ;)

"""Create an AreaDefinition object for testing."""
area = AreaDefinition(crs=crs, shape=shape, area_extent=area_extent, **kwargs)
return area

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixtures can't be used inside parametrize. You need to use the third-party lazy_fixtures. You can see it's usage in other modules in pyresample (I think) and Satpy. It basically allows you to refer to a fixture in parametrize by it's name as a string.

Copy link
Member

@mraspaud mraspaud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the extensive work!
Unfortunately, I think this PR is too big as it is and need to be split into multiple PR, if just for the reviewers sanity. My suggestion would we to have:

  • One PR for TEST_FILES_PATH
  • One PR for replacing frequency with vertices_per_side
  • One PR for refactoring the visualisation part
  • One PR for renaming AreaDefBoundary and SwathDefBoundary to PlanarBoundary and SphericalBoundary (and adding corresponding deprecations)
  • One PR for fixing the gradient resampler
  • One PR for fixiing the kd_tree resampler
  • One PR for refactoring the boundary code and adding the new functionality
  • One PR for fixing the data reducing
  • etc...

Also, a lot of the new code does not seem to be tested. This makes it very difficult to understand the use cases. Tests are not only for testing, but also for illustrating the use cases of the code, and that helps me as reviewer understand the code and the intention.

I cannot accept this PR for now, as it is not possible for me to understand all cases this covers and all the implication within a reasonable amount of time.
I will however gladly review smaller, incremental PRs to introduce all the work you have done here.

Comment on lines +406 to +419
expected_lons = [
-90.67900085, 79.11000061, # 81.26400757,
81.26400757, 29.67200089, # 10.26000023,
10.26000023, -5.10700035, # -21.52500153,
-21.52500153, -21.56500053, # -90.67900085,
]
expected_lats = [
85.23900604, 80.84000397, # 67.07600403,
67.07600403, 54.14700317, # 30.54700089,
30.54700089, 34.0850029, # 35.58000183,
35.58000183, 62.25600433, # 85.23900604,
]
np.testing.assert_allclose(lons, expected_lons)
np.testing.assert_allclose(lats, expected_lats)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are one third of the values commented out?

Comment on lines +429 to +440
expected_lons = [
-45., -90., # -135.,
-135., -180., # 135.,
135., 90., # 45.,
45., 0., # -45.
]
expected_lats = [
80., 80., # 80.,
80., 80., # 80.,
80., 80., # 80.,
80., 80., # 80.
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the comment values for?

Comment on lines -151 to -154
# Swath defs raise AttributeError, and False is returned
get_polygon.side_effect = AttributeError
self.resampler._get_dst_poly('idx2', 0, 10, 0, 10)
assert self.resampler.dst_polys['idx2'] is False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why has this part of the test been removed? do we loose functionality with this PR?

"""Test polygon creation."""
from pyresample.gradient import get_polygon
from pyresample.gradient import _get_polygon
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a private (_) function if it needs to be imported?

Comment on lines +29 to +30
TEST_FILES_PATH = os.path.join(_root_path, "test", 'test_files')

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also defined in at least one other file, it should be factorised.


class BaseBoundary:
"""Base class for boundary objects."""
__slots__ = ["_sides_x", "_sides_y"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you motivate the use of __slots__ here? I don't see any real benefit in this class.

Comment on lines +48 to +50
def _compute_boundary_sides(self, area, vertices_per_side):
"""Compute boundary sides."""
raise NotImplementedError()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like the base class should be abstract, right?

return not polygon.exterior.is_ccw


class PlanarBoundary(BaseBoundary):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class is not sufficiently tested.

Comment on lines +52 to +55
@classmethod
def _compute_boundary_sides(cls, area, vertices_per_side):
sides_x, sides_y = area._get_projection_sides(vertices_per_side=vertices_per_side)
return sides_x, sides_y
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be a static method.

Comment on lines +57 to +63
def __init__(self, area, vertices_per_side=None):
super().__init__(area=area, vertices_per_side=vertices_per_side)

self.sides_x = self._sides_x
self.sides_y = self._sides_y
self.crs = self._area.crs
self.cartopy_crs = self._area.to_cartopy_crs()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be at the top of the class

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backwards-incompatibility Causes backwards incompatibility or introduces a deprecation enhancement
Projects
Status: Todo
Development

Successfully merging this pull request may close these issues.

4 participants