[Bug] Empty dataset not empty #300
Comments
Here is a visualization of the spatial and temporal attributes of the granule that is not being processed correctly for the above-referenced Harmony request: The requested time window is The times in the granule are just after the request time window, so that is why no matches to the spatiotemporal conditions are found during the subsetter's processing.
@danielfromearth Would this still be an issue if l2ss-py were to not return any files in these cases of no data? That is my understanding of the implications of the decision in https://bugs.earthdata.nasa.gov/browse/TRT-36 to return no files in cases of no data. We'll be looking at changing the implementation of l2ss-py as part of #308 so not urgent.
I think a request that produces an empty file will be handled at the Harmony level, and batchee will be called with a catalog that includes only items with "good" files (the empty file will be excluded). If so, the batchee/stitchee operation should not be affected. But we can certainly test it and change stitchee to follow the TRT-36 decision as needed.
Summary
An inappropriate fill value is set when creating an empty dataset copy. Subsequent processing then fails because the dataset is not truly empty: a data variable holds a "valid" value instead of a true fill value.
Description of the problem
When there are no data points that match the requested spatiotemporal conditions, l2ss-py creates an empty dataset copy here. @ank1m and I discovered an edge case where a valid value is being placed in the new, copied variable, instead of the expected null or fill value. This occurred for the following "ground_pixel_quality_flag" variable, which notably has an integer type (int32) and has no declared '_FillValue' attribute:
Here is a screenshot showing the variable, in a TEMPO collection:

Since this variable, "support_data/ground_pixel_quality_flag", doesn't have a '_FillValue', l2ss-py tries to create an empty array using np.nan instead. But, because this variable is of type 'int32', it can't use np.nan! Instead, the code raises a
RuntimeWarning: invalid value encountered in cast multiarray.copyto(a, fill_value, casting='unsafe')
and then defaults back to using a 0 instead of np.nan.
However, 0 is a valid value for this variable (see the valid_min and valid_max attributes in the above screenshot), so subsequent operations see a valid array, rather than an empty, or all-fill-value, array.
Impact
This causes a failure during the below service chain call, after the "Stitchee" service tries to determine whether the files coming from l2ss-py are empty. Stitchee considers the file as "not empty" here because the variable's single value is not a fill value or null.
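The failure mode can be reproduced in isolation with NumPy alone. This is a minimal sketch, not the l2ss-py or Stitchee code: it assumes an np.full-style call (which matches the copyto line in the warning text), and looks_empty is a hypothetical stand-in for a downstream emptiness check. The integer that results from casting NaN is not guaranteed by NumPy; the issue reports 0, and other platforms may produce a different value.

```python
import warnings

import numpy as np

# Filling an int32 array with NaN: NumPy casts unsafely and warns,
# because int32 has no representation for NaN.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    bad = np.full((1,), np.nan, dtype="int32")

# The result is an ordinary integer (0 in the issue report; the exact
# value is platform-dependent), so it no longer looks like "no data".
print(bad.dtype, bad)


def looks_empty(values, fill_value=None):
    """Hypothetical emptiness check: True if every element is NaN or fill."""
    values = np.asarray(values)
    is_null = np.zeros(values.shape, dtype=bool)
    if np.issubdtype(values.dtype, np.floating):
        is_null |= np.isnan(values)
    if fill_value is not None:
        is_null |= values == fill_value
    return bool(is_null.all())


# With no declared fill value, the integer array is judged "not empty".
print(looks_empty(bad))

# An array built from an explicit integer fill value is recognised as empty.
good = np.full((1,), -2147483647, dtype="int32")
print(looks_empty(good, fill_value=-2147483647))
```

The second check only works because the producer and the consumer agree on the fill value, which is why declaring it on the variable matters as much as choosing it.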
Steps to reproduce
The following request currently fails:
https://harmony.uat.earthdata.nasa.gov/C1262899916-LARC_CLOUD/ogc-api-coverages/1.0.0/collections/all/coverage/rangeset?forceAsync=true&granuleId=G1269044803-LARC_CLOUD%2CG1269044708-LARC_CLOUD%2CG1269044681-LARC_CLOUD%2CG1269044688-LARC_CLOUD%2CG1269044514-LARC_CLOUD%2CG1269044741-LARC_CLOUD%2CG1269044710-LARC_CLOUD%2CG1269044439-LARC_CLOUD%2CG1269044715-LARC_CLOUD%2CG1269044815-LARC_CLOUD%2CG1269044726-LARC_CLOUD%2CG1269044787-LARC_CLOUD%2CG1269044827-LARC_CLOUD%2CG1269044658-LARC_CLOUD%2CG1269044679-LARC_CLOUD%2CG1269044727-LARC_CLOUD&subset=lat(32.56485%3A42.82943)&subset=lon(-135.7248%3A-52.76692)&subset=time(%222024-08-02T00%3A00%3A00.000Z%22%3A%222024-08-02T10%3A39%3A37.000Z%22)&concatenate=true&skipPreview=true
Desired change
An appropriate fill or null value for each variable's dtype is used when creating an "empty" dataset.
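As a rough sketch of what a dtype-aware fill could look like, the snippet below picks a declared fill value when one exists and otherwise falls back to netCDF4.default_fillvals, which is keyed by codes such as "i4" and "f8". The helper names pick_fill_value and make_empty_array are hypothetical and are not part of l2ss-py; the small fallback dictionary is only there so the sketch runs without netCDF4 installed, and it is assumed to mirror the netCDF library defaults.

```python
import numpy as np

try:
    import netCDF4

    # Mapping from type code (e.g. "i4", "f8") to the netCDF default fill value.
    default_fillvals = netCDF4.default_fillvals
except ImportError:
    # Assumed subset of the netCDF defaults, used only if netCDF4 is missing.
    default_fillvals = {
        "i4": -2147483647,
        "f4": 9.969209968386869e36,
        "f8": 9.969209968386869e36,
    }


def pick_fill_value(dtype, declared_fill=None):
    """Hypothetical helper: return a fill value representable in dtype."""
    dtype = np.dtype(dtype)
    if declared_fill is not None:
        # Prefer the variable's own _FillValue when it has one.
        return dtype.type(declared_fill)
    key = f"{dtype.kind}{dtype.itemsize}"  # int32 -> "i4", float64 -> "f8"
    return dtype.type(default_fillvals[key])


def make_empty_array(shape, dtype, declared_fill=None):
    """Hypothetical helper: build an all-fill array without any NaN cast."""
    fill = pick_fill_value(dtype, declared_fill)
    return np.full(shape, fill, dtype=dtype)


flags = make_empty_array((2, 3), "int32")
print(flags.dtype, flags[0, 0])
```

Whatever value is chosen would presumably also need to be written to the variable's '_FillValue' attribute, so that downstream services can recognise the array as all-fill.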
I think that means the dataset copy in l2ss should either:
netCDF4.default_fillvals
), or