
BUFR2NETCDF bug for ADPUPA Prepbufr ---- the total number of data in the output is not consistent between tests with and without MPI #46

Open
emilyhcliu opened this issue Jan 16, 2025 · 7 comments

Comments

@emilyhcliu
Collaborator

emilyhcliu commented Jan 16, 2025

CASE: BUFR2NETCDF conversion for ADPUPA prepbufr, tested with MPI task counts 2, 4, 8, and without MPI.

input BUFR file: the prepbufr file (contains all subsets)
output NetCDF: ADPUPA subset from prepbufr in NetCDF format

Expectation: the output files from runs without MPI and with the various MPI configurations (2, 4, 8) should have the same size and the same number of data.

Symptom:

The number of obs in output:
MPI = 0  ----> 270600
MPI = 1  ----> 270600
MPI = 2 ----> 261786
MPI = 4 ----> 250079
MPI = 8 ---->  239868

We lose more data in the output as we increase the number of MPI tasks.
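
For reference, a quick way to reproduce the counts above is to read the size of the location dimension from each output file. This is a minimal sketch, assuming the netCDF4 Python module and an IODA-style Location dimension; the file names are illustrative, not necessarily the exact test outputs:

```python
# Count the number of locations in each bufr2netcdf output file.
# Assumes the netCDF4 python module and an IODA-style "Location" dimension;
# file names are illustrative.
from netCDF4 import Dataset

files = {
    "no MPI": "gdas.t00z.adpupa_prepbufr.tm00_0.nc",
    "MPI=2":  "gdas.t00z.adpupa_prepbufr.tm00_2.nc",
    "MPI=4":  "gdas.t00z.adpupa_prepbufr.tm00_4.nc",
    "MPI=8":  "gdas.t00z.adpupa_prepbufr.tm00_8.nc",
}

for label, path in files.items():
    with Dataset(path) as nc:
        print(f"{label}: {nc.dimensions['Location'].size} locations")
```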

Test Setup on HERA

ObsForge build: /scratch1/NCEPDEV/da/Emily.Liu/EMC-obsForge/obsForge_adpupa
        - IODA: feature/bufr_in_parallel
        - SPOC: feature/adpupa_prepbufr_NEW


Run directory: /scratch1/NCEPDEV/da/Emily.Liu/EMC-obsForge/run_adpupa_prepbufr
./run_encodeBufr bufr2netcdf 0
./run_encodeBufr bufr2netcdf 1
./run_encodeBufr bufr2netcdf 2

The output directory is ./testoutput/2021080100
   
@PraveenKumar-NOAA

PraveenKumar-NOAA commented Jan 22, 2025

It looks like a group_by field issue, as @nicholasesposito also encountered a similar one in his acft_profiles case: #40.

Looking into the details of the group_by query string for the ADPUPA prepbufr, which is ADPUPA/PRSLEVEL/CAT:

ADPUPA
Dimensioning Sub-paths:
3d */PRSLEVEL
2d int ADPUPA/PRSLEVEL/CAT

When I used the following for the query string:
ADPUPA/PRSLEVEL{1}/CAT
ADPUPA/PRSLEVEL{2}/CAT
ADPUPA/PRSLEVEL{3}/CAT

I got the following results, consistent across the different MPI configurations, but the number of observations/locations was much lower than the original number, 270600.

MPI = 0 239770 gdas.t00z.adpupa_prepbufr.tm00_0.nc ----> 1353
MPI = 2 239770 gdas.t00z.adpupa_prepbufr.tm00_2.nc ----> 1353
MPI = 4 239770 gdas.t00z.adpupa_prepbufr.tm00_4.nc ----> 1353
MPI = 8 239770 gdas.t00z.adpupa_prepbufr.tm00_8.nc ----> 1353
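
For what it's worth, here is a hedged illustration of why the count collapses to roughly one value per station when the replication index is fixed. This is plain Python, not the bufr-query code, and the level counts are made up; it only assumes that group_by produces one location per element of the grouped field:

```python
# Illustrative sketch only (not bufr-query itself).  group_by produces one
# location per element of the grouped field.  With the full path
# ADPUPA/PRSLEVEL/CAT the field has one value per pressure level, so every
# level becomes a location; with a fixed index such as ADPUPA/PRSLEVEL{1}/CAT
# the field has a single value per subset, so each station yields one location.
level_counts = [12, 14, 13]               # made-up level counts for three subsets

locations_full_path = sum(level_counts)    # analogue of ~270600
locations_fixed_index = len(level_counts)  # analogue of 1353 (~ number of stations)

print(locations_full_path, locations_fixed_index)  # 39 3
```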

@emilyhcliu
Collaborator Author


The number 1353 is likely the number of stations.

@rmclaren
Collaborator

I'm aware of the problem; I just haven't had a chance to look at it. I will need to do another round of bug fixing at some point.

@PraveenKumar-NOAA

@emilyhcliu @rmclaren FYI - SCRIPT2NETCDF has a similar issue, i.e., we lose more data in the output as we increase the number of MPI tasks.

I also confirm that there are no such issues with the BUFR_BACKEND and SCRIPT_BACKEND.

@rmclaren
Collaborator

rmclaren commented Feb 11, 2025

So I've figured out what is going on. Basically, this happens because the data is in the form of "jagged" arrays, which means the group_by vector size varies from subset to subset. The way group_by currently works, the data is normalized so that every subset that is read is inflated to the same dimensions, and then group_by is applied. For example:

Data Read Dimensions (per subset)

n, 12
n, 14
n, 13

Inflated (takes max of extra dimensions)

n, 14
n, 14
n, 14

The extra values are filled with missing values.

Then group_by is applied, giving the final dimensions:

n x 14
n x 14
n x 14

In the case where there are multiple MPI processes, the MAX value (e.g. 14) is not guaranteed to be the same on every process, so the first MPI process might have a max of, say, 10, and the second a max of 14. The end result is that the output might not be the same size when computing on multiple nodes.

These extra rows are just filler and don't tend to matter, so you can ignore them. I think a better (more correct) behavior might be to do the grouping before inflating the dimensions. There is probably not much point in adding these extra rows to the data (it just makes the dataset larger).
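
To make the mechanism concrete, here is a minimal sketch of the row-count arithmetic, assuming each process pads its subsets to its own local maximum before group_by flattens them. This is pure Python with made-up level counts, not the actual result-set code, and the per-subset leading dimension n is taken as 1 for simplicity:

```python
# Sketch of the mechanism described above (not the actual result-set code).
# Each subset's group_by field is jagged; before group_by is applied, every
# subset is inflated to the maximum level count seen by that process, with the
# extra slots filled by missing values.  When subsets are split across MPI
# ranks, each rank has its own local max, so the amount of filler (and the
# final row count) changes with the number of ranks.

# made-up per-subset level counts for six ADPUPA-like subsets
subset_levels = [12, 14, 13, 9, 11, 10]


def rows_after_inflation(levels_per_subset):
    """Rows produced once every subset is padded to the local max and flattened."""
    local_max = max(levels_per_subset)
    return local_max * len(levels_per_subset)


# Single process: one global max (14) applies to all six subsets.
print(rows_after_inflation(subset_levels))  # 6 * 14 = 84

# Two MPI ranks: each rank pads to its own local max, so the combined total shrinks.
rank0, rank1 = subset_levels[:3], subset_levels[3:]
print(rows_after_inflation(rank0) + rows_after_inflation(rank1))  # 42 + 33 = 75

# Grouping before inflating (the fix suggested above) would give the same
# count everywhere: one row per real level, with no filler.
print(sum(subset_levels))  # 69
```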

@PraveenKumar-NOAA

@rmclaren thank you for the clarification! Please let me know how to do the grouping before inflating the dimensions.

@rmclaren
Collaborator

@PraveenKumar-NOAA Nothing you can do, I'm afraid... I have to modify the result set class to do that.
