JEDI-based ensemble recentering and analysis calculation #3312

Open
wants to merge 57 commits into
base: develop
from

@DavidNew-NOAA DavidNew-NOAA commented Feb 10, 2025

Description

COORDINATED MERGE

This PR implements ensemble recentering and analysis calculation in the Global Workflow, using JEDI-based applications to replace certain GSI utilities when JEDI is turned on in the workflow. If using GSI, the workflow remains unchanged. This PR also (finally) implements native-grid DA increments in the workflow.

The gdas_analcalc and enkfgdas_ecen jobs will be replaced by gdas_analcalc_fv3jedi and enkfgdas_ecen_fv3jedi jobs respectively. The enkfgdas_echgres job is eliminated, since changing of resolution of the deterministic backgrounds is done internally in the JEDI-based recentering application.

The design for this PR is based on discussions between the DA team and GW team a few months ago. Explanation of the flow of data through the workflow:

The gdas_analcalc_fv3jedi job dependencies do not change. The native-grid backgrounds and increments are staged, and then the GDASApp JEDI fv3jedi_add_increments application is run to add them and interpolate the result to the Gaussian grid. The Gaussian-grid backgrounds are also staged, and then a simple Python function inserts these analysis variables into the histories, which become the Gaussian analyses. This approach guarantees that the resulting Gaussian analyses are in the exact format required by UPP.

The enkfgdas_ecen_fv3jedi job no longer depends on the analysis calc job, since the ensemble-resolution variational analysis is computed/interpolated internally in the JEDI-based recentering application. All other job dependencies remain the same. We no longer need to compute the ensemble mean analysis in this job, since it can be output by the JEDI local ensemble DA application in the enkfgdas_atmensanlsol job and just staged for recentering. The variational increment and deterministic backgrounds are also staged to compute the ensemble-resolution variational analysis. The output of this job is no longer the recentered ensemble increments, but rather the "correction increment", which when added to the ensemble increments becomes the recentered increments. The prefix for the "correction increment" is catminc.

The enkfgdas_fcst job now stages both the ensemble increments and the correction increment. They are added together with ncbo in forecast_postdet.sh to generate the recentered increment.
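
The arithmetic behind the correction increment can be sketched in a few lines. This is an illustration only: it assumes the correction increment is the ensemble-resolution variational analysis minus the ensemble mean analysis, which the description implies but does not state outright. Plain Python lists stand in for netCDF fields, and every function and variable name below is hypothetical.

```python
def mean_field(fields):
    """Elementwise mean of equally sized fields (lists of floats)."""
    n = len(fields)
    return [sum(vals) / n for vals in zip(*fields)]


def add_fields(a, b):
    """Elementwise sum, standing in for what ncbo does with two increment files."""
    return [x + y for x, y in zip(a, b)]


def sub_fields(a, b):
    """Elementwise difference a - b."""
    return [x - y for x, y in zip(a, b)]


# Toy data: three members on a two-point "grid" (values are made up).
backgrounds = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
ens_increments = [[0.5, 0.0], [0.0, -1.0], [-0.5, 1.0]]
det_analysis = [2.5, 3.0]  # variational analysis at ensemble resolution

# Ensemble analyses and their mean (the mean is what gets staged for recentering).
ens_analyses = [add_fields(b, i) for b, i in zip(backgrounds, ens_increments)]
ens_mean_analysis = mean_field(ens_analyses)

# ASSUMPTION: correction increment = deterministic analysis - ensemble mean analysis.
correction = sub_fields(det_analysis, ens_mean_analysis)

# Forecast-job step: ensemble increment + correction increment = recentered increment.
recentered_incs = [add_fields(i, correction) for i in ens_increments]

# The recentered ensemble mean now coincides with the deterministic analysis.
recentered_analyses = [add_fields(b, i) for b, i in zip(backgrounds, recentered_incs)]
recentered_mean = mean_field(recentered_analyses)
print(correction, recentered_mean)
```

Under that assumption a single correction field is shared by all members, which is consistent with the job writing one set of catminc files rather than one recentered increment per member.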

All forecast increments, both deterministic and ensemble, are now on the native cubed-sphere grid.

Resolves #3248

Type of change

  • Bug fix (fixes something broken)
  • New feature (adds functionality)
  • Maintenance (code refactor, clean-up, new CI test, etc.)

Change characteristics

  • Is this a breaking change (a change in existing functionality)? YES
  • Does this change require a documentation update? IDK
  • Does this change require an update to any of the following submodules? YES
    • EMC verif-global
    • GDAS #1488
    • GFS-utils
    • GSI
    • GSI-monitor
    • GSI-utils
    • UFS-utils
    • UFS-weather-model
    • wxflow

How has this been tested?

  • Clone and build on Hera
  • Run C96C48_ufs_hybatmDA

Checklist

  • Any dependent changes have been merged and published
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have documented my code, including function, input, and output descriptions
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • This change is covered by an existing CI test or a new one has been added
  • Any new scripts have been added to the .github/CODEOWNERS file with owners
  • I have made corresponding changes to the system documentation if necessary

@DavidNew-NOAA DavidNew-NOAA added the JEDI Feature development to support JEDI-based DA label Feb 12, 2025
@RussTreadon-NOAA
Contributor

Interesting. I updated to d9e0275.

Hera(hfe05):/scratch1/NCEPDEV/da/Russ.Treadon/git/global-workflow/calcanl$ git log --oneline | head -1
d9e0275a Shell norm fix
Hera(hfe05):/scratch1/NCEPDEV/da/Russ.Treadon/git/global-workflow/calcanl$ shellcheck jobs/JGDAS_ENKF_ECEN_FV3JEDI
Hera(hfe05):/scratch1/NCEPDEV/da/Russ.Treadon/git/global-workflow/calcanl$

No shellcheck warnings.

I am using a copy of shellcheck which originates from @WalterKolczynski-NOAA

Hera(hfe05):/scratch1/NCEPDEV/da/Russ.Treadon/git/global-workflow/calcanl$ which shellcheck
/home/Rahul.Mahajan/bin/shellcheck
Hera(hfe05):/scratch1/NCEPDEV/da/Russ.Treadon/git/global-workflow/calcanl$
Hera(hfe05):/scratch1/NCEPDEV/da/Russ.Treadon/git/global-workflow/calcanl$ ls -l /home/Rahul.Mahajan/bin/shellcheck
lrwxrwxrwx 1 Rahul.Mahajan da 64 Feb 26  2023 /home/Rahul.Mahajan/bin/shellcheck -> /scratch2/NCEPDEV/ensemble/save/Walter.Kolczynski/bin/shellcheck

Is this version of shellcheck too old?

I see that g-w has a .shellcheckrc. I reran shellcheck, ignoring the .shellcheckrc file.

Hera(hfe05):/scratch1/NCEPDEV/da/Russ.Treadon/git/global-workflow/calcanl$ shellcheck --norc jobs/JGDAS_ENKF_ECEN_FV3JEDI

In jobs/JGDAS_ENKF_ECEN_FV3JEDI line 3:
source "${HOMEgfs}/ush/preamble.sh"
       ^--------------------------^ SC1091 (info): Not following: ./ush/preamble.sh was not specified as input (see shellcheck -x).
        ^--------^ SC2154 (warning): HOMEgfs is referenced but not assigned.


In jobs/JGDAS_ENKF_ECEN_FV3JEDI line 4:
source "${HOMEgfs}/ush/jjob_header.sh" -e "ecen_fv3jedi" -c "base ecen_fv3jedi"
       ^-----------------------------^ SC1091 (info): Not following: ./ush/jjob_header.sh was not specified as input (see shellcheck -x).


In jobs/JGDAS_ENKF_ECEN_FV3JEDI line 12:
GDATE=$(date --utc +%Y%m%d%H -d "${PDY} ${cyc} - ${assim_freq} hours")
                                        ^----^ SC2154 (warning): cyc is referenced but not assigned (did you mean 'gcyc'?).
                                                 ^-----------^ SC2154 (warning): assim_freq is referenced but not assigned.


In jobs/JGDAS_ENKF_ECEN_FV3JEDI line 47:
if [[ -e "${pgmout}" ]] ; then
          ^-------^ SC2154 (warning): pgmout is referenced but not assigned.

For more information:
  https://www.shellcheck.net/wiki/SC2154 -- HOMEgfs is referenced but not ass...
  https://www.shellcheck.net/wiki/SC1091 -- Not following: ./ush/jjob_header....

Now I see shellcheck warnings, but these are not the warnings reported by the GitHub Actions shellcheck run.

Any ideas @WalterKolczynski-NOAA, @aerorahul , @KateFriedman-NOAA as to (a) what's going on and (b) how to get the github shellcheck action to pass?

@DavidNew-NOAA
Contributor Author

@aerorahul @WalterKolczynski-NOAA This PR is ready for review. It has a pending companion GDAS PR, but I'd like some feedback on this PR before I merge the GDAS PR and break things.

@RussTreadon-NOAA
Contributor

WCOSS2 g-w CI

Install DavidNew-NOAA:feature/calcanl with appropriate GDASApp and jcb-gdas hashes on Cactus. Set up and run C96C48_ufs_hybatmDA.

All jobs ran up to the 20240224 00Z cycle. Jobs enkfgdas_fcst_mem001 and enkfgdas_fcst_mem002 failed in this cycle with the error message

/lfs/h2/emc/da/noscrub/russ.treadon/git/global-workflow/pr3312/ush/forecast_postdet.sh: line 223: ncbo: command not found

jobs/rocoto/fcst.sh loads modules from a different source when running on WCOSS2

###############################################################
# Source FV3GFS workflow modules
# TODO clean this up once ncdiag/1.1.2 is installed on WCOSS2
source "${HOMEgfs}/ush/detect_machine.sh"
if [[ "${MACHINE_ID}" == "wcoss2" ]]; then
   . ${HOMEgfs}/ush/load_ufswm_modules.sh
else
   . ${HOMEgfs}/ush/load_fv3gfs_modules.sh
fi

The fv3gfs modules load the nco module. The ufswm modules do not load nco.

@WalterKolczynski-NOAA , @aerorahul , @DavidHuber-NOAA : any suggestions for how best to resolve this issue?

@RussTreadon-NOAA
Contributor

WCOSS2 Test

As a test make the following modification to $HOMEgfs/jobs/rocoto/fcst.sh

@@ -8,6 +8,10 @@ source "${HOMEgfs}/ush/preamble.sh"
 source "${HOMEgfs}/ush/detect_machine.sh"
 if [[ "${MACHINE_ID}" == "wcoss2" ]]; then
    . ${HOMEgfs}/ush/load_ufswm_modules.sh
+   module load udunits/2.2.28
+   module load gsl/2.7
+   module load nco/5.0.6
+   module list
 else
    . ${HOMEgfs}/ush/load_fv3gfs_modules.sh
 fi

It is necessary to load udunits and gsl before nco. With these changes the enkfgdas_fcst_mem jobs run to completion. This is not a solution. It is only a test to demonstrate that defining the path to ncbo allows the enkfgdas forecast jobs to run to completion.
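
A missing executable like this only surfaces when the forecast script reaches the ncbo call. As a sketch only, and not something in this PR, a pre-flight check could report the problem before any work is done. The helper name `require_tools` is hypothetical; `shutil.which` is the standard-library PATH lookup.

```python
import shutil


def require_tools(names):
    """Return {tool: resolved path}; raise if any tool is not on PATH."""
    resolved = {name: shutil.which(name) for name in names}
    missing = [name for name, path in resolved.items() if path is None]
    if missing:
        raise RuntimeError(f"not found on PATH: {', '.join(missing)}")
    return resolved


if __name__ == "__main__":
    try:
        print(require_tools(["ncbo"]))
    except RuntimeError as err:
        # On a system where the nco module is not loaded this branch is taken.
        print(f"pre-flight check failed: {err}")
```

This does not address which module set should provide nco on WCOSS2; it only moves the failure to a point where the message is unambiguous.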

The C96C48_ufs_hybatmDA completed all jobs with only the following jobs remaining

202402240000      enkfgdas_earc_vrfy                           -                   -                   -         -             -
202402240000        enkfgdas_cleanup                           -                   -                   -         -             -
202402240600      enkfgdas_earc_vrfy                           -                   -                   -         -             -
202402240600        enkfgdas_cleanup                           -                   -                   -         -             -

A rocotocheck of enkfgdas_earc_vrfy shows that this job can not be queued because enkfgdas_echgres of the given cycle is not SUCCEEDED.

The changes in this PR remove echgres from JEDI based DA. Job echgres is no longer in the C96C48_ufs_hybatmDA xml.

$HOMEgfs/workflow/rocoto/gfs_tasks.py has the following dependencies for earc_vrfy

    def earc_vrfy(self):

        deps = []
        if 'enkfgdas' in self.run:
            dep_dict = {'type': 'metatask', 'name': f'{self.run}_epmn'}
        else:
            dep_dict = {'type': 'task', 'name': f'{self.run}_esfc'}
        deps.append(rocoto.add_dependency(dep_dict))
        dep_dict = {'type': 'task', 'name': f'{self.run}_echgres'}

We need to add logic to exclude the echgres dependency when running JEDI atmospheric DA.
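
One way to express that logic, as a self-contained sketch rather than the actual workflow code: `add_dependency` below is a hypothetical stand-in for `rocoto.add_dependency`, and the `use_jedi_atm_da` flag is an assumed name for whatever option the workflow exposes. The point of the sketch is that both the assignment and the append sit inside the conditional, so no dependency is appended twice when echgres is skipped.

```python
def add_dependency(dep_dict):
    """Hypothetical stand-in for rocoto.add_dependency: returns a copy of the dict."""
    return dict(dep_dict)


def earc_vrfy_deps(run, use_jedi_atm_da):
    """Build the earc_vrfy dependency list, omitting echgres for JEDI atmospheric DA."""
    deps = []
    if 'enkfgdas' in run:
        dep_dict = {'type': 'metatask', 'name': f'{run}_epmn'}
    else:
        dep_dict = {'type': 'task', 'name': f'{run}_esfc'}
    deps.append(add_dependency(dep_dict))

    # The echgres job only exists for GSI-based DA, so depend on it only then.
    if not use_jedi_atm_da:
        dep_dict = {'type': 'task', 'name': f'{run}_echgres'}
        deps.append(add_dependency(dep_dict))
    return deps


print(earc_vrfy_deps('enkfgdas', use_jedi_atm_da=True))
print(earc_vrfy_deps('enkfgdas', use_jedi_atm_da=False))
```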

@RussTreadon-NOAA
Contributor

WCOSS2 Test 2

Updated workflow/rocoto/gfs_tasks.py as follows

@@ -3062,7 +3062,8 @@ class GFSTasks(Tasks):
         else:
             dep_dict = {'type': 'task', 'name': f'{self.run}_esfc'}
         deps.append(rocoto.add_dependency(dep_dict))
-        dep_dict = {'type': 'task', 'name': f'{self.run}_echgres'}
+        if not self.options['do_jediatmvar']:
+            dep_dict = {'type': 'task', 'name': f'{self.run}_echgres'}
         deps.append(rocoto.add_dependency(dep_dict))
         dependencies = rocoto.create_dependency(dep_condition='and', dep=deps)

and regenerated C96C48_ufs_hybatmDA_pr3312.xml.

While this allowed enkfgdas_earc_vrfy to be submitted, the job failed due to a required file not being found

2025-02-15 13:40:51,204 - ERROR    - file_utils  : Source file '/lfs/h2/emc/ptmp/russ.treadon/COMROOT/C96C48_ufs_hybatmDA_pr3312/enkfgdas.20240224/00/ensstat/analysis/atmos/enkfgdas.t00z.atminc.ensmean.nc' does not exist and is required, ABORT!
NoneType: None

Directory enkfgdas.20240224/00/ensstat/analysis/atmos now contains different files

 /lfs/h2/emc/ptmp/russ.treadon/COMROOT/C96C48_ufs_hybatmDA_pr3312/enkfgdas.20240224/00/ensstat/analysis/atmos:
  total used in directory 1494816 available 827.9 TiB
  drwxr-sr-x 2 russ.treadon da       4096 Feb 14 21:41 .
  drwxr-sr-x 3 russ.treadon da       4096 Feb 14 21:27 ..
  -rw-r--r-- 1 russ.treadon da       2211 Feb 14 21:01 enkfgdas.t00z.atmensanlfv3inc.yaml
  -rw-r--r-- 1 russ.treadon da      79637 Feb 14 21:01 enkfgdas.t00z.atmensanlobs.yaml
  -rw-r--r-- 1 russ.treadon da      79718 Feb 14 21:01 enkfgdas.t00z.atmensanlsol.yaml
  -rw-r--r-- 1 russ.treadon da 1275566080 Feb 14 21:30 enkfgdas.t00z.atmensstat
  -rw-r--r-- 1 russ.treadon da  128219453 Feb 14 21:18 enkfgdas.t00z.cubed_sphere_grid_atmanl.ensmean.nc
  -rw-r--r-- 1 russ.treadon da   21116749 Feb 14 21:41 enkfgdas.t00z.cubed_sphere_grid_catminc.tile1.nc
  -rw-r--r-- 1 russ.treadon da   21116749 Feb 14 21:41 enkfgdas.t00z.cubed_sphere_grid_catminc.tile2.nc
  -rw-r--r-- 1 russ.treadon da   21116749 Feb 14 21:41 enkfgdas.t00z.cubed_sphere_grid_catminc.tile3.nc
  -rw-r--r-- 1 russ.treadon da   21116749 Feb 14 21:41 enkfgdas.t00z.cubed_sphere_grid_catminc.tile4.nc
  -rw-r--r-- 1 russ.treadon da   21116749 Feb 14 21:41 enkfgdas.t00z.cubed_sphere_grid_catminc.tile5.nc
  -rw-r--r-- 1 russ.treadon da   21116749 Feb 14 21:41 enkfgdas.t00z.cubed_sphere_grid_catminc.tile6.nc

Templates in $HOMEgfs/parm/archive need to be examined and modified to support both GSI- and JEDI-based atmospheric DA.

@RussTreadon-NOAA
Contributor

WCOSS2 Test 3

Locally modify parm/archive/enkf.yaml.j2 and parm/archive/gfs_arcdir.yaml.j2 as follows

parm/archive/enkf.yaml.j2

@@ -61,13 +61,26 @@ enkf:
         {% set da_files = ["atmensanlobs.yaml",
                            "atmensanlsol.yaml",
                            "atmensanlfv3inc.yaml",
-                           "atminc.ensmean.nc",
-                           "atmensstat"] %}
+                           "atmensstat",
+                           "cubed_sphere_grid_atmanl.ensmean.nc",
+                           "cubed_sphere_grid_catminc.tile1.nc",
+                           "cubed_sphere_grid_catminc.tile2.nc",
+                           "cubed_sphere_grid_catminc.tile3.nc",
+                           "cubed_sphere_grid_catminc.tile4.nc",
+                           "cubed_sphere_grid_catminc.tile5.nc",
+                           "cubed_sphere_grid_catminc.tile6.nc"] %}
         {% else %}
         {% set da_files = ["atmensanlletkf.yaml",
                            "atmensanlfv3inc.yaml",
                            "atminc.ensmean.nc",
-                           "atmensstat"] %}
+                           "atmensstat",
+                           "cubed_sphere_grid_atmanl.ensmean.nc",
+                           "cubed_sphere_grid_catminc.tile1.nc",
+                           "cubed_sphere_grid_catminc.tile2.nc",
+                           "cubed_sphere_grid_catminc.tile3.nc",
+                           "cubed_sphere_grid_catminc.tile4.nc",
+                           "cubed_sphere_grid_catminc.tile5.nc",
+                           "cubed_sphere_grid_catminc.tile6.nc"] %}
         {% endif %}
         {% endif %}
         {% for file in da_files %}

parm/archive/gfs_arcdir.yaml.j2

@@ -127,8 +127,6 @@
     {% if DO_JEDIATMENS == True %}
         {% do enkf_files.append([COMIN_ATMOS_ANALYSIS_ENSSTAT ~ "/" ~ head ~ "atmensstat",
                                  ARCDIR ~ "/atmensstat." ~ RUN ~ "." ~ cycle_YMDH ]) %}
-        {% do enkf_files.append([COMIN_ATMOS_ANALYSIS_ENSSTAT ~ "/" ~ head ~ "atminc.ensmean.nc",
-                                 ARCDIR ~ "/atmensstat." ~ RUN ~ "." ~ cycle_YMDH ~ ".ensmean.nc"]) %}
     {% else %}
         {% do enkf_files.append([COMIN_ATMOS_ANALYSIS_ENSSTAT ~ "/" ~ head ~ "enkfstat",
                                  ARCDIR ~ "/enkfstat." ~ RUN ~ "." ~ cycle_YMDH ]) %}

With these changes in place the enkfgdas_earc_vrfy jobs ran to completion.
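
The six per-tile entries follow a single naming pattern, so the JEDI-branch file list can be generated rather than written out by hand. The sketch below is illustrative Python, not the Jinja2 template itself; the function names are hypothetical, and the file names are the ones that appear in the directory listing and the enkf.yaml.j2 diff in this thread.

```python
def catminc_tile_files(ntiles=6):
    """Correction-increment file suffixes, one per cubed-sphere tile."""
    return [f"cubed_sphere_grid_catminc.tile{n}.nc" for n in range(1, ntiles + 1)]


def jedi_enkf_da_files():
    """File suffixes archived for the JEDI obs/sol branch, per the diff in this thread."""
    return [
        "atmensanlobs.yaml",
        "atmensanlsol.yaml",
        "atmensanlfv3inc.yaml",
        "atmensstat",
        "cubed_sphere_grid_atmanl.ensmean.nc",
    ] + catminc_tile_files()


for name in jedi_enkf_da_files():
    # The prefix matches the files listed for the 00Z cycle earlier in the thread.
    print(f"enkfgdas.t00z.{name}")
```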

With these and other local modifications to the Cactus working copy of DavidNew-NOAA:feature/calcanl, all jobs for g-w CI case C96C48_ufs_hybatmDA successfully ran to completion.

/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/C96C48_ufs_hybatmDA_pr3312
202402231800        Done    Feb 14 2025 20:19:54    Feb 14 2025 20:40:46
202402240000        Done    Feb 14 2025 20:19:54    Feb 17 2025 20:56:07
202402240600        Done    Feb 14 2025 20:19:54    Feb 17 2025 20:56:07

The modified copy of DavidNew-NOAA:feature/calcanl is on Cactus in /lfs/h2/emc/da/noscrub/russ.treadon/git/global-workflow/pr3312. Changes were made to the following files

        modified:   jobs/rocoto/fcst.sh
        modified:   parm/archive/enkf.yaml.j2
        modified:   parm/archive/gfs_arcdir.yaml.j2
        modified:   sorc/gdas.cd (untracked content)
        modified:   sorc/ufs_model.fd (untracked content)
        modified:   workflow/rocoto/gfs_tasks.py

The change to sorc/gdas.cd is simply to point at GDASApp branch feature/calcanl.

I did not make any changes in sorc/ufs_model.fd. The local changes in this directory are

        compile_1_time.log
        compile_2_time.log
        compile_3_time.log
        tests/modules.ufs_model.lua
        tests/ufs_common.lua

My guess is that these changes are related to the gefs, gfs, and sfs builds of the UFS weather model.

Contributor

@RussTreadon-NOAA RussTreadon-NOAA left a comment

g-w CI for C96C48_hybatmDA (GSI-based atmospheric DA) and C96C48_ufs_hybatmDA (JEDI-based atmospheric DA) was run on Cactus. All jobs in C96C48_hybatmDA successfully ran to completion. In C96C48_ufs_hybatmDA, the enkfgdas forecast and archive jobs failed. Local modifications to g-w files allowed these jobs to successfully run to completion.

@DavidNew-NOAA
Contributor Author

Thanks @RussTreadon-NOAA for the thorough testing.

The archive job issue seems like an easy fix.

I'm not so sure about the ncbo issue in the enkfgdas_fcst job. In a perfect world, I would make the enkfgdas_ecen_fv3jedi job the first stage of a two-stage set of jobs, with the enkfgdas_fcst job being the second stage. Then we could just add the ensemble increment (atminc) and "correction increment" (catminc) internally inside the JEDI-based recentering application. This would be much more efficient and cleaner overall. The problem is that the forecast job is such a beast, not using the pygfs Task class infrastructure like other jobs, that I was reluctant to try such a thing. I might need to sit down and discuss this with @aerorahul and @WalterKolczynski-NOAA to get their take.

@RussTreadon-NOAA
Contributor

@DavidNew-NOAA : fcst.sh contains the following TODO comment

###############################################################
# Source FV3GFS workflow modules
# TODO clean this up once ncdiag/1.1.2 is installed on WCOSS2
source "${HOMEgfs}/ush/detect_machine.sh"
if [[ "${MACHINE_ID}" == "wcoss2" ]]; then
   . ${HOMEgfs}/ush/load_ufswm_modules.sh
else
   . ${HOMEgfs}/ush/load_fv3gfs_modules.sh
fi

Two questions for others

  1. Why does the UFS weather model need ncdiag/1.1.2?
  2. Is ncdiag/1.1.2 now available on WCOSS2? If not, why not?

AntonMFernando-NOAA and others added 2 commits February 20, 2025 13:42
The `gfs_arch_tars` job currently does not depend on `gempak` jobs,
even though it archives data produced by them. This PR will introduce
that dependency. Additionally, there are several missing dependencies
for cleanup when the arch_tar job is not executed. Nearly all of the
job's dependencies need to be replicated for cleanup in case arch_tar
doesn't run. This PR will address this problem as well.

Resolves NOAA-EMC#3294

Adds sfs as a valid option for NET.

To start, the GEFS system is generally just copied wholesale for SFS.
This includes the extract_vars job.

Other than base and resources, config files link to the GEFS versions,
just as GEFS config files point to the GFS versions except where they
have needed to be changed.

The temporary SFS_POST option has been removed.

The existing SFS test is copied and slightly modified for a PR-level CI
test.

Resolves NOAA-EMC#2271
Successfully merging this pull request may close these issues.

Create JEDI-based ensemble recentering and analysis calculation job
4 participants