[NCO Bug] Display clear warning when missing snogrb file #2500

Open
KateFriedman-NOAA opened this issue Apr 17, 2024 · 14 comments · May be fixed by #3317

@KateFriedman-NOAA
Member

KateFriedman-NOAA commented Apr 17, 2024

What is wrong?

Bugzilla 1373

Bugzilla text from SPA Carlos Diaz:

In gfs_atmos_analysis, if the current gfs.t$CCz.snogrb_t1534.3072.1536 (from the 
gfs_atmos_emcsfc_sfc_prep job) is missing, the task proceeds normally with no clear warning.

Below are recommendations from George Gayno that there should be an alert if the file is missing:

"If the process that creates the snogrb file breaks somehow, the previous cycle's snogrb file is copied 
to the current directory. So the GFS will continue to run, even if snogrb is old. (GFS checks the date in
snogrb and if it is old, it does not apply the snow). 
Snow data is not 'critical'. The GFS should continue to run. But there should be an alert if the IMS or 
Air Force data are old so the problem can be investigated."

Please work to produce a clear warning if the file is missing as well as an email alert if needed.

What should have happened?

An alert should have been printed when the snogrb file being used is old.

Steps to reproduce

Simulate a missing snogrb file for the current cycle so that the system falls back to an older one.

Bugzilla issue

1373

Additional information

Reply from @RussTreadon-NOAA in bugzilla:

gfs_atmos_analysis and gdas_atmos_analysis execute j-job JGLOBAL_ATMOS_ANALYSIS.  This job,
in turn, executes exglobal_atmos_analysis.sh.   These jobs now reside in the NOAA-EMC/global-workflow
repository.  Therefore, reassign this bugzilla to EIB.  Add George Gayno to cc list since he maintains
gfs_atmos_emcsfc_sfc_prep.  The MDAB-DAQC team will support EMC staff in resolution of this bugzilla.

Do you have a proposed solution?

No response

@KateFriedman-NOAA added the nco-bug (Something isn't working in Ops.) and triage labels Apr 17, 2024
@WalterKolczynski-NOAA removed the triage label Apr 17, 2024
@KateFriedman-NOAA self-assigned this Apr 24, 2024
@WalterKolczynski-NOAA added this to the GFS v17 milestone Jan 27, 2025
@KateFriedman-NOAA
Member Author

@GeorgeGayno-NOAA The global-workflow scripts that handle the snogrb file currently only do the following if it doesn't exist:

export FNSNOA=${FNSNOA:-${COMIN_OBS}/${OPREFIX}snogrb_t${JCAP_CASE}.${LONB_CASE}.${LATB_CASE}}
[[ ! -f ${FNSNOA} ]] && export FNSNOA="${COMIN_OBS}/${OPREFIX}snogrb_t1534.3072.1536"

If ${OPREFIX}snogrb_t${JCAP_CASE}.${LONB_CASE}.${LATB_CASE} doesn't exist, should the job look for the t1534 one and then fall back to the prior cycle's file? Or should the backup behavior be different? Let me know, thanks!
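
One way to frame the lookup order being asked about is a small helper that returns the first candidate that exists. This is only a sketch: first_existing_file is a hypothetical name, not a global-workflow function, and listing a prior-cycle path as a third candidate is an assumption, since the desired backup behavior is exactly the open question here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in global-workflow): print the first candidate
# path that exists as a regular file; return 1 if none of them do.
first_existing_file() {
  local candidate
  for candidate in "$@"; do
    if [[ -f ${candidate} ]]; then
      echo "${candidate}"
      return 0
    fi
  done
  return 1
}

# Example call with the variable names used in this thread. The third
# candidate (prior-cycle directory) is an assumption, not current behavior.
# FNSNOA=$(first_existing_file \
#   "${COMIN_OBS}/${OPREFIX}snogrb_t${JCAP_CASE}.${LONB_CASE}.${LATB_CASE}" \
#   "${COMIN_OBS}/${OPREFIX}snogrb_t1534.3072.1536" \
#   "${COMIN_OBS_PREV}/${GPREFIX}snogrb_t1534.3072.1536") || echo "no snow file found"
```

Listing the candidates in one place makes the search order explicit, and a nonzero return status lets the caller decide whether a complete miss should be fatal.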

@GeorgeGayno-NOAA
Contributor

@KateFriedman-NOAA - I am trying to understand what that logic is doing. For the GDAS cycle there are two snow files created - t1534.3072.1536 and t574.1152.576. In this case, if the JCAP_CASE=1534/LONB_CASE=3072/LATB_CASE=1536, then the backup file is the same as the missing file?

Both snow resolutions are created at the same time. So if one is old, the other should also be old.

I might want to see how you run this script in the workflow.

@KateFriedman-NOAA
Member Author

> In this case, if the JCAP_CASE=1534/LONB_CASE=3072/LATB_CASE=1536, then the backup file is the same as the missing file?

@GeorgeGayno-NOAA Yeah I was wondering about that as well. I don't see when we wouldn't use the higher res version anyway. It seems like it should look for that higher res file and then look back a cycle if not found. I'm also now wondering if this will work with C1152.

I am running the global-workflow develop CI tests on Hera to produce examples/logs/output for another task so I will share that output once created so we can examine how the snogrb file is used currently. Stay tuned...

@KateFriedman-NOAA
Member Author

@GeorgeGayno-NOAA Please see the following logs on Hera:

/scratch1/NCEPDEV/stmp4/Kate.Friedman/RUNTESTS_test/COMROOT/C96C48_hybatmDA/logs/2021122100/enkfgdas_esfc.log
/scratch1/NCEPDEV/stmp4/Kate.Friedman/RUNTESTS_test/COMROOT/C96C48_hybatmDA/logs/2021122100/gdas_sfcanl.log
/scratch1/NCEPDEV/stmp4/Kate.Friedman/RUNTESTS_test/COMROOT/C96C48_hybatmDA/logs/2021122100/gfs_sfcanl.log

@GeorgeGayno-NOAA
Contributor

OK. I think I understand how the script works. If I look through the "./obs" directories, I only see one snow file - a "t1534.3072.1536" version. Since you run the workflow at many different resolutions, the script looks for a snow file with a comparable resolution. If it does not find one, it defaults to the t1534 version.

For example, in the gdas_sfcanl.log file, it looks for a "gdas.t00z.snogrb_t190.384.192" file. It does not find it because we don't create that resolution. So, the t1534 version is used.

So, I would not consider this a problem. I would not output a warning message because the SPA should not be investigating why there is no t190 file. Maybe some comments should be added to the script to assist him.

Now if the t1534 files are missing, then there is a problem.

One other thing, in the enkfgdas_esfc.log file, it looks for a "gdas.t00z.snogrb_t-2.0.0" file. What does "t-2.0.0" mean?

@KateFriedman-NOAA
Member Author

KateFriedman-NOAA commented Feb 10, 2025

> One other thing, in the enkfgdas_esfc.log file, it looks for a "gdas.t00z.snogrb_t-2.0.0" file. What does "t-2.0.0" mean?

I looked through the log and job scripts and see that the "t-2.0.0" comes from how the FNSNOA filename is built:

res=${CASE:2:}
JCAP_CASE=$((res*2-2))
LATB_CASE=$((res*2))
LONB_CASE=$((res*4))
export FNSNOA=${FNSNOA:-${COM_OBS}/${OPREFIX}snogrb_t${JCAP_CASE}.${LONB_CASE}.${LATB_CASE}}

For the enkf_sfc job the JCAP_CASE is set as -2, LONB_CASE is 0, and LATB_CASE is 0.
From the log:

+ exgdas_enkf_sfc.sh[81]: res=
+ exgdas_enkf_sfc.sh[82]: JCAP_CASE=-2
+ exgdas_enkf_sfc.sh[83]: LATB_CASE=0
+ exgdas_enkf_sfc.sh[84]: LONB_CASE=0
+ exgdas_enkf_sfc.sh[87]: export 'FNTSFA=        '
+ exgdas_enkf_sfc.sh[87]: FNTSFA='        '
...
+ exgdas_enkf_sfc.sh[89]: export FNSNOA=/scratch1/NCEPDEV/stmp4/Kate.Friedman/RUNTESTS_test/COMROOT/C96C48_hybatmDA/gdas.20211221/00/obs/gdas.t00z.snogrb_t-2.0.0

While investigating why res became blank, I found that the variable substitution to cut CASE was wrong. It is:

res=${CASE:2:}

...but should be:

res=${CASE:1}

This looks like a bug that was hidden because the job found the backup file for FNSNOA via the T1534 one every time. I will fix this in global-workflow develop.
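
The effect of the two substring expansions can be reproduced in isolation. The sketch below assumes CASE=C96 (the deterministic resolution in the C96C48 test) and reuses the arithmetic from the script; the variable names broken and fixed are only for illustration.

```shell
#!/usr/bin/env bash
# Derive the dimensions from a CASE string such as "C96" using the same
# arithmetic as the script excerpt quoted in this comment.
CASE="C96"

broken=${CASE:2:}   # offset 2 with an empty length: the length evaluates to 0, so this is ""
fixed=${CASE:1}     # drop only the leading "C", leaving "96"

# An empty variable evaluates to 0 inside $(( )), which is where -2 comes from.
echo "broken res='${broken}' -> JCAP_CASE=$((broken*2-2))"
echo "fixed  res='${fixed}' -> t$((fixed*2-2)).$((fixed*4)).$((fixed*2))"
```

With the broken form, res is empty and JCAP_CASE evaluates to -2, as in the log above; with the fix, the name suffix becomes t190.384.192.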

> So, the t1534 version is used. So, I would not consider this a problem. I would not output a warning message because the SPA should not be investigating why there is no t190 file. Maybe some comments should be added to the script to assist him.

Agreed, language should be added and can be included in a PR into develop to resolve the bug above and this bugzilla. How about:

diff --git a/scripts/exgdas_enkf_sfc.sh b/scripts/exgdas_enkf_sfc.sh
index 19443253..774044cb 100755
--- a/scripts/exgdas_enkf_sfc.sh
+++ b/scripts/exgdas_enkf_sfc.sh
@@ -78,7 +78,7 @@ bPDY=${BDATE:0:8}
 bcyc=${BDATE:8:2}
 
 # Get dimension information based on CASE
-res=${CASE:2:}
+res=${CASE:1}
 JCAP_CASE=$((res*2-2))
 LATB_CASE=$((res*2))
 LONB_CASE=$((res*4))
@@ -87,9 +87,17 @@ LONB_CASE=$((res*4))
 export FNTSFA=${FNTSFA:-'                  '}
 export FNACNA=${FNACNA:-${COM_OBS}/${OPREFIX}seaice.5min.blend.grb}
 export FNSNOA=${FNSNOA:-${COM_OBS}/${OPREFIX}snogrb_t${JCAP_CASE}.${LONB_CASE}.${LATB_CASE}}
-[[ ! -f $FNSNOA ]] && export FNSNOA="${COM_OBS}/${OPREFIX}snogrb_t1534.3072.1536"
+if [[ ! -f $FNSNOA ]]; then
+  echo "WARNING: ${COM_OBS}/${OPREFIX}snogrb_t${JCAP_CASE}.${LONB_CASE}.${LATB_CASE} does not exist."
+  echo "Will use ${COM_OBS}/${OPREFIX}snogrb_t1534.3072.1536 instead."
+  export FNSNOA="${COM_OBS}/${OPREFIX}snogrb_t1534.3072.1536"
+fi
 FNSNOG=${FNSNOG:-${COM_OBS_PREV}/${GPREFIX}snogrb_t${JCAP_CASE}.${LONB_CASE}.${LATB_CASE}}
-[[ ! -f $FNSNOG ]] && FNSNOG="${COM_OBS_PREV}/${GPREFIX}snogrb_t1534.3072.1536"
+if [[ ! -f $FNSNOG ]]; then
+  echo "WARNING: ${COM_OBS_PREV}/${GPREFIX}snogrb_t${JCAP_CASE}.${LONB_CASE}.${LATB_CASE} does not exist."
+  echo "Will use ${COM_OBS_PREV}/${GPREFIX}snogrb_t1534.3072.1536 instead."
+  export FNSNOG="${COM_OBS_PREV}/${GPREFIX}snogrb_t1534.3072.1536"
+fi

Feel free to suggest alterations to the above ^

@GeorgeGayno-NOAA
Contributor

@KateFriedman-NOAA - how will the SPAs interpret the word "WARNING"? Will they be compelled to investigate the 'missing' snow file? How about a simple print such as echo "Current snow file is $FNSNOA"

@KateFriedman-NOAA
Member Author

@GeorgeGayno-NOAA Good point. I spoke with my fellow g-w CMs and we want to use the word "INFO" instead of "WARNING". With that and your suggestion, my suggestion would be:

export FNSNOA=${FNSNOA:-${COM_OBS}/${OPREFIX}snogrb_t${JCAP_CASE}.${LONB_CASE}.${LATB_CASE}}
if [[ ! -f $FNSNOA ]]; then
  export FNSNOA="${COM_OBS}/${OPREFIX}snogrb_t1534.3072.1536"
  echo "INFO: Current snow file is $FNSNOA"
fi
export FNSNOG=${FNSNOG:-${COM_OBS_PREV}/${GPREFIX}snogrb_t${JCAP_CASE}.${LONB_CASE}.${LATB_CASE}}
if [[ ! -f $FNSNOG ]]; then
  export FNSNOG="${COM_OBS_PREV}/${GPREFIX}snogrb_t1534.3072.1536"
  echo "INFO: Previous snow file is $FNSNOG"
fi

Thoughts?

@GeorgeGayno-NOAA
Contributor

Looks good.

@KateFriedman-NOAA
Member Author

I modified it a bit more so the INFO message is always printed and the log states which snow file will be used, whichever one that is. I also added a comment about the file existence check:

# Check if resolution specific FNSNOA exists, if not use t1534 version
[[ ! -f $FNSNOA ]] && export FNSNOA="${COM_OBS}/${OPREFIX}snogrb_t1534.3072.1536"
echo "INFO: Current snow file is ${FNSNOA}"
export FNSNOG=${FNSNOG:-${COM_OBS_PREV}/${GPREFIX}snogrb_t${JCAP_CASE}.${LONB_CASE}.${LATB_CASE}}
# Check if resolution specific FNSNOG exists, if not use t1534 version
[[ ! -f $FNSNOG ]] && export FNSNOG="${COM_OBS_PREV}/${GPREFIX}snogrb_t1534.3072.1536"
echo "INFO: Previous snow file is ${FNSNOG}"

Will be making the same update to scripts/exglobal_atmos_sfcanl.sh.

@KateFriedman-NOAA
Member Author

KateFriedman-NOAA commented Feb 11, 2025

@GeorgeGayno-NOAA What kind of messaging would you like if the T1534 snogrb file is missing? Here is the snogrb section of scripts/exgdas_enkf_sfc.sh with the new INFO message and a suggested FATAL ERROR message with exit if missing:

export FNSNOA=${FNSNOA:-${COMIN_OBS}/${OPREFIX}snogrb_t${JCAP_CASE}.${LONB_CASE}.${LATB_CASE}}
# Check if resolution specific FNSNOA exists, if not use t1534 version
[[ ! -f $FNSNOA ]] && export FNSNOA="${COMIN_OBS}/${OPREFIX}snogrb_t1534.3072.1536"
echo "INFO: Current snow file is ${FNSNOA}"
if [[ ! -f $FNSNOA ]]; then
  echo "FATAL ERROR: Current snow file ${FNSNOA} is missing. Exiting."
  exit 1
fi
export FNSNOG=${FNSNOG:-${COMIN_OBS_PREV}/${GPREFIX}snogrb_t${JCAP_CASE}.${LONB_CASE}.${LATB_CASE}}
# Check if resolution specific FNSNOG exists, if not use t1534 version
[[ ! -f $FNSNOG ]] && export FNSNOG="${COMIN_OBS_PREV}/${GPREFIX}snogrb_t1534.3072.1536"
echo "INFO: Previous snow file is ${FNSNOG}"
if [[ ! -f $FNSNOG ]]; then
  echo "FATAL ERROR: Previous snow file ${FNSNOG} is missing. Exiting."
  exit 1
fi

Thoughts?

KateFriedman-NOAA added a commit to KateFriedman-NOAA/global-workflow that referenced this issue Feb 11, 2025
- Add INFO echo to state which snogrb file will be used in log
- Also fix bug with res setting in exgdas_enkf_sfc.sh

Resolves bugzilla 1373

Refs NOAA-EMC#2500
KateFriedman-NOAA added a commit to KateFriedman-NOAA/global-workflow that referenced this issue Feb 11, 2025
- Add "FATAL ERROR" message if either snogrb file is missing.
- Update COMIN/COMOUT variable settings (needed)

Refs NOAA-EMC#2500
@GeorgeGayno-NOAA
Contributor

I would add the "echo INFO" line as an 'else' block. i.e.,

if [[ ! -f $FNSNOG ]]; then
  echo "FATAL ERROR: Previous snow file ${FNSNOG} is missing. Exiting."
  exit 1
else
  echo "INFO: Previous snow file is ${FNSNOG}"
fi

@KateFriedman-NOAA
Member Author

@GeorgeGayno-NOAA I like that suggestion, I will make that change.

I am going to run a test that invokes the impacted jobs and then open a PR to make the changes in develop.
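
For reference, the pieces agreed on in this thread (resolution-specific lookup, t1534 fallback, then FATAL ERROR or INFO in one if-block) could be combined roughly as follows. This is a hedged sketch, not the code committed to global-workflow: resolve_snow_file is a hypothetical function, and sending the messages to stderr so the chosen path can be captured from stdout is a design choice of this example only.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper, not the code merged into global-workflow.
# Arguments: label ("Current" or "Previous"), preferred path, fallback path.
# Log messages go to stderr; the chosen path is printed on stdout.
resolve_snow_file() {
  local label=$1 preferred=$2 fallback=$3
  local chosen=${preferred}
  # Check if the resolution specific file exists, if not use the fallback
  if [[ ! -f ${chosen} ]]; then
    chosen=${fallback}
  fi
  if [[ ! -f ${chosen} ]]; then
    echo "FATAL ERROR: ${label} snow file ${chosen} is missing." >&2
    return 1
  else
    echo "INFO: ${label} snow file is ${chosen}" >&2
  fi
  echo "${chosen}"
}

# Example call (paths built from the variables used in this thread):
# FNSNOA=$(resolve_snow_file "Current" \
#   "${COMIN_OBS}/${OPREFIX}snogrb_t${JCAP_CASE}.${LONB_CASE}.${LATB_CASE}" \
#   "${COMIN_OBS}/${OPREFIX}snogrb_t1534.3072.1536") || exit 1
# export FNSNOA
```

Returning a status instead of calling exit inside the function leaves the decision to abort with the calling script.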

KateFriedman-NOAA added a commit to KateFriedman-NOAA/global-workflow that referenced this issue Feb 11, 2025
Combine the INFO and FATAL ERROR messaging for snogrb files
into the same if-block.

Refs NOAA-EMC#2500
@KateFriedman-NOAA
Member Author

Ran the C96_atm3DVar CI test on Hercules. Updates performed as expected:

+ exglobal_atmos_sfcanl.sh[64]: export FNSNOA=/work2/noaa/stmp/kfriedma/RUNTESTS/COMROOT/C96_atm3DVar/gdas.20211221/00/obs/gdas.t00z.snogrb_t190.384.192
+ exglobal_atmos_sfcanl.sh[64]: FNSNOA=/work2/noaa/stmp/kfriedma/RUNTESTS/COMROOT/C96_atm3DVar/gdas.20211221/00/obs/gdas.t00z.snogrb_t190.384.192
+ exglobal_atmos_sfcanl.sh[66]: [[ ! -f /work2/noaa/stmp/kfriedma/RUNTESTS/COMROOT/C96_atm3DVar/gdas.20211221/00/obs/gdas.t00z.snogrb_t190.384.192 ]]
+ exglobal_atmos_sfcanl.sh[66]: export FNSNOA=/work2/noaa/stmp/kfriedma/RUNTESTS/COMROOT/C96_atm3DVar/gdas.20211221/00/obs/gdas.t00z.snogrb_t1534.3072.1536
+ exglobal_atmos_sfcanl.sh[66]: FNSNOA=/work2/noaa/stmp/kfriedma/RUNTESTS/COMROOT/C96_atm3DVar/gdas.20211221/00/obs/gdas.t00z.snogrb_t1534.3072.1536
+ exglobal_atmos_sfcanl.sh[67]: [[ ! -f /work2/noaa/stmp/kfriedma/RUNTESTS/COMROOT/C96_atm3DVar/gdas.20211221/00/obs/gdas.t00z.snogrb_t1534.3072.1536 ]]
+ exglobal_atmos_sfcanl.sh[71]: echo 'INFO: Current snow file is /work2/noaa/stmp/kfriedma/RUNTESTS/COMROOT/C96_atm3DVar/gdas.20211221/00/obs/gdas.t00z.snogrb_t1534.3072.1536'
INFO: Current snow file is /work2/noaa/stmp/kfriedma/RUNTESTS/COMROOT/C96_atm3DVar/gdas.20211221/00/obs/gdas.t00z.snogrb_t1534.3072.1536
+ exglobal_atmos_sfcanl.sh[73]: export FNSNOG=/work2/noaa/stmp/kfriedma/RUNTESTS/COMROOT/C96_atm3DVar/gdas.20211220/18/obs/gdas.t18z.snogrb_t190.384.192
+ exglobal_atmos_sfcanl.sh[73]: FNSNOG=/work2/noaa/stmp/kfriedma/RUNTESTS/COMROOT/C96_atm3DVar/gdas.20211220/18/obs/gdas.t18z.snogrb_t190.384.192
+ exglobal_atmos_sfcanl.sh[75]: [[ ! -f /work2/noaa/stmp/kfriedma/RUNTESTS/COMROOT/C96_atm3DVar/gdas.20211220/18/obs/gdas.t18z.snogrb_t190.384.192 ]]
+ exglobal_atmos_sfcanl.sh[75]: export FNSNOG=/work2/noaa/stmp/kfriedma/RUNTESTS/COMROOT/C96_atm3DVar/gdas.20211220/18/obs/gdas.t18z.snogrb_t1534.3072.1536
+ exglobal_atmos_sfcanl.sh[75]: FNSNOG=/work2/noaa/stmp/kfriedma/RUNTESTS/COMROOT/C96_atm3DVar/gdas.20211220/18/obs/gdas.t18z.snogrb_t1534.3072.1536
+ exglobal_atmos_sfcanl.sh[76]: [[ ! -f /work2/noaa/stmp/kfriedma/RUNTESTS/COMROOT/C96_atm3DVar/gdas.20211220/18/obs/gdas.t18z.snogrb_t1534.3072.1536 ]]
+ exglobal_atmos_sfcanl.sh[80]: echo 'INFO: Previous snow file is /work2/noaa/stmp/kfriedma/RUNTESTS/COMROOT/C96_atm3DVar/gdas.20211220/18/obs/gdas.t18z.snogrb_t1534.3072.1536'
INFO: Previous snow file is /work2/noaa/stmp/kfriedma/RUNTESTS/COMROOT/C96_atm3DVar/gdas.20211220/18/obs/gdas.t18z.snogrb_t1534.3072.1536
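
To pull just the snow-file messages out of a job log like the one above, a grep along these lines should be enough. It is a sketch under the assumption that the messages keep the wording shown in this thread; show_snow_info is a hypothetical helper name and the log path is supplied by the caller.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the snow-file INFO lines from a job log.
# The pattern matches the echo statements proposed in this thread.
show_snow_info() {
  local logfile=$1
  # Anchor at line start so the "+ script[NN]: echo 'INFO: ...'" trace lines are skipped.
  grep -E '^INFO: (Current|Previous) snow file is ' "${logfile}"
}

# Example call:
# show_snow_info /path/to/gdas_sfcanl.log
```

grep returns a nonzero status when no line matches, so the same call can double as a quick check that the messages were written at all.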

KateFriedman-NOAA added a commit to KateFriedman-NOAA/global-workflow that referenced this issue Feb 11, 2025
KateFriedman-NOAA added a commit to KateFriedman-NOAA/global-workflow that referenced this issue Feb 11, 2025