Bug 1535 - Add the NAEFS jobs restart capability for file alerts #26

BoCui-NOAA · 2025-02-14T16:07:19Z

Currently, the WCOSS implementation requirement is that any job running longer than 15 minutes must have a restart capability, in order to

save time when recovering from a failure.
keep the time at which our end users receive data as consistent as possible.
Another factor is keeping the load on dbnet, as well as on the outgoing networks, to a minimum.

For the NAEFS jobs, a very large number of files are alerted within a short runtime, as shown below. We would therefore like a threshold standard for file alerts that also triggers the need for restart capability, rather than relying on the 15-minute runtime alone.

Please add the restart capability for file alerts in the next NAEFS upgrade, so that -

when a NAEFS job is rerun after a failure, the scripts check for output data files that already exist from the previous run and do not alert them again.
the scripts likewise check for output data files that already exist from the previous run and do not process/generate them again, especially in the gempak scripts/jobs.
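
The two requested behaviors could look roughly like the sketch below. It is written in Python purely for illustration, since the NAEFS scripts themselves are not shown in this issue. It assumes a per-job record file that lists the outputs already alerted. The names `alert_new_files`, `generate_if_missing`, and the record file are hypothetical, and the `send_alert` callable stands in for whatever command the job actually uses to send a dbnet alert.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def alert_new_files(
    outputs: Iterable[Path],
    record: Path,
    send_alert: Callable[[Path], None],
) -> List[Path]:
    """Alert only outputs that exist and were not alerted by a previous run."""
    # Paths alerted before the failure, one per line (hypothetical record file).
    already = set(record.read_text().splitlines()) if record.exists() else set()
    sent = []
    with record.open("a") as log:
        for out in outputs:
            key = str(out)
            if key in already or not out.exists():
                continue  # alerted earlier, or not produced yet
            send_alert(out)
            # Record right after the alert so a later failure keeps this entry.
            log.write(key + "\n")
            log.flush()
            already.add(key)
            sent.append(out)
    return sent


def generate_if_missing(out: Path, generate: Callable[[Path], None]) -> bool:
    """Run generate(out) only if out is missing or empty; return True if it ran."""
    if out.exists() and out.stat().st_size > 0:
        return False  # kept from the previous run
    generate(out)
    return True
```

Writing the record entry immediately after each alert, rather than once at the end of the job, is what lets a rerun pick up where the failed run stopped. Treating an empty file as missing is one possible guard against outputs left half-written by the failure; a real job may need a stronger completeness check.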

NAEFS v7.0 job runtime and file alerts

job                       runtime (min)  alerts
naefs_gefs_prob_avgspr    1.1            1440
naefs_fnmoc_ens_gempak    10.8           1940
naefs_cmc_ens_gempak      6.4            2134
naefs_cmc_ens_post        7.9            4462
naefs_gefs_debias_gempak  8.8            6136
naefs_gefs_debias         164.3          9172 (rerun ~13 mins without wait/sleep)
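
To show how an alert-count threshold would change which jobs need restart capability, here is a small sketch using the numbers listed above. The 15-minute limit comes from the requirement stated in this issue. The alert limit is left as a parameter because no value has been proposed, and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

RUNTIME_LIMIT_MIN = 15.0  # existing runtime rule stated in this issue

# job name -> (runtime in minutes, number of file alerts), NAEFS v7.0
NAEFS_V7_JOBS: Dict[str, Tuple[float, int]] = {
    "naefs_gefs_prob_avgspr": (1.1, 1440),
    "naefs_fnmoc_ens_gempak": (10.8, 1940),
    "naefs_cmc_ens_gempak": (6.4, 2134),
    "naefs_cmc_ens_post": (7.9, 4462),
    "naefs_gefs_debias_gempak": (8.8, 6136),
    "naefs_gefs_debias": (164.3, 9172),
}


def needs_restart(
    runtime_min: float, alerts: int, alert_limit: Optional[int] = None
) -> bool:
    """True if the job exceeds the runtime rule or the optional alert limit."""
    if runtime_min > RUNTIME_LIMIT_MIN:
        return True
    return alert_limit is not None and alerts > alert_limit


def jobs_needing_restart(alert_limit: Optional[int] = None) -> List[str]:
    """Names of the listed jobs that would require restart capability."""
    return [
        name
        for name, (runtime, alerts) in NAEFS_V7_JOBS.items()
        if needs_restart(runtime, alerts, alert_limit)
    ]
```

Under the runtime rule alone, `jobs_needing_restart()` returns only `naefs_gefs_debias`. With a purely illustrative limit of 5000 alerts, `jobs_needing_restart(5000)` also includes `naefs_gefs_debias_gempak`, even though that job runs for under 9 minutes.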
