Add HPSS support and fix memory unsetting for Gaea C5/6 #3323

Open
wants to merge 29 commits into
base: develop

Conversation

DavidHuber-NOAA
Contributor

@DavidHuber-NOAA DavidHuber-NOAA commented Feb 13, 2025

Description

This PR adds HPSS support for the Gaea clusters by using the es cluster's dtn_f5_f6 partition, which has an HPSS connection. It also introduces a number of small fixes and some refactoring, including:

  • Fixing memory variable unsetting for Gaea C5/6 in config.resources.GAEAC{5,6}.
  • Refactoring the system-level parameter detection used when determining task resources in the setup scripts, making it easier to define multiple partitions, queues, and clusters.
  • Adding a DTN partition, queue, and cluster definition.
  • Adding missing tasks to tasks.py, renaming misnamed ones, and adding a check that the input task is valid.
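
As a rough illustration of the last item, a task-validity check could look like the sketch below. It is not the code in tasks.py: the function name `validate_task` and the sample task names are assumptions made for illustration only.

```python
# Hypothetical sketch; these names are illustrative and not taken from tasks.py.
VALID_TASKS = frozenset({"prep", "anal", "fcst", "arch"})  # assumed sample task names


def validate_task(task_name: str, valid_tasks=VALID_TASKS) -> str:
    """Return task_name unchanged if it is known; otherwise raise ValueError."""
    if task_name not in valid_tasks:
        # List the accepted names so a typo in the input is easy to spot.
        raise ValueError(
            f"'{task_name}' is not a valid task; expected one of: "
            + ", ".join(sorted(valid_tasks))
        )
    return task_name
```

Failing early with the list of accepted names means a misnamed task is caught at setup time rather than surfacing later as a missing resource definition.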

NOTE: Archiving from the DTNs for files located on the f6 filesystem is excruciatingly slow and can bog down both C5 and C6. Using HPSS on Gaea/C6 is therefore not recommended at this time, and the option is disabled by default. According to system admins, new DTNs should be installed soon that will help alleviate this issue.
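
The interaction between the DTN definition and the disabled-by-default HPSS option can be pictured with the following sketch. Only the `dtn_f5_f6` partition and the `es` cluster come from this description; the class, the queue names, the compute values, and the `use_hpss` flag are hypothetical, and the assumption that archive tasks stay on the compute partition when HPSS is off is mine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchedulerTarget:
    """Where a task is submitted: partition, queue, and cluster."""
    partition: str
    queue: str
    cluster: str


# Hypothetical values, except the dtn_f5_f6 partition and the es cluster.
COMPUTE = SchedulerTarget(partition="batch", queue="normal", cluster="c6")
DTN = SchedulerTarget(partition="dtn_f5_f6", queue="dtn", cluster="es")

HPSS_TASKS = frozenset({"arch"})  # assumed: tasks that need the HPSS connection


def target_for_task(task_name: str, use_hpss: bool = False) -> SchedulerTarget:
    """Send HPSS-dependent tasks to the DTN only when HPSS is explicitly enabled."""
    if use_hpss and task_name in HPSS_TASKS:
        return DTN
    return COMPUTE
```

With `use_hpss` left at its default of `False`, every task resolves to the compute target, which mirrors the recommendation above to avoid the DTNs until the new nodes are available.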

Type of change

  • Bug fix (fixes something broken)
  • New feature (adds functionality)
  • Maintenance (code refactor, clean-up, new CI test, etc.)

Change characteristics

  • Is this a breaking change (a change in existing functionality)? NO
  • Does this change require a documentation update? NO
  • Does this change require an update to any of the following submodules? NO

How has this been tested?

  • C48_ATM on C6
  • C48_S2SW on C6
  • Cycle testing on C6
  • CI suite on Hera
  • CI suite on WCOSS2

Checklist

  • Any dependent changes have been merged and published
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have documented my code, including function, input, and output descriptions
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • This change is covered by an existing CI test or a new one has been added
  • Any new scripts have been added to the .github/CODEOWNERS file with owners
  • I have made corresponding changes to the system documentation if necessary

@DavidHuber-NOAA DavidHuber-NOAA marked this pull request as ready for review February 19, 2025 19:07
@DavidHuber-NOAA
Contributor Author

This PR is now ready for review.

HPSS support for C6 is technically in place, but it should not be used until new F6-connected DTNs are added to the system, so it is disabled by default. These nodes are currently scheduled to come online in mid-March. The feature can be enabled by default once testing of the new nodes is complete.

C96C48, GSI-based cycling, C48 ATM and S2SW forecast-only tests were completed on the system successfully. Logs from the runs can be found here: /gpfs/f6/drsa-hurr1/world-shared/noscrub/David.Huber/COMROOT.

Member

@KateFriedman-NOAA KateFriedman-NOAA left a comment

Looks good, thanks for putting all these changes together and testing them, @DavidHuber-NOAA!

aerorahul previously approved these changes Feb 19, 2025
Contributor

@aerorahul aerorahul left a comment

looks good to me.

local hsi_mod_path=(os.getenv("hsi_mod_path") or "None")
append_path("MODULEPATH", hsi_mod_path)
-- TODO remove this path when the official (hsi_mod_path) is added to the DTN
append_path("MODULEPATH", "/sw/hpss/modulefiles")
Contributor

@DavidHuber-NOAA Looks like this has officially moved to /usw/hpss/modulefiles on C6.

Contributor Author

Thanks @DavidBurrows-NCO. Updated.

@DavidBurrows-NCO
Contributor

@DavidHuber-NOAA I looked through the Gaea updates and just made the one comment on C6 hsi path update. Otherwise, looks good. Great to see all the pieces coming together. Thanks!

Contributor

@AnilKumar-NOAA AnilKumar-NOAA left a comment

Looks good to me. Thanks for putting all these pieces together for C5/C6.
