Skip to content

Add output directory validation and graceful exit in main.py #91

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ali-john
Copy link

This PR adds validation checks for the output directory (RUN_OUT) specified in the configuration file. The changes ensure that before running the workflow, the script:

  1. Verifies that RUN_OUT is defined.
  2. Converts the output directory to an absolute path.
  3. Attempts to create the directory if it doesn't already exist.
  4. Checks if the directory is writable.

If any of these checks fail, the script logs an appropriate error message and exits gracefully instead of attempting to write to an unwritable or non-existent directory.

This prevents runtime errors (such as trying to create directories in the root filesystem) and improves the robustness of the workflow. All changes have been tested locally with various configurations.

@ali-john
Copy link
Author

@SorooshMani-NOAA can you review this PR please?

@SorooshMani-NOAA
Copy link
Collaborator

@ali-john I like the idea of having more checks, etc. but I'd like to see these checks being added as a part of the larger effort to rewrite workflow.sh in Python.

workflow.sh that you see now is the 3nd version of the workflow, which was developed in an attempt to organize this workflow such that it is packaged as a Python package. If you look at the commit history you'll see that this script came from reorganizing the version 1 of the workflow which was based on Prefect workflow manager and then later 2 which was to run the workflow in a traditional HPC environment.

Ideally I'd like to see (version 4 of) workflow.sh to be split into a bunch of .py files which do all the checks etc. I'd like to keep the main.py as simple as possible for now and don't split checks between workflow.sh and main.py at this point; I'd like these checks to be added to a workflow.py file (or a file that does part of that, e.g. workflow_sanitycheck.py, you get the idea).

It'd be a good self contained project actually, but we need to think about its priority as well; frankly as much as I like clean code and scripts, since this has been working and we're actively using it I don't want to change it drastically in short term.

If the same checks above can be added to workflow.sh at this point (instead of main.py), it'd be more useful. Moreover first and foremost ideally I'd like the updated input.yaml to be copied to run output directory instead of the one passed as argument: see

cp $input_file $run_dir/input_asis.yaml

We can do this in main.py since we're updating our config there. Thinking about it again, maybe if you bundle your current checks and changes with that input.yaml thing I just said, that makes more sense. Just update the path in main.py, make a copy of the updated file instead of the _asis one, and then do the checks in workflow.sh.

FYI @FariborzDaneshvar-NOAA.

@ali-john
Copy link
Author

@SorooshMani-NOAA , that makes sense. I’ll update the path in main.py so that it writes an updated copy of the input.yaml file, and then add the necessary path checks in workflow.sh. Also, is a full rewrite of workflow.sh not a priority at this time? If it aligns with our goals, I could dedicate some time to rewriting it as well.

@SorooshMani-NOAA
Copy link
Collaborator

is a full rewrite of workflow.sh not a priority at this time?

I would say no, but I'd like to see what @FariborzDaneshvar-NOAA thinks as well. We're actually adding some newer goals to ioos/gsoc#79 after revisiting our priorities. The main priority for GSoC 25 right now is:

  1. to extract the "core functionalities" of EnsemblePerturbation package (pre and post)processing, as a separate packages such that it relies on very few dependencies, e.g. only things such as chaospy, numpy, etc. but not coupledmodeldriver, pyschism, adcircpy, even searvey, etc.
  2. working on searvey and/or StormEvents to add new data sources as currently described in the GSoC issue

@FariborzDaneshvar-NOAA
Copy link
Collaborator

@SorooshMani-NOAA Thanks for taking lead and your prompt response.
@ali-john as @SorooshMani-NOAA noted, rewriting workflow.sh is not a priority at this point, instead integrated packages (i.e., EnsemblePerturbation, Searvey, and StormEvents) should be revised first. We should get to workflow.sh sometime later down the road.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants