Skip to content

Conversation

hppritcha
Copy link
Member

@hppritcha hppritcha commented Jun 9, 2025

This PR adds Univ. of Louisville capstone class mods to Open MPI as well as pointing 3rd-party/prrte to
sha 572d99a6da5c679a5a9c25f45e626c0e4d3e9ff3 (head of ompi_main branch of OMPI fork of prrte),
and moves the 3rd-party/openpmix to be the v6.0.0 release.

Based on follow-on discussions the UofL configury refactor to not allow external PRRTE was removed.
External PRRTE is allowed and the old launch method is used in that case.

@hppritcha hppritcha marked this pull request as draft June 9, 2025 21:33
@hppritcha hppritcha requested a review from jsquyres June 9, 2025 21:33
@rhc54
Copy link
Contributor

rhc54 commented Jun 9, 2025

Refactored --with-prrte configure option to no longer support external PRRTE builds

Ouch - at that point, I can no longer provide support for OMPI 🤷‍♂️ You will be completely on your own.

@hppritcha
Copy link
Member Author

we decided at today's devel call that we would support --with-prrte=internal . that should fix the hdf5 failure.

@hppritcha
Copy link
Member Author

need some distcheck related fixes from open-mpi/prrte#55 to fix the jenkins CI failure

@hppritcha
Copy link
Member Author

@jsquyres take a look when you have a chance.

@hppritcha
Copy link
Member Author

related to open-mpi/prrte#61 . We need updates to ompi_main branch on prrte fork to get help messages working for prrte help.

@hppritcha hppritcha marked this pull request as ready for review July 14, 2025 19:17
@hppritcha
Copy link
Member Author

@jsquyres please review when you have time. we need this in main before branching 6.0.0

@@ -195,6 +203,9 @@ AC_DEFUN([_OMPI_SETUP_PRRTE_INTERNAL], [
opal_prrte_CPPFLAGS_save="${CPPFLAGS}"
OPAL_FLAGS_APPEND_UNIQ([CPPFLAGS], [${opal_pmix_CPPFLAGS}])

AC_DEFINE_UNQUOTED([OMPI_HAVE_PRTE_LAUNCH], [1],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a second, seemingly redundant, AC_DEFINE_UNQUOTED for OMPI_HAVE_PRTE_LAUNCH.

Also, is the comment correct -- is prte_launch only available in internal prte? It looks like you're testing for prte_launch in the external case, too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reworking this. i figured we maywant to support the case that the external prrte has prte_launch

This may indicate an issue with the environment or incorrect configuration.

Error Message: %s

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Remove blank line (because it will be emitted in the error message). If you want an effectively-blank line before the next [section], use #.


return filename;
#endif
#include "3rd-party/prrte/include/prte.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we only supporting mpirun with the internal prte? If so, why does the configury test for prte_launch with an external prte?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that was an oversight. i'll see how to find prte.h as part of configure...


/* we can use prte_launch */

int main(int argc, char *argv[])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot of code duplication between this main and the non-prte_launch main. Is there a way to consolidate / have less code duplication?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You know, there is a "prun_common" that does pretty much what you're talking about here. I'm not sure what this "prte_launch" thing is, but just looking at the code here, it sounds just like prun_common. Might have to do a little jiggling to match signature, but not much.

Is there some reason not to use it? It was created so that "prun" and "prterun" could use the same code and eliminate duplication. Is this a case of a student not realizing it existed?


/*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you want to remove these copyrghts.

I think they're at the bottom of the file because MPI-enabled debuggers would default to opening the source code for mpirun.c upon attach, so we wanted to show the specific banner at the top of this file (vs. all the copyrights). But we still need the copyrights.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh i see. didn't realize that.

@hppritcha
Copy link
Member Author

related to #13337

@hppritcha hppritcha force-pushed the uofl_ompi_prs branch 2 times, most recently from dba9ea9 to a4680dc Compare July 30, 2025 00:47
This PR adds Univ. of Louisville capstone class mods to
Open MPI as well as pointing 3rd-party/prrte to 0f79f3a
and moves the 3rd-party/openpmix to be the v6.0.0 release.

Signed-off-by: Gerardo Garcia <[email protected]>
Signed-off-by: Howard Pritchard <[email protected]>
@hppritcha
Copy link
Member Author

@jsquyres please recheck when you have a chance.

setenv("OMPI_LIBDIR_LOC", opal_install_dirs.libdir, 1);

// Set environment variable to tell PRTE what MCA prefixes belong
// to Open MPI.
setup_mca_prefixes();

#if OMPI_HAVE_PRTE_LAUNCH

ret = prte_launch(argc, argv);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I said elsewhere, why not just use "prun_common"? Already exists, public symbol, does pretty much everything you have in your "prte_launch" code. The little bit that is left can easily be put here, just like we did with our "prte", and then call "prun_common". Avoid a bunch of code duplication that adds nothing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rhc54 Excellent suggestion! I spent a bit of time looking into this. I see in prun.c that it sets up schizo before invoking prun_common(). How should we do that from OMPI? I couldn't figure out a way to get into the PRTE (and PMIX) internals safely to do that from the outside.

This is similar to the types of reasons we wrote prte_launch() to just take argc/argv -- i.e., emulate exactly as if we have fork/exec'd prun and could only pass information via the argv command line.

I'm open to suggestion here...?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, you definitely don't want to emulate prun - you need to emulate prterun. prun is missing a bunch of stuff that you need for mpirun.

I haven't looked at prte_launch but can take a gander if you can point me at it. My guess is that we simply need to add a brief init function to PRRTE to fill in the gap. My concern is that you'll be adding maintenance burden for every time a new cmd line option gets added, or some other change internal to PRRTE requires modifying prun_common. Avoiding code duplication would probably be a good thing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prte_launch is on the ompi_main branch in our PRTE fork: https://github.com/open-mpi/prrte/blob/ompi_main/src/runtime/prte_launcher.c#L250

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, will take a look this week and see what can be done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants