-
Notifications
You must be signed in to change notification settings - Fork 917
uofl: changes to Open MPI and move prrte/pmix shas #13299
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Ouch - at that point, I can no longer provide support for OMPI 🤷♂️ You will be completely on your own. |
we decided at today's devel call that we would support |
need some distcheck related fixes from open-mpi/prrte#55 to fix the jenkins CI failure |
@jsquyres take a look when you have a chance. |
related to open-mpi/prrte#61 . We need updates to ompi_main branch on prrte fork to get help messages working for prrte help. |
@jsquyres please review when you have time. we need this in main before branching 6.0.0 |
config/ompi_setup_prrte.m4
Outdated
@@ -195,6 +203,9 @@ AC_DEFUN([_OMPI_SETUP_PRRTE_INTERNAL], [ | |||
opal_prrte_CPPFLAGS_save="${CPPFLAGS}" | |||
OPAL_FLAGS_APPEND_UNIQ([CPPFLAGS], [${opal_pmix_CPPFLAGS}]) | |||
|
|||
AC_DEFINE_UNQUOTED([OMPI_HAVE_PRTE_LAUNCH], [1], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a second, seemingly redundant, AC_DEFINE_UNQUOTED for OMPI_HAVE_PRTE_LAUNCH
.
Also, is the comment correct -- is prte_launch only available in internal prte? It looks like you're testing for prte_launch
in the external case, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reworking this. i figured we maywant to support the case that the external prrte has prte_launch
ompi/tools/mpirun/help-mpirun.txt
Outdated
This may indicate an issue with the environment or incorrect configuration. | ||
|
||
Error Message: %s | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Remove blank line (because it will be emitted in the error message). If you want an effectively-blank line before the next [section]
, use #
.
ompi/tools/mpirun/main.c
Outdated
|
||
return filename; | ||
#endif | ||
#include "3rd-party/prrte/include/prte.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we only supporting mpirun
with the internal prte? If so, why does the configury test for prte_launch
with an external prte?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that was an oversight. i'll see how to find prte.h as part of configure...
ompi/tools/mpirun/main.c
Outdated
|
||
/* we can use prte_launch */ | ||
|
||
int main(int argc, char *argv[]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a lot of code duplication between this main
and the non-prte_launch main
. Is there a way to consolidate / have less code duplication?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You know, there is a "prun_common" that does pretty much what you're talking about here. I'm not sure what this "prte_launch" thing is, but just looking at the code here, it sounds just like prun_common. Might have to do a little jiggling to match signature, but not much.
Is there some reason not to use it? It was created so that "prun" and "prterun" could use the same code and eliminate duplication. Is this a case of a student not realizing it existed?
|
||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you want to remove these copyrghts.
I think they're at the bottom of the file because MPI-enabled debuggers would default to opening the source code for mpirun.c upon attach, so we wanted to show the specific banner at the top of this file (vs. all the copyrights). But we still need the copyrights.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh i see. didn't realize that.
related to #13337 |
dba9ea9
to
a4680dc
Compare
This PR adds Univ. of Louisville capstone class mods to Open MPI as well as pointing 3rd-party/prrte to 0f79f3a and moves the 3rd-party/openpmix to be the v6.0.0 release. Signed-off-by: Gerardo Garcia <[email protected]> Signed-off-by: Howard Pritchard <[email protected]>
@jsquyres please recheck when you have a chance. |
setenv("OMPI_LIBDIR_LOC", opal_install_dirs.libdir, 1); | ||
|
||
// Set environment variable to tell PRTE what MCA prefixes belong | ||
// to Open MPI. | ||
setup_mca_prefixes(); | ||
|
||
#if OMPI_HAVE_PRTE_LAUNCH | ||
|
||
ret = prte_launch(argc, argv); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I said elsewhere, why not just use "prun_common"? Already exists, public symbol, does pretty much everything you have in your "prte_launch" code. The little bit that is left can easily be put here, just like we did with our "prte", and then call "prun_common". Avoid a bunch of code duplication that adds nothing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rhc54 Excellent suggestion! I spent a bit of time looking into this. I see in prun.c
that it sets up schizo before invoking prun_common()
. How should we do that from OMPI? I couldn't figure out a way to get into the PRTE (and PMIX) internals safely to do that from the outside.
This is similar to the types of reasons we wrote prte_launch()
to just take argc/argv -- i.e., emulate exactly as if we have fork/exec'd prun
and could only pass information via the argv command line.
I'm open to suggestion here...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, you definitely don't want to emulate prun
- you need to emulate prterun
. prun
is missing a bunch of stuff that you need for mpirun
.
I haven't looked at prte_launch
but can take a gander if you can point me at it. My guess is that we simply need to add a brief init function to PRRTE to fill in the gap. My concern is that you'll be adding maintenance burden for every time a new cmd line option gets added, or some other change internal to PRRTE requires modifying prun_common
. Avoiding code duplication would probably be a good thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prte_launch
is on the ompi_main
branch in our PRTE fork: https://github.com/open-mpi/prrte/blob/ompi_main/src/runtime/prte_launcher.c#L250
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, will take a look this week and see what can be done.
This PR adds Univ. of Louisville capstone class mods to Open MPI as well as pointing 3rd-party/prrte to
sha 572d99a6da5c679a5a9c25f45e626c0e4d3e9ff3 (head of ompi_main branch of OMPI fork of prrte),
and moves the 3rd-party/openpmix to be the v6.0.0 release.
Based on follow-on discussions the UofL configury refactor to not allow external PRRTE was removed.
External PRRTE is allowed and the old launch method is used in that case.