Forward full environment in spawns #13274
Comments
You raise several issues, so let me try to address them:
Yes, we always forward any envar prefixed with "OMPI_" as these are known OMPI-owned values
I will investigate, but I suspect the issue is one of inheritance. The directive is applied to the initial launch, but may not be applied to the subsequent child launch unless a directive tells us to do so.
Hmmm...looks to me like the new form of that cmd line option didn't get implemented - my bad. For now, you can do it with just |
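As an illustration of the first point (envars prefixed with "OMPI_" are always forwarded), here is a minimal sketch in plain C of prefix-based selection over a NULL-terminated environment list. It is not the PRRTE or Open MPI implementation; the helper names has_prefix and select_forwarded are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if a "NAME=VALUE" entry starts with prefix. */
static int has_prefix(const char *entry, const char *prefix)
{
    assert(entry != NULL && prefix != NULL);
    return strncmp(entry, prefix, strlen(prefix)) == 0;
}

/*
 * Hypothetical helper: scans a NULL-terminated environment list and stores
 * pointers to the entries that match the prefix in out (at most max of them).
 * Returns the number of entries selected.
 */
static size_t select_forwarded(char *const env[], const char *prefix,
                               const char *out[], size_t max)
{
    assert(env != NULL && out != NULL);
    size_t n = 0;
    for (size_t i = 0; env[i] != NULL && n < max; i++) {
        if (has_prefix(env[i], prefix)) {
            out[n++] = env[i];
        }
    }
    return n;
}
```

Calling select_forwarded with the prefix "OMPI_" on a list such as {"OMPI_FOO=1", "HOME=/root", NULL} would select only the first entry. A name that merely contains "OMPI_" later in the string does not match, because only the start of the entry is compared.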
I have implemented/resolved the various issues so the behavior is as you expected and matches the documentation: openpmix/openpmix#3613. I'll be bringing these over to the latest release branches (PMIx v6 and PRRTE v4). I'm not sure when these might make their way over to OMPI, but it might not be until the next major release.
Perfect, thanks for the help! I have tested these changes and the environment is now forwarded even with spawns. One small thing is that setting the info like this: Still does not seem to do anything, even though exporting it first then running the program does make it forward the environment. Not a big concern in my case though, as I do not need that level of control. |
That envar only applies to the runtime, and so it can only be read at startup by "mpirun". Thus, setting it in MPI_Comm_spawn comes far too late to have an effect. I set the default so that child jobs inherit the setting of their parent. However, you can independently control the behavior for the child job by setting a |
Thanks for the clarification, I am closing this issue now as it is resolved. |
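The inheritance rule described in the previous comment (a child job takes its parent's forwarding setting by default, unless the spawn request overrides it) can be sketched as a toy model in plain C. This is only an illustration under stated assumptions, not PRRTE code: the types job_t and fwd_override_t and the function spawn_child are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tri-state override that a spawn request may carry. */
typedef enum { FWD_INHERIT, FWD_OFF, FWD_ON } fwd_override_t;

/* Toy job record holding only the resolved forwarding flag. */
typedef struct {
    bool fwd_env;
} job_t;

/*
 * Resolve the child's setting: an explicit override wins,
 * otherwise the child copies the parent's setting.
 */
static job_t spawn_child(const job_t *parent, fwd_override_t override)
{
    assert(parent != NULL);
    job_t child;
    switch (override) {
    case FWD_ON:
        child.fwd_env = true;
        break;
    case FWD_OFF:
        child.fwd_env = false;
        break;
    default:
        child.fwd_env = parent->fwd_env;
        break;
    }
    return child;
}
```

Because each child stores the resolved flag, a grandchild spawned with FWD_INHERIT picks up the same value as its parent, so the setting carries through any number of spawn generations until some request overrides it.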
Hi, I have had some more time to test the feature and found a couple more concerns/issues:
I'm providing an updated reproducer below. The first issue can be reproduced like this:
To test (2), I compiled a simple .so file (in a different directory) with no real functionality in it, linked it against the reproducer, and added its path to LD_LIBRARY_PATH. Without spawning (#define SPAWN 0), the program executes fine across both hosts. However, when enabling spawning, all the spawned child processes fail to find the linked library and crash. This happens even in the cases where LD_LIBRARY_PATH is propagated correctly with fwd-environment. Here's the updated reproducer:
Solving the first issue requires that we have the first child not just forward its environment, but also inherit the "fwd" directive so the second child can inherit it. Not a big deal. The second issue doesn't quite make sense. We set the envars - including |
Interesting, I've investigated the second issue a bit further, and I noticed that things also break if you try to pass e.g. LD_PRELOAD as a PMIX_ENVAR, so even without forwarding the environment. I have a simple library compiled like this:
library.c:
#include "library.h"
int add(int a, int b) {
    return a + b;
}
library.h:
#ifndef LIBRARY_H
#define LIBRARY_H
int add(int a, int b);
#endif // LIBRARY_H
Makefile:
Then I have my example program:
mpi_program.c:
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <mpi.h>
int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);
    MPI_Comm parent = MPI_COMM_NULL;
    MPI_Comm_get_parent(&parent);
    char processor[MPI_MAX_PROCESSOR_NAME];
    int len_ignore;
    MPI_Get_processor_name(processor, &len_ignore);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    bool am_parent = parent == MPI_COMM_NULL;
    printf("Hi from rank %d on %s. Child? %s. LD_PRELOAD: %s\n",
           rank, processor, am_parent ? "F" : "T", getenv("LD_PRELOAD"));
    if (am_parent)
    {
        MPI_Info the_info;
        MPI_Info_create(&the_info);
        MPI_Info_set(the_info, "PMIX_MAPBY", "PPR:1:NODE");
        MPI_Info_set(the_info, "PMIX_ENVAR", "LD_PRELOAD=/home/Experiments/Lib-Link-Repro/Library/libtestlib.so");
        MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 2, the_info, 0, MPI_COMM_WORLD, &parent, MPI_ERRCODES_IGNORE);
        MPI_Info_free(&the_info);
    }
    MPI_Finalize();
    return EXIT_SUCCESS;
}
and its Makefile:
Now if you make the program without linking it with the library:
Note that if you edit the code to try to LD_PRELOAD to a file that does not exist, an error message appears before launch, but the program still runs. However, in this case, it all executes with no error message. Finally, compiling with the library:
Not sure what to say. The proc is started using |
I just tested the most recent fix that you pushed, and it looks like it also resolves the linker issue when using fwd-environment (the LD_PRELOAD approach above still fails, but it doesn't really matter to me). Strange behavior; I guess the two issues must have been related, but I am happy just using fwd-environment.
Please submit all the information below so that we can understand the working environment that is the context for your question.
Background information
What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)
Version 5.0.7
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
From tarball
If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.
Please describe the system on which you are running
Details of the problem
Hi,
I recently discovered the forward environment setting and wanted to play around with it. It works great when I just launch across multiple nodes, but I can't get it to work with MPI_Comm_spawn. Is there any way to do this, or are there any plans to implement support?
Below is a little tester program. In short, the OMPI_ prefix seems to work for spawns too, but not setting the PRTE_MCA_prte_fwd_environment.
I ran this (outside of Slurm) with:
And here's my output:
By the way, I also tried to pass fwd-environment as a runtime option to mpirun, but it did not seem to recognize it: