Forward full environment in spawns #13274

Open

Petter-Programs opened this issue May 23, 2025 · 10 comments

@Petter-Programs

Please submit all the information below so that we can understand the working environment that is the context for your question.

Background information

What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

Version 5.0.7

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

From tarball

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

Please describe the system on which you are running

  • Operating system/version: Red Hat Enterprise Server; Linux 5.14.x (EL9-based distribution)
  • Computer hardware: Intel Sapphire Rapids 8480
  • Network type: NDR200

Details of the problem

Hi,

I recently discovered the forward environment setting and wanted to play around with it. It works great when I just launch across multiple nodes, but I can't get it to work with MPI_Comm_spawn. Is there any way to do this, or are there any plans to implement support?

Below is a little tester program. In short, the OMPI_ prefix seems to work for spawns too, but setting PRTE_MCA_prte_fwd_environment does not.

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <stdbool.h>

int main(int argc, char * argv[]) {
    MPI_Init(&argc, &argv);

    MPI_Comm parent = MPI_COMM_NULL;
    MPI_Comm_get_parent(&parent);

    bool am_parent = parent == MPI_COMM_NULL;

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if(am_parent)
    {
        MPI_Info the_info;
        MPI_Info_create(&the_info);

        MPI_Info_set(the_info, "PMIX_MAPBY", "PPR:1:NODE");
        MPI_Info_set(the_info, "PMIX_ENVAR", "PRTE_MCA_prte_fwd_environment=1");

        MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 2, the_info, 0, MPI_COMM_WORLD, &parent, MPI_ERRCODES_IGNORE);

        MPI_Info_free(&the_info);
    }

    const char *vars[] = {"OMPI_HELLOTHERE", "DUMMY_VAR"};
    const int nvars = sizeof(vars) / sizeof(vars[0]);

    char processor[MPI_MAX_PROCESSOR_NAME];
    int len_ignore;

    MPI_Get_processor_name(processor, &len_ignore);

    for (int i = 0; i < nvars; i++) {
        char *value = getenv(vars[i]);
        if (value) {
            printf("Hi from rank %d on %s. Child? %s. Environment variable %s found with value: %s\n",
                   rank, processor, am_parent ? "F" : "T", vars[i], value);
        } else {
            printf("Hi from rank %d on %s. Child? %s. Environment variable %s NOT found.\n",
                   rank, processor, am_parent ? "F" : "T", vars[i]);
        }
    }

    MPI_Finalize();

    return 0;
}

I ran this (outside of Slurm) with:

export PRTE_MCA_prte_fwd_environment=1
export DUMMY_VAR=test
export OMPI_HELLOTHERE=test1

mpirun --host host1:10,host2:10 --map-by ppr:1:node -np 2 ./forward

And here's my output:

Hi from rank 0 on host1. Child? F. Environment variable OMPI_HELLOTHERE found with value: test1
Hi from rank 0 on host1. Child? F. Environment variable DUMMY_VAR found with value: test
Hi from rank 1 on host2. Child? F. Environment variable OMPI_HELLOTHERE found with value: test1
Hi from rank 1 on host2. Child? F. Environment variable DUMMY_VAR found with value: test
Hi from rank 0 on host1. Child? T. Environment variable OMPI_HELLOTHERE found with value: test1
Hi from rank 0 on host1. Child? T. Environment variable DUMMY_VAR found with value: test
Hi from rank 1 on host2. Child? T. Environment variable OMPI_HELLOTHERE found with value: test1
Hi from rank 1 on host2. Child? T. Environment variable DUMMY_VAR NOT found.

By the way, I also tried to pass fwd-environment as a runtime option to mpirun, but it did not seem to recognize it:

mpirun --runtime-options fwd-environment --host host1:10,host2:10 --map-by ppr:1:node -np 2 ./forward
--------------------------------------------------------------------------
The specified runtime options directive is not recognized:

  Directive: fwd-environment

Please check for a typo or ensure that the directive is a supported
one.
--------------------------------------------------------------------------
@rhc54
Contributor

rhc54 commented May 23, 2025

You raise several issues, so let me try to address them:

OMPI_ prefix seems to work for spawns too

Yes, we always forward any envar prefixed with "OMPI_", as these are known OMPI-owned values.

but not setting the PRTE_MCA_prte_fwd_environment

I will investigate, but I suspect the issue is one of inheritance. The directive is applied to the initial launch, but may not be applied to the subsequent child launch absent a directive telling us to do so.

tried to pass fwd-environment as a runtime option to mpirun, but it did not seem to recognize it

Hmmm...looks to me like the new form of that cmd line option didn't get implemented - my bad. For now, you can do it with just --fwd-environment on the cmd line.

@rhc54
Contributor

rhc54 commented May 25, 2025

I have implemented/resolved the various issues so the behavior is as you expected - and matches the documentation:

openpmix/openpmix#3613
openpmix/prrte#2207

I'll be bringing these over to the latest release branches (PMIx v6 and PRRTE v4) - not sure when these might make their way over to OMPI, but it might not be until the next major release.

@Petter-Programs
Author

Perfect, thanks for the help! I have tested these changes and the environment is now forwarded even with spawns.

One small thing: setting the info like this:
MPI_Info_set(the_info, "PMIX_ENVAR", "PRTE_MCA_prte_fwd_environment=1");

still does not seem to do anything, even though exporting it before running the program does make it forward the environment. Not a big concern in my case, though, as I do not need that level of control.

@rhc54
Contributor

rhc54 commented May 26, 2025

That envar only applies to the runtime, and so it can only be read at startup by "mpirun". Thus, setting it in MPI_Comm_spawn comes far too late to have an effect.

I set the default so that child jobs inherit the setting of their parent. However, you can independently control the behavior for the child job by setting a PMIX_FWD_ENVIRONMENT info key to "true" or "false". Unfortunately, I don't see the required support in OMPI's comm_spawn code at this time, so that may not be possible.

@Petter-Programs
Author

Thanks for the clarification, I am closing this issue now as it is resolved.

@Petter-Programs
Author

Hi, I have had some more time to test the feature and found a couple more concerns/issues:

  1. The propagation only works for the first spawn if spawning multiple times in a nested way
  2. LD_LIBRARY_PATH is set too late to use it to launch the executable in spawns

I'm providing an updated reproducer below. The first issue can be reproduced like this:

export DUMMY_VAR=test1
~/Installations/PRRTE-5.0.0a1-fwd-env/bin/prterun --runtime-options fwd-environment --host host1:10,host2:10 --map-by ppr:1:node -np 2 ./forward
Hi from rank 0 on host1. Child? F. Environment variable DUMMY_VAR found with value: test1
Hi from rank 1 on host2. Child? F. Environment variable DUMMY_VAR found with value: test1
Hi from rank 0 on host1. Child? T. Environment variable DUMMY_VAR found with value: test1
Hi from rank 1 on host2. Child? T. Environment variable DUMMY_VAR found with value: test1
Hi from rank 0 on host1. Child? T (second). Environment variable DUMMY_VAR found with value: test1
Hi from rank 1 on host2. Child? T (second). Environment variable DUMMY_VAR NOT found.

To test (2), I compiled a simple .so file (in a different directory) with no real functionality, linked it against the reproducer, and added its path to LD_LIBRARY_PATH. Without spawning (#define SPAWN 0), the program executes fine across both hosts.

However, when enabling spawning, all the spawned child processes fail to find the linked library and crash. This happens even in the cases where LD_LIBRARY_PATH is propagated correctly with fwd-environment.

Here's the updated reproducer:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <stdbool.h>

#define SPAWN 1

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    MPI_Comm parent = MPI_COMM_NULL;
    MPI_Comm_get_parent(&parent);

    bool am_parent = parent == MPI_COMM_NULL;
    bool second_child = false;

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (SPAWN)
    {
        if (am_parent)
        {
            MPI_Info the_info;
            MPI_Info_create(&the_info);
            MPI_Info_set(the_info, "PMIX_MAPBY", "PPR:1:NODE");

            char *args[] = {"GO", NULL}; // Just so we can know when to quit spawning
            MPI_Comm_spawn(argv[0], args, 2, the_info, 0, MPI_COMM_WORLD, &parent, MPI_ERRCODES_IGNORE);

            MPI_Info_free(&the_info);
        }
        else
        {
            if (argc > 1)
            {
                MPI_Info the_info;
                MPI_Info_create(&the_info);
                MPI_Info_set(the_info, "PMIX_MAPBY", "PPR:1:NODE");
                MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 2, the_info, 0, MPI_COMM_WORLD, &parent, MPI_ERRCODES_IGNORE);
                MPI_Info_free(&the_info);
            }
            else
            {
                second_child = true;
            }
        }
    }

    const char *vars[] = {"DUMMY_VAR"};
    const int nvars = sizeof(vars) / sizeof(vars[0]);

    char processor[MPI_MAX_PROCESSOR_NAME];
    int len_ignore;

    MPI_Get_processor_name(processor, &len_ignore);

    for (int i = 0; i < nvars; i++)
    {
        char *value = getenv(vars[i]);
        if (value)
        {
            printf("Hi from rank %d on %s. Child? %s%s. Environment variable %s found with value: %s\n",
                   rank, processor, am_parent ? "F" : "T", second_child ? " (second)" : "", vars[i], value);
        }
        else
        {
            printf("Hi from rank %d on %s. Child? %s%s. Environment variable %s NOT found.\n",
                   rank, processor, am_parent ? "F" : "T", second_child ? " (second)" : "", vars[i]);
        }
    }

    MPI_Finalize();

    return 0;
}

@rhc54
Contributor

rhc54 commented May 29, 2025

Solving the first issue requires that we have the first child not just forward its environment, but also inherit the "fwd" directive so the second child can inherit it. Not a big deal.

The second issue doesn't quite make sense. We set the envars - including LD_LIBRARY_PATH - prior to fork/exec'ing the child process. There is no way to set it earlier, and it is in advance of any linker resolution. So I have no idea what's going on there.

@Petter-Programs
Author

Interesting, I've investigated the second issue a bit further, and I noticed that things also break if you try to pass e.g. LD_PRELOAD as a PMIX_ENVAR, so even without forwarding the environment.

I have a simple library compiled like this:

library.c

#include "library.h"

int add(int a, int b) {
    return a + b;
}

library.h

#ifndef LIBRARY_H
#define LIBRARY_H

int add(int a, int b);

#endif // LIBRARY_H

Makefile:

libtestlib.so: library.c
	gcc -fPIC -shared -o libtestlib.so library.c

Then I have my example program:

mpi_program.c:

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    MPI_Comm parent = MPI_COMM_NULL;
    MPI_Comm_get_parent(&parent);

    char processor[MPI_MAX_PROCESSOR_NAME];
    int len_ignore;

    MPI_Get_processor_name(processor, &len_ignore);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    bool am_parent = parent == MPI_COMM_NULL;

    printf("Hi from rank %d on %s. Child? %s. LD_PRELOAD: %s\n",
           rank, processor, am_parent ? "F" : "T", getenv("LD_PRELOAD"));

    if (am_parent)
    {
        MPI_Info the_info;
        MPI_Info_create(&the_info);
        
        MPI_Info_set(the_info, "PMIX_MAPBY", "PPR:1:NODE");
        MPI_Info_set(the_info, "PMIX_ENVAR", "LD_PRELOAD=/home/Experiments/Lib-Link-Repro/Library/libtestlib.so");

        MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 2, the_info, 0, MPI_COMM_WORLD, &parent, MPI_ERRCODES_IGNORE);

        MPI_Info_free(&the_info);
    }

    MPI_Finalize();

    return EXIT_SUCCESS;
}

and its Makefile:

with_lib: mpi_program.c
	mpicc mpi_program.c -L/home/Experiments/Lib-Link-Repro/Library -ltestlib -o mpi_program

no_lib: mpi_program.c
	mpicc mpi_program.c -o mpi_program

Now if you make the program without linking it with the library:

make no_lib
~/Installations/PRRTE-5.0.0a1-fwd-env/bin/prterun --host host1:10,host2:10 -x LD_PRELOAD --map-by ppr:1:node -np 2 ./mpi_program
Hi from rank 0 on host1. Child? F. LD_PRELOAD: /home/Experiments/Lib-Link-Repro/Library/libtestlib.so
Hi from rank 1 on host2. Child? F. LD_PRELOAD: /home/Experiments/Lib-Link-Repro/Library/libtestlib.so
Hi from rank 0 on host1. Child? T. LD_PRELOAD: /home/Experiments/Lib-Link-Repro/Library/libtestlib.so
Hi from rank 1 on host2. Child? T. LD_PRELOAD: /home/Experiments/Lib-Link-Repro/Library/libtestlib.so

Note that if you edit the code to LD_PRELOAD a file that does not exist, an error message appears before launch, but the program still runs. In the case above, however, everything executes with no error message.

Finally, compiling with the library:

make with_lib
~/Installations/PRRTE-5.0.0a1-fwd-env/bin/prterun --host host1:10,host2:10 -x LD_PRELOAD --map-by ppr:1:node -np 2 ./mpi_program
./mpi_program: error while loading shared libraries: libtestlib.so: cannot open shared object file: No such file or directory
--------------------------------------------------------------------------

prterun detected that one or more processes exited with non-zero status,
thus causing the job to be terminated. The first process to do so was:

   Process name: [prterun-host1-1049595@1,1]
   Exit code:    127
--------------------------------------------------------------------------
[host1:1049595] PMIX ERROR: PMIX_ERR_UNREACH in file base/ptl_base_connection_hdlr.c at line 95
Fatal glibc error: tpp.c:84 (__pthread_tpp_change_priority): assertion failed: new_prio == -1 || (new_prio >= fifo_min_prio && new_prio <= fifo_max_prio)
Aborted

@rhc54
Contributor

rhc54 commented May 30, 2025

Not sure what to say. The proc is started using execve(cmd, arg, env) which replaces the environment with the one specified (which includes all the envars we were given/collected), so...? Not much more I can do.

@Petter-Programs
Author

I just tested the most recent fix you pushed, and it looks like it resolves the linker issue too when using fwd-environment (the LD_PRELOAD approach above still fails, but that doesn't really matter to me). Strange behavior; I guess the two issues must have been related, but I am happy just using fwd-environment.
