Multiple commits #2309
Merged
Inherit env directives if requested
If someone specifies that child jobs inherit from their
parents, then have them inherit any env directives as
well as job-level directives.
Have children inherit their parent's inheritance directive,
unless directed not to do so.
Signed-off-by: Ralph Castain [email protected]
(cherry picked from commit eb577d4)
Extend inheritance to app level
If we are inheriting envar directives from our parent job, then
extend that to inheriting envar directives for the application
of the proc that spawned us. Shift processing of inheritance
directives to the mapper, and ensure that the child inherits
the inheritance directive so that the grandchildren will also
inherit.
Signed-off-by: Ralph Castain [email protected]
(cherry picked from commit a63791f)
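The inheritance rule in the two commits above can be pictured with a small Python sketch. This is only an illustration, not PRRTE's C implementation: the `Job` class, the `spawn_child` helper, the `propagate` parameter, and the directive strings are all hypothetical, and the ordering (parent directives first, then the child's own) is an assumption. It models job-level inheritance only.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    # Hypothetical stand-in for a job: its env directives and its inherit flag.
    env_directives: list = field(default_factory=list)
    inherit: bool = False


def spawn_child(parent, child_env=None, propagate=True):
    """Create a child job of `parent`.

    If the parent carries the inherit flag, the child starts with the parent's
    env directives. The child also carries the flag forward (so grandchildren
    inherit too) unless propagate=False is given.
    """
    env = list(parent.env_directives) if parent.inherit else []
    env.extend(child_env or [])
    return Job(env_directives=env, inherit=parent.inherit and propagate)


parent = Job(env_directives=["set FOO=1"], inherit=True)
child = spawn_child(parent, ["append PATH=/opt/bin"])
grandchild = spawn_child(child)
print(grandchild.env_directives)
```

Passing `propagate=False` when spawning the child stops the chain: the child still receives the parent's directives, but its own children start with none.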
Extend testbuild launchers support
Check RAS components for compile errors by shimming
the environment-specific functions
Signed-off-by: Ralph Castain [email protected]
(cherry picked from commit 17399cd)
Fix the colocation algorithm
There were two compensating errors that wound up yielding the
correct map, but the result was flawed should a certain condition
exist. Rework the code to fix the errors and remove the
flaw.
Signed-off-by: Ralph Castain [email protected]
(cherry picked from commit bdbf4db)
Fix precedence ordering on envar operations
Work from left-to-right across the cmd line, applying env-related
options as we go. When one operation affects the result of another,
this preserves a user's common expectation.
Add a "--set-env" option if the corresponding PMIx CLI is defined.
Seemed a little weird that we had "prepend-env", "append-env", etc.,
but no "set-env". It's the equivalent of "-x foo=val".
Signed-off-by: Ralph Castain [email protected]
(cherry picked from commit 805e130)
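A minimal Python sketch of the left-to-right rule described in the commit above, assuming each env-related option has already been parsed into an `(operation, name, value)` tuple. The function name `apply_env_ops`, the tuple format, and the `:` separator are assumptions made for illustration; this is not the PRRTE code itself.

```python
def apply_env_ops(env, ops, sep=":"):
    """Apply (op, name, value) tuples in the order they appeared on the cmd line."""
    result = dict(env)  # copy so the caller's environment is left untouched
    for op, name, value in ops:
        current = result.get(name)
        if op == "set":
            result[name] = value
        elif op == "prepend":
            result[name] = value if current is None else value + sep + current
        elif op == "append":
            result[name] = value if current is None else current + sep + value
        else:
            raise ValueError(f"unknown env operation: {op}")
    return result


# Each operation sees the result of the ones to its left.
ops = [("set", "FOO", "b"), ("prepend", "FOO", "a"), ("append", "FOO", "c")]
print(apply_env_ops({}, ops))  # {'FOO': 'a:b:c'}
```

Reversing the order changes the outcome: a `set` that comes after a `prepend` of the same variable overwrites the prepended value, which is what a user reading the command line left to right would expect.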
Bugfix: inconsistently setting PMIX_JOB_RECOVERABLE
Signed-off-by: Matthew Whitlock [email protected]
(cherry picked from commit 0b1ada9)
Clarify help messages
This error is also displayed in cases where files or directories do not
exist and is not only caused by missing permissions.
Signed-off-by: Christoph Niethammer [email protected]
(cherry picked from commit ac77387)
Do not assign DVM's bookmark to the application job
Allow the target node list to follow the ordering inside a provided hostfile
and dash-host specification by not assigning a bookmark based on the DVM job.
Add support for missing default-hostfile cmd line option
We have the support for the user to specify it via MCA param, but somehow we
lost the integration to pick it up off of the prte and prterun cmd lines.
Signed-off-by: Ralph Castain [email protected]
(cherry picked from commit 16d8412)
Error out when asymmetric topologies cannot support ppr requests
PPR placement policy requests are uniform - i.e., the specified
number of procs must be placed on every object of the directed
type. When the request includes a cpu/proc directive, then there
must also be enough CPUs to meet the request on every object.
When that isn't the case, then we need to error out and not
just place the proc without binding it.
Signed-off-by: Ralph Castain [email protected]
(cherry picked from commit 665c38e)
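The uniformity requirement described in the commit above can be sketched as a simple check. This is a hedged Python illustration, not the mapper's actual logic: the function name `check_ppr` and the representation of a topology as a list of per-object CPU counts are assumptions.

```python
def check_ppr(cpus_per_object, ppr, cpus_per_proc):
    """Return the total number of procs a ppr request would place.

    cpus_per_object holds the CPU count of each object of the requested type
    (for example, each package). Every object must be able to host `ppr`
    procs with `cpus_per_proc` CPUs each; otherwise the request is rejected
    instead of placing unbound procs.
    """
    needed = ppr * cpus_per_proc
    for index, available in enumerate(cpus_per_object):
        if available < needed:
            raise ValueError(
                f"object {index} has {available} CPUs but the request needs {needed}"
            )
    return ppr * len(cpus_per_object)


print(check_ppr([4, 4], ppr=2, cpus_per_proc=2))  # 4
```

With an asymmetric layout such as `[4, 2]`, the same request raises an error because the second object cannot supply two CPUs to each of two procs.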
Let seq and rankfile mappers compute their own num-procs
If we are using the seq or rankfile mapper and have multiple
apps on the cmd line, then allow the mappers to compute
their own num procs if one or more are not given.
Signed-off-by: Ralph Castain [email protected]
(cherry picked from commit cb17cce)
Fix relative node processing
The empty nodes were not properly being added to the list
of names to be used by the mapper.
Signed-off-by: Ralph Castain [email protected]
(cherry picked from commit 58130c6)
Replace sprintf with snprintf
Per note in the OMPI project, at least one compiler family is removing the "sprintf" function. Replace all uses of that function with the safer "snprintf" version.
Signed-off-by: Ralph Castain [email protected]
(cherry picked from commit 2ff7d6b)
Extend timeout to child jobs
When a timeout is specified and the primary job is timed-out,
then we need to ensure we also report and kill any child jobs
it started. This includes reporting any requested stack
traces.
Also allow inheritance of output directives like tag and timestamp.
Signed-off-by: Ralph Castain [email protected]
(cherry picked from commit d072f27)
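One way to picture extending a timeout to child jobs is a walk over the job tree that collects the primary job and everything it spawned, directly or indirectly. The Python sketch below is illustrative only; the function name `jobs_to_terminate`, the job identifiers, and the dictionary-based job tree are hypothetical and do not reflect PRRTE's internal data structures.

```python
def jobs_to_terminate(primary, children):
    """Return the primary job and all of its descendants, parents first.

    `children` maps a job id to the list of job ids it spawned.
    """
    order = []
    seen = set()
    stack = [primary]
    while stack:
        job = stack.pop()
        if job in seen:
            continue
        seen.add(job)
        order.append(job)
        # Reverse so that children are visited in the order they were spawned.
        stack.extend(reversed(children.get(job, [])))
    return order


children = {"job1": ["job2", "job3"], "job2": ["job4"]}
print(jobs_to_terminate("job1", children))  # ['job1', 'job2', 'job4', 'job3']
```

Each job in the returned list would then be reported (including any requested stack traces) and killed when the primary job times out.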
Add launching-apps section to docs
Port the "launching-apps" section from the OMPI docs over
to PRRTE since it specifically deals with prterun usage.
Add some updates about gridengine support courtesy of
open-mpi/ompi#13450.
Signed-off-by: Ralph Castain [email protected]
(cherry picked from commit 424480d)
Improve hetero node detection a bit
Use the hwloc synthetic topology string as the signature
instead of our custom attempt at counting number of types
of objects - the synthetic retains some hierarchical info
and hopefully does a little better job of detecting when
hetero nodes are in use.
Signed-off-by: Ralph Castain [email protected]
(cherry picked from commit 7e5d030)
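As a rough illustration of why a string signature helps, the Python sketch below groups nodes by signature and treats the cluster as heterogeneous when more than one distinct signature appears. The function name `group_by_signature`, the node names, and the signature strings are made up for the example; they only imitate the general shape of an hwloc synthetic description and are not taken from this change.

```python
from collections import defaultdict


def group_by_signature(node_signatures):
    """Map each distinct topology signature to the nodes that reported it."""
    groups = defaultdict(list)
    for node, signature in node_signatures.items():
        groups[signature].append(node)
    return dict(groups)


# Hypothetical nodes and signatures, for illustration only.
nodes = {
    "node01": "Package:2 Core:8 PU:2",
    "node02": "Package:2 Core:8 PU:2",
    "node03": "Package:1 Core:16 PU:1",
}
groups = group_by_signature(nodes)
print(len(groups) > 1)  # True: more than one signature means hetero nodes
```

Note that a pure object count would see 32 PUs on `node01` but only 16 on `node03`, yet two layouts with equal counts and different hierarchy would collide under counting and stay distinct as strings.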
Tweak the forwarding of signals
Update the MCA param help message to clarify what the param
does and what values it supports. Cleanup an error where we
would overwrite the resulting list of signals to forward.
Cleanup the return value so we don't generate spurious
error log output. Provide verbose output showing the
signals being forwarded.
Signed-off-by: Ralph Castain [email protected]
(cherry picked from commit 2845dcd)
Cleanup and improve autohandling of hetero nodes
Further improve automatic handling of hetero nodes
by making the non-symmetric signature unique, thereby
forcing collection of the full topology from each
such node. Fix an error in the topology retrieval
procedure whereby we double-counted cached nodes,
thereby causing us to quit collecting topologies early.
Signed-off-by: Ralph Castain [email protected]
(cherry picked from commit 4671290)
Fix prun tool
Need to init the ess framework to have the signal forwarding list initialized
Signed-off-by: Ralph Castain [email protected]
(cherry picked from commit bff13fb)