Open
Description
With the latest trunk (after the BTL move), some of the ibm dynamic tests are failing. If I run with coll ^ml (that is, with the ml
collective component excluded), the tests pass. The failing tests are:
- ibm/dynamic/intercomm_create
- ibm/dynamic/spawn_multiple
- ibm/dynamic/spawn_with_env_vars
- ibm/dynamic/loop_spawn
I got a core dump from one of the tests (spawn_with_env_vars); the backtrace and the two names being compared are shown here.
(gdb) where
#0 0x00007f44f2ce81d0 in ?? ()
#1 <signal handler called>
#2 0x00007f44fdffbd58 in orte_util_compare_name_fields (fields=2 '\002', name1=0x1629b0c, name2=0xf) at ../../orte/util/name_fns.c:522
#3 0x00007f44f1a577c3 in bcol_basesmuma_smcm_allgather_connection (sm_bcol_module=0x7f44ee91b040, module=0x15e11a0,
peer_list=0x7f44f1c5c748, back_files=0x7f44eedb06c8, comm=0x604f40, input=..., base_fname=0x7f44f1a58606 "sm_payload_mem_",
map_all=false) at ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_smcm.c:237
#4 0x00007f44f1a4e307 in bcol_basesmuma_bank_init_opti (payload_block=0x163b300, data_offset=64, bcol_module=0x7f44ee91b040,
reg_data=0x162a660) at ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_buf_mgmt.c:302
#5 0x00007f44f28a3386 in mca_coll_ml_register_bcols (ml_module=0x161fdc0) at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:510
#6 0x00007f44f28a368f in ml_module_memory_initialization (ml_module=0x161fdc0) at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:558
#7 0x00007f44f28a66b1 in ml_discover_hierarchy (ml_module=0x161fdc0) at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:1539
#8 0x00007f44f28aae0b in mca_coll_ml_comm_query (comm=0x604f40, priority=0x7fffd2808cb8)
at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:2963
#9 0x00007f44fe915af5 in query_2_0_0 (component=0x7f44f2b06940, comm=0x604f40, priority=0x7fffd2808cb8, module=0x7fffd2808cf0)
at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:372
#10 0x00007f44fe915ab4 in query (component=0x7f44f2b06940, comm=0x604f40, priority=0x7fffd2808cb8, module=0x7fffd2808cf0)
at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:355
#11 0x00007f44fe9159be in check_one_component (comm=0x604f40, component=0x7f44f2b06940, module=0x7fffd2808cf0)
at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:317
#12 0x00007f44fe915804 in check_components (components=0x7f44feb96ed0, comm=0x604f40)
at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:281
#13 0x00007f44fe90e3b5 in mca_coll_base_comm_select (comm=0x604f40) at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:117
#14 0x00007f44fe8a22ed in ompi_mpi_init (argc=1, argv=0x7fffd2809598, requested=0, provided=0x7fffd2809448)
at ../../ompi/runtime/ompi_mpi_init.c:917
#15 0x00007f44fe8d6e7e in PMPI_Init (argc=0x7fffd280948c, argv=0x7fffd2809480) at pinit.c:84
#16 0x000000000040158f in main (argc=1, argv=0x7fffd2809598) at spawn_with_env_vars.c:151
(gdb)
(gdb) print name1
$1 = (const orte_process_name_t *) 0x1629b0c
(gdb) print *name1
$2 = {jobid = 3282567170, vpid = 1}
(gdb) print *name2
Cannot access memory at address 0xf
(gdb)