Some dynamic tests fail when coll ml is enabled #210

Open
@ompiteam

With the latest trunk (after the BTL move), some of the ibm dynamic tests are failing. They pass if I run with coll ^ml, i.e. with the ml collective component excluded. The failing tests are:

  • ibm/dynamic/intercomm_create
  • ibm/dynamic/spawn_multiple
  • ibm/dynamic/spawn_with_env_vars
  • ibm/dynamic/loop_spawn

I got a core dump from one of the tests (spawn_with_env_vars, per frame #16). The backtrace and the two name arguments from frame #2 are shown here.

(gdb) where
#0  0x00007f44f2ce81d0 in ?? ()
#1  <signal handler called>
#2  0x00007f44fdffbd58 in orte_util_compare_name_fields (fields=2 '\002', name1=0x1629b0c, name2=0xf) at ../../orte/util/name_fns.c:522
#3  0x00007f44f1a577c3 in bcol_basesmuma_smcm_allgather_connection (sm_bcol_module=0x7f44ee91b040, module=0x15e11a0, 
    peer_list=0x7f44f1c5c748, back_files=0x7f44eedb06c8, comm=0x604f40, input=..., base_fname=0x7f44f1a58606 "sm_payload_mem_", 
    map_all=false) at ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_smcm.c:237
#4  0x00007f44f1a4e307 in bcol_basesmuma_bank_init_opti (payload_block=0x163b300, data_offset=64, bcol_module=0x7f44ee91b040, 
    reg_data=0x162a660) at ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_buf_mgmt.c:302
#5  0x00007f44f28a3386 in mca_coll_ml_register_bcols (ml_module=0x161fdc0) at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:510
#6  0x00007f44f28a368f in ml_module_memory_initialization (ml_module=0x161fdc0) at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:558
#7  0x00007f44f28a66b1 in ml_discover_hierarchy (ml_module=0x161fdc0) at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:1539
#8  0x00007f44f28aae0b in mca_coll_ml_comm_query (comm=0x604f40, priority=0x7fffd2808cb8)
    at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:2963
#9  0x00007f44fe915af5 in query_2_0_0 (component=0x7f44f2b06940, comm=0x604f40, priority=0x7fffd2808cb8, module=0x7fffd2808cf0)
    at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:372
#10 0x00007f44fe915ab4 in query (component=0x7f44f2b06940, comm=0x604f40, priority=0x7fffd2808cb8, module=0x7fffd2808cf0)
    at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:355
#11 0x00007f44fe9159be in check_one_component (comm=0x604f40, component=0x7f44f2b06940, module=0x7fffd2808cf0)
    at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:317
#12 0x00007f44fe915804 in check_components (components=0x7f44feb96ed0, comm=0x604f40)
    at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:281
#13 0x00007f44fe90e3b5 in mca_coll_base_comm_select (comm=0x604f40) at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:117
#14 0x00007f44fe8a22ed in ompi_mpi_init (argc=1, argv=0x7fffd2809598, requested=0, provided=0x7fffd2809448)
    at ../../ompi/runtime/ompi_mpi_init.c:917
#15 0x00007f44fe8d6e7e in PMPI_Init (argc=0x7fffd280948c, argv=0x7fffd2809480) at pinit.c:84
#16 0x000000000040158f in main (argc=1, argv=0x7fffd2809598) at spawn_with_env_vars.c:151
(gdb) 

(gdb) print name1
$1 = (const orte_process_name_t *) 0x1629b0c
(gdb) print *name1
$2 = {jobid = 3282567170, vpid = 1}
(gdb) print *name2
Cannot access memory at address 0xf
(gdb) 
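
The crash is in frame #2, where orte_util_compare_name_fields receives name2=0xf and gdb cannot read memory at that address. To illustrate why such a call faults, here is a minimal sketch of a field-masked name comparison. It is not the ORTE implementation: the type, the bit values, and the return codes are hypothetical stand-ins. Only the jobid/vpid field names and the (fields, name1, name2) argument order are taken from the gdb output above.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the two fields gdb printed for *name1; not the real ORTE type. */
typedef struct {
    uint32_t jobid;
    uint32_t vpid;
} sketch_name_t;

/* Hypothetical field-selection bits, chosen for this sketch only. */
enum { SKETCH_CMP_JOBID = 0x02, SKETCH_CMP_VPID = 0x04 };

/* Hypothetical result codes for this sketch. */
enum { SKETCH_EQUAL = 0, SKETCH_DIFFERENT = 1, SKETCH_BAD_ARG = -1 };

/*
 * Compare only the fields selected by the bitmask.
 * Both pointers are dereferenced, so both must point at valid names.
 */
int sketch_compare_name_fields(uint8_t fields,
                               const sketch_name_t *name1,
                               const sketch_name_t *name2)
{
    /* A NULL check catches a missing name, but it cannot catch a
     * garbage non-NULL address such as 0xf: that would still be
     * dereferenced below and fault. */
    if (NULL == name1 || NULL == name2) {
        return SKETCH_BAD_ARG;
    }
    if ((fields & SKETCH_CMP_JOBID) && name1->jobid != name2->jobid) {
        return SKETCH_DIFFERENT;
    }
    if ((fields & SKETCH_CMP_VPID) && name1->vpid != name2->vpid) {
        return SKETCH_DIFFERENT;
    }
    return SKETCH_EQUAL;
}
```

In this sketch, two names with the same jobid but different vpids compare equal under SKETCH_CMP_JOBID alone and different once SKETCH_CMP_VPID is added. The point for this issue is that a guard inside the comparison cannot detect a bogus non-NULL pointer, so the invalid name2 most likely has to be traced back to whatever the caller in frame #3 passed in.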

