Basic implementation of a new communication layer for Charm++
To build Reconverse for local (single-node) use, all you have to do is the following:
$ cd reconverse
$ mkdir build
$ cd build
$ cmake ..
$ make
You can configure the build by passing options to CMake with:
cmake -D<option_name>=<value> ..
Some general options include:
- `RECONVERSE_ENABLE_CPU_AFFINITY` (`ON` by default if HWLOC is found): enables CPU affinity support via hwloc. Requires HWLOC to be installed.
- `CMAKE_BUILD_TYPE` (not set by default): selects the build type and corresponding compiler flags. For example:
  - `Release`: compiles with `-O3`
  - `Debug`: compiles with `-g`
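As an illustration of the general options above, the following sketch assembles a configure command for a chosen build type and CPU affinity setting. The helper name `reconverse_configure_cmd` and its parameters are hypothetical and exist only for this example; it prints the command rather than running it, so you can inspect the line first.

```shell
# Hypothetical helper (not part of Reconverse): prints a CMake configure
# command built from the general options described above.
#   $1: build type (Release or Debug)
#   $2: ON or OFF, passed to RECONVERSE_ENABLE_CPU_AFFINITY
reconverse_configure_cmd() {
  build_type=$1
  affinity=$2
  echo "cmake -DCMAKE_BUILD_TYPE=${build_type} -DRECONVERSE_ENABLE_CPU_AFFINITY=${affinity} .."
}

# Print the command for an optimized build with CPU affinity enabled.
reconverse_configure_cmd Release ON
```

Run the printed line from inside the build directory.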
Currently, Reconverse has two communication backends:
- LCI (https://github.com/uiuc-hpc/lci): the preferred backend for InfiniBand, RoCE, and Slingshot-11 clusters. It is expected to achieve better performance than MPI.
- LCW (https://github.com/JiakunYan/lcw): the fallback backend for traditional MPI clusters. It is compatible with a wide range of hardware but may not achieve the same performance as LCI. LCW is merely an active message wrapper layer for MPI.
- `RECONVERSE_TRY_ENABLE_COMM_LCI2` (`ON` by default): attempts to find an external LCI installation and enable the LCI backend.
- `RECONVERSE_AUTOFETCH_LCI2` (`OFF` by default): automatically fetches LCI from GitHub if no external installation is found.
- `RECONVERSE_AUTOFETCH_LCI2_TAG` (defaults to a predefined commit hash): specifies the Git commit hash, tag, or branch to use when fetching LCI.
- `FETCHCONTENT_SOURCE_DIR_LCI` (not set by default): path to a local LCI source tree. If set, autofetch uses this local copy instead of fetching from GitHub.
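To make the interaction between autofetch and a local source tree concrete, here is a minimal sketch that adds `FETCHCONTENT_SOURCE_DIR_LCI` only when the given directory exists. The helper `lci_autofetch_cmd` and the `LCI_SRC` environment variable are assumptions made for this example, not part of Reconverse or LCI.

```shell
# Hypothetical helper: prints a configure command that autofetches LCI and
# points autofetch at a local source tree when one is given and exists.
#   $1: path to a local LCI checkout (may be empty)
lci_autofetch_cmd() {
  src_dir=$1
  cmd="cmake -DRECONVERSE_AUTOFETCH_LCI2=ON"
  if [ -n "$src_dir" ] && [ -d "$src_dir" ]; then
    cmd="$cmd -DFETCHCONTENT_SOURCE_DIR_LCI=$src_dir"
  fi
  echo "$cmd .."
}

# LCI_SRC is a hypothetical variable; leave it unset to fetch from GitHub.
lci_autofetch_cmd "${LCI_SRC:-}"
```

The sketch does not quote the path in the printed command, so it assumes the directory name contains no spaces.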
If LCI is autofetched, you can further customize the LCI build by passing additional CMake variables. Important ones include:
- `-DLCI_NETWORK_BACKENDS=[ofi|ibv]` (`ibv;ofi` by default): explicitly selects the LCI backend to be libfabric (`ofi`) or libibverbs (`ibv`). `ibv` should be used for InfiniBand and RoCE clusters. `ofi` should be used for shared-memory systems (e.g. a laptop) and Slingshot-11 clusters.
- `-DLCT_PMI_BACKEND_ENABLE_MPI=ON` (default: `OFF`): lets LCI bootstrap with MPI.
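The backend recommendation above can be written down as a small lookup. This is only a sketch: the helper `lci_backend_for` and the interconnect labels it accepts (`infiniband`, `roce`, `slingshot11`, `shared-memory`) are made up for this example.

```shell
# Hypothetical helper: maps an interconnect label to the recommended
# LCI_NETWORK_BACKENDS value (ibv for InfiniBand/RoCE, ofi otherwise).
lci_backend_for() {
  case "$1" in
    infiniband|roce) echo "ibv" ;;
    slingshot11|shared-memory) echo "ofi" ;;
    *)
      echo "unrecognized interconnect: $1" >&2
      return 1
      ;;
  esac
}

# Print the configure command for a Slingshot-11 cluster.
backend=$(lci_backend_for slingshot11)
echo "cmake -DRECONVERSE_AUTOFETCH_LCI2=ON -DLCI_NETWORK_BACKENDS=${backend} .."
```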
- `RECONVERSE_TRY_ENABLE_COMM_LCW` (`ON` by default): attempts to find an external LCW installation and enable the LCW backend.
- `RECONVERSE_AUTOFETCH_LCW` (`OFF` by default): automatically fetches LCW from GitHub if no external installation is found.
- `RECONVERSE_AUTOFETCH_LCW_TAG` (defaults to a predefined commit hash): specifies the Git commit hash, tag, or branch to use when fetching LCW.
- `FETCHCONTENT_SOURCE_DIR_LCW` (not set by default): path to a local LCW source tree. If set, autofetch uses this local copy instead of fetching from GitHub.
The example executables are located in the build/test/<program_name> folders. You can run them with mpirun, srun, or lcrun depending on your system configuration.
Reconverse accepts the following runtime options:
- `+pe <num>`: specifies the total number of PEs across all processes.
- `+backend <lci|lcw>`: selects the communication backend at runtime. If not specified, Reconverse uses the first available backend in the order LCI, LCW.
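A launch line combines a launcher, the example executable, and the runtime options above. The sketch below prints such a line; `reconverse_launch_cmd` is a hypothetical helper, and the `ping_ack` path is taken from the examples in this README.

```shell
# Hypothetical helper: prints a launch line for the ping_ack example.
#   $1: launcher including its process count, e.g. "mpirun -n 2"
#   $2: total number of PEs across all processes (+pe)
#   $3: optional backend, lci or lcw (+backend); omitted when empty
reconverse_launch_cmd() {
  launcher=$1
  total_pes=$2
  backend=${3:-}
  opts=""
  if [ -n "$backend" ]; then
    opts="+backend $backend "
  fi
  echo "$launcher test/ping_ack/reconverse_ping_ack ${opts}+pe $total_pes"
}

# 2 processes with 4 PEs in total, forcing the LCW backend.
reconverse_launch_cmd "mpirun -n 2" 4 lcw
# Same layout, letting Reconverse pick the first available backend.
reconverse_launch_cmd "srun -n 2" 4
```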
libfabric is needed as LCI's network backend for shared-memory systems. You can install it with:
$ sudo apt install libfabric-bin libfabric-dev
$ git clone https://github.com/charmplusplus/reconverse.git
$ cd reconverse
$ mkdir build
$ cd build
$ cmake -DRECONVERSE_AUTOFETCH_LCI2=ON ..
$ make
Using lcrun is typically the simplest way to run the Reconverse example. First, locate LCI's lcrun executable. It lives in the LCI source directory and is installed to the bin folder if you installed LCI yourself. If you used the CMake autofetch support, it is typically located in the <build_directory>/_deps/lci-src folder.
Run the Reconverse example with lcrun:
$ cd build/
$ _deps/lci-src/lcrun -n 2 test/ping_ack/reconverse_ping_ack +pe 4
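Because lcrun can end up in different places depending on how LCI was obtained, a small lookup can save some guessing. The sketch below checks the autofetch location first, then an install prefix, then `PATH`. The helper name `find_lcrun` is hypothetical, and using `LCI_ROOT` as the install prefix is an assumption borrowed from the manual install example in this README.

```shell
# Hypothetical helper: prints the path of the first lcrun it can find.
#   $1: the Reconverse build directory
find_lcrun() {
  build_dir=$1
  if [ -x "$build_dir/_deps/lci-src/lcrun" ]; then
    # Autofetched LCI: lcrun sits in the fetched source tree.
    echo "$build_dir/_deps/lci-src/lcrun"
  elif [ -n "${LCI_ROOT:-}" ] && [ -x "$LCI_ROOT/bin/lcrun" ]; then
    # Self-installed LCI: lcrun is in the bin folder of the install prefix.
    echo "$LCI_ROOT/bin/lcrun"
  else
    # Last resort: whatever lcrun is on PATH (fails if there is none).
    command -v lcrun
  fi
}

find_lcrun build || echo "lcrun not found" >&2
```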
Note: if you installed libfabric in a non-standard location, the linker may complain that it cannot find the libfabric shared library. In that case, add the library's directory to the search path:
export LD_LIBRARY_PATH=<path_to_libfabric_lib>:${LD_LIBRARY_PATH}
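A slightly more defensive variant of the export above first checks that the directory really contains a libfabric shared library, and avoids leaving a trailing colon when `LD_LIBRARY_PATH` was previously empty (an empty entry makes the loader search the current directory). The helper name `add_libfabric_path` is hypothetical.

```shell
# Hypothetical helper: prepends a directory to LD_LIBRARY_PATH, but only
# if that directory contains a libfabric shared library.
#   $1: directory expected to hold libfabric.so*
add_libfabric_path() {
  lib_dir=$1
  if ls "$lib_dir"/libfabric.so* >/dev/null 2>&1; then
    # ${VAR:+:$VAR} appends ":<old value>" only when the old value is non-empty.
    export LD_LIBRARY_PATH="$lib_dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
  else
    echo "no libfabric.so found in $lib_dir" >&2
    return 1
  fi
}
```

Call it as `add_libfabric_path <path_to_libfabric_lib>` in the shell from which you launch the example.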
To use CMake Autofetch support:
$ git clone https://github.com/charmplusplus/reconverse.git
$ cd reconverse
$ mkdir build
$ cd build
$ cmake -DRECONVERSE_AUTOFETCH_LCI2=ON -DLCI_NETWORK_BACKENDS=ofi ..
$ make
Note: Explicitly specifying -DLCI_NETWORK_BACKENDS=ofi is only needed for Slingshot-11 systems.
Run the reconverse example with srun:
$ cd build/
$ srun --mpi=pmi2 -n 2 test/ping_ack/reconverse_ping_ack +pe 4
If you want to install LCI yourself, here is an example build procedure on NCSA's Delta machine using the OFI layer:
$ git clone https://github.com/uiuc-hpc/lci.git --branch=lci2
$ cd lci
$ export OFI_ROOT=/opt/cray/libfabric/1.15.2.0
$ export LCI_ROOT=/where/you/want/to/install/lci
$ cmake -DCMAKE_INSTALL_PREFIX=$LCI_ROOT .
$ make install
$ cd ..
$ git clone https://github.com/charmplusplus/reconverse.git
$ cd reconverse
$ mkdir build && cd build
$ cmake ..
$ make
Run the reconverse example with srun:
$ cd build/test/ping_ack
$ srun -n 2 ./reconverse_ping_ack +pe 4
Make sure you have an MPI implementation installed (e.g., OpenMPI, MPICH, etc.). Then, to build Reconverse with the LCW backend using CMake Autofetch support:
$ export MPI_ROOT=/path/to/mpi # if not installed in a standard location
$ git clone https://github.com/charmplusplus/reconverse.git
$ cd reconverse
$ mkdir build
$ cd build
$ cmake -DRECONVERSE_AUTOFETCH_LCW=ON ..
$ make
Run the reconverse example with srun or mpirun:
$ cd build/
# `+backend lcw` is optional if you only have the LCW backend
$ mpirun -n 2 test/ping_ack/reconverse_ping_ack +backend lcw +pe 4