@daboehme I've recently been trying to run Caliper on Frontier and have run into some issues. My previous approach on Summit, which followed your response here: #151 (comment), causes my program to crash on Frontier due to a missing [...]. On Summit, I'd initialize Caliper in my app using a macro that just called the following at the start of things:
#define CALI_INIT \
    cali_mpi_init(); \
    cali_init();

int main(int argc, char *argv[])
{
    CALI_INIT
    CALI_CXX_MARK_FUNCTION;
    CALI_MARK_BEGIN("main_driver_init");
    // Initialize MPI.
    int num_procs, myid;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &num_procs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    ...
    MPI_Finalize();
}

I've been working with the OLCF on possible workarounds, and they pointed me to the standard configs such as runtime-report. To try to get something up and running, I've been looking into using these standard configurations. So I've modified my code slightly to make use of the [...]:
#define CALI_MPI_INIT \
    cali_mpi_init();
#define CALI_INIT(mpi_rank) \
    cali::ConfigManager mgr; \
    if (const char* env_p = std::getenv("CALI_CONFIG")) { mgr.add(env_p); if (mpi_rank == 0) { std::cout << env_p << std::endl; } } \
    if (mgr.error() && mpi_rank == 0) { std::cerr << "ConfigManager: " << mgr.error_msg() << std::endl; } \
    mgr.start();
#define CALI_FINALIZE \
    mgr.flush(); \
    mgr.stop();

int main(int argc, char *argv[])
{
    CALI_MPI_INIT
    // Initialize MPI.
    int num_procs, myid;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &num_procs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    // Start the ConfigManager once the MPI rank is known.
    CALI_INIT(myid)
    CALI_CXX_MARK_FUNCTION;
    CALI_MARK_BEGIN("main_driver_init");
    ...
    CALI_FINALIZE
    MPI_Finalize();
}

The code does at least appear to be running and giving me timing metrics. However, on my very simple 1-node test with 8 MPI ranks, I'm seeing two sets of metrics in the Caliper output when using the following input:
export CALI_CONFIG="runtime-report(aggregate_across_ranks=true,calc.inclusive=true,profile.mpi,output=stderr)"
srun -N 1 -n 8 -c 7 --threads-per-core=1 --cpu-bind=threads --gpus-per-task=1 ./mechanics -opt ./options_frontier.toml

As I've only ever used that other input file, where this never occurred, I'm not sure whether this is expected behavior or whether there's a way to fix it. Any help would be appreciated, as I'm trying to put some timings together for some reports.

Hi @rcarson3, you don't strictly need your own ConfigManager instance in this case. Caliper has an internal ConfigManager that runs the configuration provided in the CALI_CONFIG environment variable. Here you're creating another ConfigManager instance that runs the same configuration, which is why you're essentially getting the same output twice. So you should use either CALI_CONFIG or your own ConfigManager. If you use your own ConfigManager instance, you'd typically use some application-specific way, e.g. a command-line argument, to pass in the runtime-report... configuration string. Hope this helps!
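
For illustration, here is a minimal sketch of that last suggestion (it is not code from this thread). The helper reads the configuration string from a command-line option so that CALI_CONFIG can be left unset and only one ConfigManager runs. The function name caliper_config_from_args and the -P option are hypothetical choices for this example. The ConfigManager calls in the trailing comment are the same ones already used in the macros in the question.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of Caliper): returns the value following a
// hypothetical "-P" option, or an empty string if the option is absent or
// has no value after it.
std::string caliper_config_from_args(int argc, const char* const argv[])
{
    assert(argv != nullptr);
    // Start at 1 to skip the program name; stop one short of the end so
    // that argv[i + 1] is always a valid argument.
    for (int i = 1; i + 1 < argc; ++i) {
        if (std::string(argv[i]) == "-P") {
            return argv[i + 1];
        }
    }
    return "";
}

// Assumed use inside main(), after MPI_Init, with CALI_CONFIG unset
// (requires <caliper/cali-manager.h>):
//
//   cali::ConfigManager mgr;
//   const std::string config = caliper_config_from_args(argc, argv);
//   mgr.add(config.c_str());
//   if (mgr.error() && myid == 0) {
//       std::cerr << "ConfigManager: " << mgr.error_msg() << std::endl;
//   }
//   mgr.start();
//   ...
//   mgr.flush();      // before MPI_Finalize
```

With this in place, the run line from the question would carry the configuration itself, e.g. ./mechanics -opt ./options_frontier.toml -P "runtime-report(...)", instead of exporting CALI_CONFIG.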