Parallelizing matrix operations using MPI

This project implements parallel matrix transpose algorithms using MPI and compares them with sequential and OpenMP implementations.

Toolchain used

MPICH 3.2.1 (module mpich-3.2.1--gcc-9.1.0 on the HPC Cluster)
GCC 9.1.0 (module gcc91 on the HPC Cluster)
GNU Make 3.82
PBS job scheduler (for cluster execution)
Python 3.10.12 with matplotlib v3.8.2, numpy v1.26.3 and pandas v2.2.3 (for plotting)

Local Compiling/Executing Instructions

Clone or download the repository and navigate to the project directory

git clone https://github.com/timmfy/parco-d2

Running using scripts

This section describes how to run the code with the provided .sh scripts, either locally or in an interactive session on the cluster. Before running them, make the scripts executable with the following command:

chmod +x *.sh

Running all the benchmarks

The following command will run the benchmarks:

Sequential transpose
Sequential matrix symmetry check function
OpenMP transpose
MPI transpose using sendrecv()
MPI transpose with block-based distribution and point-to-point communication
MPI transpose with block-based distribution and collective (all to all) communication
MPI matrix symmetry check function

The benchmarks are run for square NxN matrices where N is 32, 64, 128, 256, 512, 1024, 2048, 4096 and, in the case of the MPI and OpenMP benchmarks, for 2, 4, 8, 16, 32, 64 processes. The parameter <numRuns> specifies the number of times each benchmark is run (values below 30 are recommended in order to avoid long execution times).

./run_all_benchmarks.sh <numRuns>

The results will be stored in the results.csv file. The results for each benchmark are also stored in separate .csv files
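The block-based MPI variants in the benchmark list distribute the matrix across processes and exchange blocks between them. The sketch below simulates that idea in a single Python process to show the index arithmetic behind an all-to-all block exchange. It assumes a row-block distribution (each process owns N/P consecutive rows), which this README does not confirm, and the function name block_transpose_all_to_all is hypothetical; it is not the code in src/mpi_par.c.

```python
def block_transpose_all_to_all(matrix, num_procs):
    """Transpose a square matrix by simulating an all-to-all block exchange.

    Assumption (not taken from the repository): process p owns rows
    [p*b, (p+1)*b) with b = N / num_procs.
    """
    n = len(matrix)
    if n % num_procs != 0:
        raise ValueError("matrix size must be divisible by the process count")
    b = n // num_procs

    # Each simulated process holds its own slab of b rows.
    local = [matrix[p * b:(p + 1) * b] for p in range(num_procs)]

    # send[p][q] is the b x b block that process p sends to process q:
    # rows owned by p, restricted to the columns that map to q.
    send = [
        [[row[q * b:(q + 1) * b] for row in local[p]] for q in range(num_procs)]
        for p in range(num_procs)
    ]

    # The all-to-all step: process q receives one block from every process p.
    recv = [[send[p][q] for p in range(num_procs)] for q in range(num_procs)]

    # Each process transposes its received blocks and places them side by side.
    result = []
    for q in range(num_procs):
        for i in range(b):
            out_row = []
            for p in range(num_procs):
                block = recv[q][p]
                out_row.extend(block[r][i] for r in range(b))
            result.append(out_row)
    return result


if __name__ == "__main__":
    m = [[r * 4 + c for c in range(4)] for r in range(4)]
    for row in block_transpose_all_to_all(m, 2):
        print(row)
```

In a real MPI program the recv step would correspond to a collective call such as MPI_Alltoall, while the point-to-point variant would perform the same exchange with individual sends and receives.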

Running a single benchmark

To run a single benchmark (one of those listed above), use the following command:

./run_benchmark.sh --algorithm <int> --checkSym <int> -i <string> --processes-list <int,int,...> --size-list <int,int,...> --runs <int>

The parameters are as follows:

--algorithm specifies the MPI transpose implementation to run. May be omitted when running other benchmarks. The possible values are:
- 0 - Simple algorithm using sendrecv() (default)
- 1 - Block-based distribution with point-to-point communication
- 2 - Block-based distribution with collective communication
--checkSym specifies whether to run the matrix symmetry check function. The possible values are:
- 0 - Run the transpose function (default)
- 1 - Run the symmetry check
-i specifies the implementation to run. The possible values are:
- SEQ - Sequential
- OMP - OpenMP
- MPI - MPI (default)
--processes-list specifies the list of the number of processes to run the multiprocessor benchmarks for. Each argument n will set the number of processes to 2^n. So --processes-list 2,4,6 will run the benchmark for 4, 16 and 64 processes. (default is 2 to run for 4 processes)
--size-list specifies the list of matrix sizes to run the benchmarks for. Each argument n will set the matrix size to 2^n. So --size-list 5,6,7 will run the benchmark for the matrix sizes 32, 64 and 128. (default is 10 to run for 1024x1024 matrices)
--runs specifies the number of times to run the benchmark (default is 10)

Examples:

This will run the MPI transpose benchmark using the block-based distribution with point-to-point communication for the matrix sizes 256, 512, 1024, and 2048 using 32 processes. The benchmark will be run 20 times

./run_benchmark.sh --algorithm 1 -i MPI --processes-list 5 --size-list 8,9,10,11 --runs 20

This will run the OMP transpose benchmark with the matrix size 4096 using 16 and 32 processes. The benchmark will be run 30 times

./run_benchmark.sh -i OMP --processes-list 4,5 --size-list 12 --runs 30

This will run the MPI checkSym benchmark for the matrix sizes 32, 64, 128, 256, 512, 1024, 2048, 4096 using 2, 4, 8, 16, 32, 64 processes. Since --runs is omitted, the benchmark will be run 10 times (the default)

./run_benchmark.sh --checkSym 1 -i MPI --processes-list 1,2,3,4,5,6 --size-list 5,6,7,8,9,10,11,12
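The exponent convention used by --processes-list and --size-list is easy to get wrong, so it can help to check a list before launching a long run. The snippet below is only a sketch of the mapping described above (each value n stands for 2^n); the helper name expand_exponents is hypothetical and is not part of the repository's scripts.

```python
def expand_exponents(arg):
    """Map a comma-separated list of exponents to powers of two.

    Mirrors the convention described for --processes-list and --size-list:
    each value n stands for 2**n. Hypothetical helper, not from the repo.
    """
    exponents = [int(token) for token in arg.split(",") if token.strip()]
    if any(n < 0 for n in exponents):
        raise ValueError("exponents must be non-negative")
    return [2 ** n for n in exponents]


if __name__ == "__main__":
    print(expand_exponents("5,6,7"))  # matrix sizes 32, 64, 128
    print(expand_exponents("2,4,6"))  # process counts 4, 16, 64
```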

Manual Compilation and Running

Compilation

It is recommended to compile the code using the provided Makefile. The following command will compile the code:

make N=<size>

where <size> is the size of the matrix to be used in the benchmarks (should be a power of 2).

Without the Makefile, the following sequence of commands can be used to compile the code:

mpicc -DN=<matrix_size> -fopenmp -Iinclude -c src/main.c -o main.o
mpicc -DN=<matrix_size> -fopenmp -Iinclude -c src/mpi_par.c -o mpi_par.o
mpicc -DN=<matrix_size> -fopenmp -Iinclude -c src/omp_par.c -o omp_par.o
mpicc -DN=<matrix_size> -fopenmp -Iinclude -c src/seq.c -o seq.o
mpicc -fopenmp main.o mpi_par.o omp_par.o seq.o -o main

Running

If compiled manually:

To run the sequential benchmark, use the following command:

./main SEQ <numRuns> <checkSym>

To run the OpenMP benchmark, use the following command:

./main OMP <numRuns>

To run an MPI benchmark (the sendrecv() transpose is algorithm 0, the default), use the following command:

mpirun -np <numProcesses> ./main MPI <numRuns> <checkSym> <algorithm>

where <numProcesses> is the number of processes to run the benchmark with (should be a power of 2), <numRuns> is the number of times to run the benchmark, <checkSym> specifies whether to run the matrix symmetry check function (0 or 1) and <algorithm> selects the MPI transpose implementation (0, 1 or 2, as listed above; default is 0). If the code was compiled with the Makefile, use ./bin/main instead of ./main
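When sweeping over several process counts by hand, it can be convenient to generate the mpirun command lines rather than type them. The following is a minimal sketch that only assembles the argument order documented above and validates the values; the function name build_mpi_command is hypothetical, and nothing is executed.

```python
def build_mpi_command(num_processes, num_runs, check_sym=0, algorithm=0,
                      binary="./main"):
    """Assemble the documented mpirun invocation as an argument list.

    Hypothetical helper: it follows the argument order
    mpirun -np <numProcesses> ./main MPI <numRuns> <checkSym> <algorithm>
    and does not run anything.
    """
    # A power of two has exactly one bit set.
    if num_processes < 1 or num_processes & (num_processes - 1):
        raise ValueError("num_processes should be a power of 2")
    if check_sym not in (0, 1):
        raise ValueError("check_sym must be 0 or 1")
    if algorithm not in (0, 1, 2):
        raise ValueError("algorithm must be 0, 1 or 2")
    return ["mpirun", "-np", str(num_processes), binary, "MPI",
            str(num_runs), str(check_sym), str(algorithm)]


if __name__ == "__main__":
    # Use binary="./bin/main" if the code was built with the Makefile.
    for procs in (2, 4, 8):
        print(" ".join(build_mpi_command(procs, 10, algorithm=1)))
```

The returned list can be passed to subprocess.run as-is if the commands should be launched from Python.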

Cluster Compilation and Running

Using the interactive session:
- Connect to the cluster using SSH
- Request an interactive session using the following command:
```
qsub -I -q short_cpuQ -l select=1:ncpus=64:mpiprocs=64:mem=1gb
```
- Load the required modules using the following commands:
```
module load gcc91
module load mpich-3.2.1--gcc-9.1.0
```
- Clone or download the repository and navigate to the project directory
- Compile the code using the provided Makefile or manually
- Run the benchmarks using the provided scripts or manually
Using the batch job:
- Connect to the cluster using SSH
- Clone or download the repository and navigate to the project directory
- Modify the file parco-d2.pbs to set the project directory path and, if needed, the compilation and run commands (described above) and the walltime. By default, it runs
```
./run_all_benchmarks.sh 10
```
- Submit the job using the following command:
```
qsub parco-d2.pbs
```
- The results will be stored in the results.csv file (if running multiple benchmarks using a script)
- Standard output and errors will be stored in the parco-d2.out and parco-d2.err files respectively

(Optional) Plotting the results

The results can be plotted using the provided Python script plot.py. It requires the matplotlib, numpy and pandas libraries to be installed. The script reads the results.csv file and plots the execution times and strong/weak scaling for the different benchmarks. The script can be run using the following command:

python3 plot.py

Customization of the plots can be done by modifying the main() function in the script and changing the parameters of the following functions:

plot_strong_scaling(dataframe, implementation, matrix_size, function): Plots the strong scaling for the specified implementation (MPI simple, MPI block all2all or MPI block point2point for the transpose benchmark; the checkSym benchmark has only one MPI implementation), matrix size and function (transpose or checkSym)
plot_execution_time_variable_size(dataframe, processes_list, implementation, min_size, max_size, function): Plots the execution time for the specified implementation, list of process counts, range of matrix sizes (from min_size to max_size) and function (transpose or checkSym)
plot_weak_scaling(dataframe, implementation, base_matrix_size, function): Plots the weak scaling for the specified implementation, base matrix size (used to set the 1-process baseline) and function
plot_execution_time_fixed_size(dataframe, matrix_size, implementations, processes_list): Used only for the transpose benchmark in order to compare the execution times of different implementations and process counts for a fixed matrix size

The plots require the results.csv file to be present in the project directory. Otherwise, the sample data sample_data.csv can be used. The plots will be saved in the figures directory.
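For reference, the quantities behind strong and weak scaling plots are usually computed as follows. This sketch uses the conventional definitions (speedup T1/Tp, strong-scaling efficiency speedup/p, weak-scaling efficiency T1/Tp with the problem size grown alongside p); the README does not state which formulas plot.py uses, so treat these as assumptions. The function names and the timings in the demo are made up for illustration and are not measured results.

```python
def strong_scaling(times_by_procs):
    """Return (processes, speedup, efficiency) rows for a fixed matrix size.

    times_by_procs maps a process count to an execution time and must
    contain the 1-process (or sequential) baseline under key 1.
    """
    baseline = times_by_procs[1]
    rows = []
    for procs in sorted(times_by_procs):
        speedup = baseline / times_by_procs[procs]
        rows.append((procs, speedup, speedup / procs))
    return rows


def weak_scaling_efficiency(times_by_procs):
    """Return {processes: efficiency} when the problem grows with the count.

    Efficiency is the baseline time divided by the time at p processes;
    1.0 means the run time stayed constant as work and processes grew.
    """
    baseline = times_by_procs[1]
    return {procs: baseline / times_by_procs[procs]
            for procs in sorted(times_by_procs)}


if __name__ == "__main__":
    # Illustrative timings in seconds, not taken from results.csv.
    made_up_times = {1: 8.0, 2: 4.0, 4: 2.5}
    for procs, speedup, efficiency in strong_scaling(made_up_times):
        print(f"p={procs}: speedup={speedup:.2f}, efficiency={efficiency:.2f}")
```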
