
Dual test for NumPy and CuPy in tests #165


Open · wants to merge 5 commits into base: main

Conversation

tharittk
Collaborator

@tharittk tharittk commented Aug 3, 2025

Some bugs were found:

  • Infinity norm causes a segmentation fault when used with CuPy + MPI (see the reproduction sketch after this list)
  • (?) PyLops `BlockDiag ** 3` seems to revert the engine to NumPy despite being initialized with CuPy
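A minimal reproduction sketch for the first item, assuming the `DistributedArray` constructor accepts `engine="cupy"` and exposes a `norm` method as in the current codebase; names and sizes are illustrative:

```python
# Hypothetical reproduction of the infinity-norm segfault with CuPy + MPI.
# Run with e.g.: mpiexec -n 2 python repro_inf_norm.py
import numpy as np
import cupy as cp
from mpi4py import MPI
import pylops_mpi

rank = MPI.COMM_WORLD.Get_rank()
arr = pylops_mpi.DistributedArray(global_shape=128, engine="cupy")
arr[:] = cp.ones(arr.local_shape)

print(rank, arr.norm(ord=2))       # the 2-norm path works
print(rank, arr.norm(ord=np.inf))  # the infinity-norm path is where the segfault was reported
```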

@mrava87
Contributor

mrava87 commented Aug 3, 2025

Some bugs were found:

  • Infinity norm causes a segmentation fault when used with CuPy + MPI

This looks like our usual problem with CuPy+MPI... the infinity norms use `recv_buf` (`recv_buf, op=MPI.MAX`) but the other norms don't. However, I did a quick test and removed `recv_buf` from the `_allreduce_subcomm` calls, but this leads to a deadlock also for NumPy+MPI (which is probably the reason the code was written like this in the first place)... the issue with using `recv_buf` is that the `_allreduce_subcomm` method uses `self.sub_comm.Allreduce`, which we know does not play well with CuPy arrays for now as we are not doing any synchronization.
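A minimal sketch of the synchronization concern, assuming a CUDA-aware MPI build: when CuPy buffers are handed to the uppercase `Allreduce`, the device stream should be synchronized first so the send buffer is fully populated before MPI reads the device memory. The helper name and shapes below are illustrative, not the library's API:

```python
# Sketch: buffer-based Allreduce with CuPy arrays under mpi4py (needs CUDA-aware MPI).
# The explicit stream synchronization is the piece currently missing for CuPy.
import cupy as cp
from mpi4py import MPI

comm = MPI.COMM_WORLD

def inf_norm_allreduce(local_array: cp.ndarray) -> float:
    """Illustrative helper: global infinity norm via a buffer-based Allreduce."""
    send_buf = cp.atleast_1d(cp.abs(local_array).max())  # local infinity norm, 1-element buffer
    recv_buf = cp.empty_like(send_buf)                   # device-side receive buffer
    cp.cuda.get_current_stream().synchronize()           # ensure send_buf is ready before MPI touches it
    comm.Allreduce(send_buf, recv_buf, op=MPI.MAX)       # uppercase Allreduce uses the buffer protocol
    return float(recv_buf[0])

if __name__ == "__main__":
    x = cp.random.rand(1000)
    print(comm.Get_rank(), inf_norm_allreduce(x))
```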

  • (?) PyLops `BlockDiag ** 3` seems to revert the engine to NumPy despite being initialized with CuPy

Found the issue in PyLops; fixed here: PyLops/pylops#689. For now (until the next PyLops release) we can easily fix the test by using `Pop = BDiag * BDiag * BDiag`, which is technically equivalent to `Pop = BDiag ** 3`.
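A sketch of that interim workaround, with a hypothetical CuPy-backed operator standing in for the one built in the test:

```python
# Sketch of the interim workaround (operator names and sizes are illustrative):
# chain the operator explicitly instead of using ** so the CuPy engine is preserved.
import cupy as cp
import pylops

A = pylops.MatrixMult(cp.ones((4, 4)))  # hypothetical CuPy-backed operator
BDiag = pylops.BlockDiag([A, A])        # block-diagonal operator built from it

# Pop = BDiag ** 3                      # reverts to NumPy until PyLops/pylops#689 is released
Pop = BDiag * BDiag * BDiag             # technically equivalent and keeps the CuPy engine
y = Pop @ cp.ones(Pop.shape[1])         # result stays a CuPy array
```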

@mrava87
Contributor

mrava87 commented Aug 3, 2025

Also, I think for all tests we need to add some logic like

`cp.cuda.Device(rank % device_count).use()`

to have different ranks use different GPUs; otherwise they will all run on the same GPU (default=0).
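A minimal sketch of that logic; whether it lives in a `conftest.py` fixture or at the top of each test module is just a placement suggestion:

```python
# Sketch: map each MPI rank to a GPU so all ranks don't pile onto device 0.
import cupy as cp
from mpi4py import MPI

rank = MPI.COMM_WORLD.Get_rank()
device_count = cp.cuda.runtime.getDeviceCount()
cp.cuda.Device(rank % device_count).use()  # round-robin ranks over the available GPUs
```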

@tharittk tharittk marked this pull request as ready for review August 5, 2025 13:01
Contributor

@mrava87 mrava87 left a comment


@tharittk good job!

I think this is nearly ready to go. I would maybe add some targets in the Makefile like we have in PyLops (https://github.com/PyLops/pylops/blob/a94ea8eae3b9c06bf39637b2e29f6a45a0e7766f/Makefile#L54) and in the contributing part of the documentation (see again what we have in PyLops, and maybe also add something about the NCCL tests and examples, which I just realized is missing).

@mrava87
Contributor

mrava87 commented Aug 6, 2025

@hongyx11 I think this is pretty much ready and a great addition to our test suite as we move forward trying to change MPI methods from objects to buffers… do you think we can put this into a self-hosted runner like we did for PyLops? I think a single node with even just 2 GPUs would be enough, as it will guarantee that we can do some checks on any change we make in the communication bits of our library 😀

@hongyx11
Collaborator

hongyx11 commented Aug 6, 2025

It's doable, let me give it a try; we need to use srun.

@mrava87 mrava87 changed the title Dual test for NumPy and CuPy in tests/ Dual test for NumPy and CuPy in tests Aug 6, 2025