CUDA API error in ParticleContainer<...>::Redistribute() + workaround
#4795
mirenradia started this conversation in General
@mirenradia Thanks for writing this up! I wonder if you want to make it easier to work around the issue by changing the macro to If you want to do that, please submit a PR. We probably want it for both ExclusiveSum and InclusiveSum.
Disclaimer
This is more of an issue than a discussion, but I don't think it is really a problem with AMReX itself; it is probably related to the software stack of the system I was running on. Nevertheless, I thought it might be helpful to document the issue and my workaround in case anybody else encounters it or something similar, which is why I have opened this discussion.
I spent quite a long time trying to debug this problem without getting anywhere, so unless someone is confident they know what is going wrong, I don't want to invest much more time in tracking it down.
Summary
I was trying to use the PunctureTracker capabilities of SpacetimeX/CarpetX. When the code called Redistribute on an amrex::ParticleContainer, I got the following CUDA API error:
For more details, see this issue I opened in the Einstein Toolkit issue tracker.
Investigation
Running through cuda-gdb, I got the following backtrace (only the AMReX part is included). See the issue linked above for a more complete backtrace.
The problem seems to be coming from the call to cub::DeviceScan::ExclusiveSum() here.
I tried several things but none of them worked. These include:
CUDA_LAUNCH_BLOCKING=1.
Workaround
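For readers unfamiliar with the operation involved: an exclusive sum turns a list of counts into starting offsets, and the workaround comes down to choosing between a library scan and a hand-written one at compile time. The following is only a minimal host-side sketch of that idea, not the AMReX code. It assumes C++17, uses std::exclusive_scan as a stand-in for the GPU call to cub::DeviceScan::ExclusiveSum, and the macro SKETCH_USE_LIBRARY_SCAN and both function names are hypothetical.

```cpp
// Host-side illustration only; the real code path runs on the GPU.
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical switch (not an AMReX macro): compile with
// -DSKETCH_USE_LIBRARY_SCAN=0 to bypass the library scan.
#ifndef SKETCH_USE_LIBRARY_SCAN
#define SKETCH_USE_LIBRARY_SCAN 1
#endif

// Hand-written exclusive prefix sum:
// out[0] = 0 and out[i] = in[0] + ... + in[i-1].
std::vector<int> exclusive_sum_fallback(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    int running = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = running;   // sum of all earlier elements
        running += in[i];
    }
    return out;
}

// Dispatches to the library scan or to the fallback, depending on the macro.
std::vector<int> exclusive_sum(const std::vector<int>& in) {
#if SKETCH_USE_LIBRARY_SCAN
    std::vector<int> out(in.size());
    // Stand-in for the library call; the initial value 0 makes it exclusive.
    std::exclusive_scan(in.begin(), in.end(), out.begin(), 0);
    return out;
#else
    return exclusive_sum_fallback(in);
#endif
}
```

With counts {3, 1, 4, 1, 5}, both paths return the offsets {0, 3, 4, 8, 9}, so switching the macro changes only which implementation runs, not the result.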
In the end, the only thing that worked was removing/commenting out the codepath that used the cub functions and instead relying on this codepath at the end of the function.
System/software information