Description
Is your feature request related to a problem? Please describe.
GPU random number generators such as cuRAND and hipRAND are a pain to use. They require allocating and maintaining an array of generator states with at least one entry per launched thread. But sizing that array for the maximum number of threads a particular GPU supports could exhaust memory.
Describe the solution you'd like
A class similar to RAJA::Reduce that could be captured, and since RAJA knows the length of the loop, it could allocate/reallocate the array of states to the proper length.
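To make the request concrete, here is a minimal host-only sketch of the behavior I have in mind. Everything in it is hypothetical: `RngState`, `RngStatePool`, `next_u64`, and `forall_with_rng` are illustrative names, not existing RAJA, cuRAND, or hipRAND API. A SplitMix64 step stands in for a real device generator state, and a plain `for` loop stands in for a RAJA loop launch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a per-thread generator state (e.g. a cuRAND state).
struct RngState {
  std::uint64_t s;
};

// SplitMix64 step: advances the state and returns a 64-bit value.
inline std::uint64_t next_u64(RngState& st) {
  std::uint64_t z = (st.s += 0x9E3779B97F4A7C15ULL);
  z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
  z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
  return z ^ (z >> 31);
}

// Hypothetical pool that owns one state per loop index and grows on demand.
class RngStatePool {
public:
  explicit RngStatePool(std::uint64_t seed) : seed_(seed) {}

  // Make sure at least n states exist. The pool never shrinks, and states
  // that already exist are left untouched so their streams continue.
  void reserve_for(std::size_t n) {
    const std::size_t old_size = states_.size();
    if (n <= old_size) return;
    states_.resize(n);
    for (std::size_t i = old_size; i < n; ++i) {
      // Derive the initial state from (seed, index) so each index differs.
      RngState tmp{seed_ + i};
      states_[i].s = next_u64(tmp);
    }
  }

  std::size_t size() const { return states_.size(); }

  RngState& operator[](std::size_t i) {
    assert(i < states_.size());
    return states_[i];
  }

private:
  std::uint64_t seed_;
  std::vector<RngState> states_;
};

// Stand-in for a loop launch: because the launcher knows the loop length,
// it can size the pool before running the body.
template <typename Body>
void forall_with_rng(RngStatePool& pool, std::size_t n, Body body) {
  pool.reserve_for(n);
  for (std::size_t i = 0; i < n; ++i) {
    body(i, pool[i]);
  }
}
```

A caller would construct one pool, for example `RngStatePool pool(1234);`, and pass it to every loop. The pool grows only when a longer loop shows up, so memory tracks the largest loop actually launched rather than the GPU's maximum thread count. A real version would keep the states in device memory and initialize new entries in a kernel.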
Describe alternatives you've considered
I've written my own wrapper class in application codes, but it isn't easily shareable and doesn't automatically reallocate in RAJA loops. It's also more of a singleton, which isn't ideal.
Additional context
https://docs.nvidia.com/cuda/curand/device-api-overview.html#device-api-overview
https://rocm.docs.amd.com/projects/hipRAND/en/latest/
It would also be worth considering whether something similar is needed for CPU threaded code.