This is basically a port of https://github.com/JuliaGPU/CUDA.jl/blob/b0b484c5bf7e7a8342e0285e42c5e235aa252c32/src/random.jl#L12-L190.
Ideally this code would live in GPUArrays.jl, but that would require every back-end to implement device-side rand(), which is a big ask.