
Efficient setup of random123 streams on GPU #587


Description

@olupton

[copy/paste of internal issue created by @pramodk]

In the GPU implementation of synapse models, we need to set up Random123 streams on the GPU. This is currently done in bbcore_read for every synapse instance:

static void bbcore_read(double* x, int* d, int* xx, int* offset, _threadargsproto_) {
    assert(!_p_rng);
    uint32_t* di = ((uint32_t*)d) + *offset;
    if (di[0] != 0 || di[1] != 0) {
        nrnran123_State** pv = (nrnran123_State**)(&_p_rng);
        /* one stream is created per synapse instance */
        *pv = nrnran123_newstream(di[0], di[1]);
    }
    *offset += 2;
}

For this we launch a kernel from the CPU launcher for serial compute, i.e. each call just initializes a single stream:

/* nrnran123 streams are created from the CPU launcher routine */
nrnran123_State* nrnran123_newstream(uint32_t id1, uint32_t id2) {
    nrnran123_State* s;

    cudaMalloc((void**)&s, sizeof(nrnran123_State));
    cudaMemset(s, 0, sizeof(nrnran123_State));

    /* one <<<1,1>>> kernel launch plus a device synchronize per stream */
    nrnran123_setup_cuda_newstream<<<1, 1>>>(s, id1, id2);
    cudaDeviceSynchronize();

    return s;
}

This is terribly slow and inefficient: every synapse instance costs a separate cudaMalloc, a <<<1,1>>> kernel launch and a full device synchronize. See the sample profile for a small 5 ms simulation: nvvp_cuda_stream_random123
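One possible direction, sketched below only to illustrate the idea: collect all (id1, id2) pairs first, allocate all streams with a single cudaMalloc, and initialize them with one kernel launch using one thread per stream. The names nrnran123_newstreams, nrnran123_setup_cuda_newstream_batch and nrnran123_setup_one are hypothetical and not part of the current code; only the CUDA runtime calls are real, and the sketch assumes the existing nrnran123 declarations (nrnran123_State etc.) are available.

#include <cstddef>
#include <cstdint>
#include <cuda_runtime.h>

/* hypothetical device-side helper: does for one stream what
   nrnran123_setup_cuda_newstream already does today */
__device__ void nrnran123_setup_one(nrnran123_State* s, uint32_t id1, uint32_t id2);

/* hypothetical batched kernel: one thread initializes one stream */
__global__ void nrnran123_setup_cuda_newstream_batch(nrnran123_State* states,
                                                     const uint32_t* id1,
                                                     const uint32_t* id2,
                                                     size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) {
        nrnran123_setup_one(states + i, id1[i], id2[i]);
    }
}

/* hypothetical CPU launcher: id1/id2 are host arrays of length n,
   collected from all bbcore_read calls before touching the GPU */
nrnran123_State* nrnran123_newstreams(const uint32_t* id1, const uint32_t* id2, size_t n) {
    nrnran123_State* states = nullptr;
    uint32_t* d_id1 = nullptr;
    uint32_t* d_id2 = nullptr;

    /* one allocation for all streams instead of one per synapse instance */
    cudaMalloc((void**)&states, n * sizeof(nrnran123_State));
    cudaMemset(states, 0, n * sizeof(nrnran123_State));

    cudaMalloc((void**)&d_id1, n * sizeof(uint32_t));
    cudaMalloc((void**)&d_id2, n * sizeof(uint32_t));
    cudaMemcpy(d_id1, id1, n * sizeof(uint32_t), cudaMemcpyHostToDevice);
    cudaMemcpy(d_id2, id2, n * sizeof(uint32_t), cudaMemcpyHostToDevice);

    /* one launch and one synchronize for all n streams */
    const int threads = 128;
    const int blocks = (int)((n + threads - 1) / threads);
    nrnran123_setup_cuda_newstream_batch<<<blocks, threads>>>(states, d_id1, d_id2, n);
    cudaDeviceSynchronize();

    cudaFree(d_id1);
    cudaFree(d_id2);
    return states;
}

With something like this, bbcore_read would only record the (id1, id2) pair for each instance and later receive a pointer into the pre-allocated array (e.g. &states[i]) instead of calling nrnran123_newstream per synapse.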
