NEURON integrated tests failing with CUDA Unified Memory enabled

**Describe the issue**
Some of the NEURON tests fail on GPU when CUDA Unified Memory is enabled in CoreNEURON.
More precisely, the failing tests are:
```
The following tests FAILED:
         18 - coreneuron_modtests::direct_py (Failed)
         19 - coreneuron_modtests::direct_hoc (Failed)
         20 - coreneuron_modtests::spikes_py (Failed)
         21 - coreneuron_modtests::spikes_file_mode_py (Failed)
         22 - coreneuron_modtests::datareturn_py (Failed)
         25 - coreneuron_modtests::spikes_mpi_py (Failed)
         26 - coreneuron_modtests::spikes_mpi_file_mode_py (Failed)
         41 - testcorenrn_patstim::coreneuron_gpu_offline (Failed)
         45 - testcorenrn_patstim::compare_results (Failed)
         99 - testcorenrn_netstimdirect::direct (Failed)
        100 - testcorenrn_netstimdirect::compare_results (Failed)
```

**To Reproduce**
Steps to reproduce the behavior:
```bash
git clone git@github.com:neuronsimulator/nrn.git
cd nrn
mkdir build_unified && cd build_unified
cmake .. -DCMAKE_INSTALL_PREFIX=./install -DNRN_ENABLE_INTERVIEWS=OFF -DNRN_ENABLE_RX3D=OFF -DNRN_ENABLE_CORENEURON=ON -DNRN_ENABLE_TESTS=ON -DCORENRN_ENABLE_GPU=ON -DCORENRN_ENABLE_CUDA_UNIFIED_MEMORY=ON -DCORENRN_ENABLE_OPENMP=OFF
make -j16
ctest --output-on-failure
```
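After one full run, it can be quicker to rerun only the failing tests. The sketch below turns the `The following tests FAILED:` summary printed by `ctest` into a regular expression suitable for `ctest -R`. The helper name `failed_tests_regex` and the log file name `ctest.log` are hypothetical, and the parsing assumes the summary format shown in this report.

```bash
# Hypothetical helper (not part of NEURON or CMake): reads ctest output on
# stdin and prints a regex matching exactly the tests listed as "(Failed)".
failed_tests_regex() {
    # Summary lines look like: "   18 - coreneuron_modtests::direct_py (Failed)".
    # Keep only the test name, join the names with '|', then anchor the result.
    sed -n 's/^[[:space:]]*[0-9][0-9]*[[:space:]]*-[[:space:]]*\([^[:space:]]*\)[[:space:]]*(Failed).*$/\1/p' \
        | paste -sd'|' - \
        | sed 's/.*/^(&)$/'
}
```

Assuming the output of the full run was saved with `ctest --output-on-failure | tee ctest.log`, the failing subset could then be rerun with `ctest -R "$(failed_tests_regex < ctest.log)" --output-on-failure`. `ctest --rerun-failed` is a built-in alternative that reruns whatever failed in the previous invocation.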

**Expected behavior**
GPU tests should also pass with Unified Memory enabled.

**Logs**
Running one of the failing tests (`coreneuron_modtests::direct_py`) under `cuda-memcheck` produces the following output:
```
========= Invalid __global__ read of size 8
=========     at 0x00000730 in /gpfs/bbp.cscs.ch/data/scratch/proj16/magkanar/psolve-direct/nrn_gpu/build_unified/test/nrnivmodl/8e220c327f2b8882adcf04884baa4209f37d0bbcef5677f046766f546d969ffd/x86_64/corenrn/mod2c/stim.cpp:410:coreneuron::_nrn_cur__IClamp_370_gpu(coreneuron::NrnThread*, coreneuron::Memb_list*, int)
=========     by thread (0,0,0) in block (0,0,0)
=========     Address 0x05121860 is out of bounds
=========     Device Frame:/gpfs/bbp.cscs.ch/data/scratch/proj16/magkanar/psolve-direct/nrn_gpu/build_unified/test/nrnivmodl/8e220c327f2b8882adcf04884baa4209f37d0bbcef5677f046766f546d969ffd/x86_64/corenrn/mod2c/stim.cpp:410:coreneuron::_nrn_cur__IClamp_370_gpu(coreneuron::NrnThread*, coreneuron::Memb_list*, int) (coreneuron::_nrn_cur__IClamp_370_gpu(coreneuron::NrnThread*, coreneuron::Memb_list*, int) : 0x730)
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/lib64/libcuda.so (cuLaunchKernel + 0x34e) [0x2efa6e]
=========     Host Frame:/gpfs/bbp.cscs.ch/ssd/apps/hpc/jenkins/deploy/externals/2021-01-06/linux-rhel7-x86_64/gcc-9.3.0/nvhpc-21.2-67d2qp/Linux_x86_64/21.2/compilers/lib/libaccdevice.so (__pgi_uacc_cuda_launch3 + 0x1d94) [0x1ca43]
=========     Host Frame:/gpfs/bbp.cscs.ch/ssd/apps/hpc/jenkins/deploy/externals/2021-01-06/linux-rhel7-x86_64/gcc-9.3.0/nvhpc-21.2-67d2qp/Linux_x86_64/21.2/compilers/lib/libaccdevice.so [0x1d7a5]
=========     Host Frame:/gpfs/bbp.cscs.ch/ssd/apps/hpc/jenkins/deploy/externals/2021-01-06/linux-rhel7-x86_64/gcc-9.3.0/nvhpc-21.2-67d2qp/Linux_x86_64/21.2/compilers/lib/libaccdevice.so (__pgi_uacc_cuda_launch + 0x13d) [0x1d8e4]
=========     Host Frame:/gpfs/bbp.cscs.ch/ssd/apps/hpc/jenkins/deploy/externals/2021-01-06/linux-rhel7-x86_64/gcc-9.3.0/nvhpc-21.2-67d2qp/Linux_x86_64/21.2/compilers/lib/libacchost.so (__pgi_uacc_launch + 0x1f7) [0x463c0]
=========     Host Frame:./x86_64/special (_ZN10coreneuron16_nrn_cur__IClampEPNS_9NrnThreadEPNS_9Memb_listEi + 0x89b) [0x5702b]
=========     Host Frame:./x86_64/special [0x17f3bb]
=========     Host Frame:./x86_64/special (_ZN10coreneuron25setup_tree_matrix_minimalEPNS_9NrnThreadE + 0xe) [0x1814ae]
Failing in Thread:1
call to cuLaunchKernel returned error 719: Launch failed (often invalid pointer dereference)
```
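When several tests are run under `cuda-memcheck`, the error site is easy to lose among the long build paths. The following sketch condenses a saved log to one line per reported invalid access: the kind of access, the source file and line, and the kernel. The helper name `memcheck_sites` is hypothetical, and the parsing assumes the `========= Invalid ...` / `=========     at 0x... in file:line:kernel` layout shown above.

```bash
# Hypothetical helper: reads a cuda-memcheck log on stdin and prints
# "<error kind> @ <source file>:<line> in <kernel>" for each invalid access.
memcheck_sites() {
    awk '
        # Remember the error headline, e.g. "Invalid __global__ read of size 8".
        /^========= Invalid / { sub(/^=========[ ]+/, ""); kind = $0; next }

        # The next "at 0x... in <path>:<line>:<kernel>" line gives the location.
        kind != "" && /^=========[ ]+at 0x/ {
            s = substr($0, index($0, " in ") + 4)
            if (match(s, /:[0-9]+:/)) {
                file   = substr(s, 1, RSTART - 1)
                lineno = substr(s, RSTART + 1, RLENGTH - 2)
                kernel = substr(s, RSTART + RLENGTH)
                n = split(file, parts, "/")   # keep only the file name
                print kind " @ " parts[n] ":" lineno " in " kernel
            }
            kind = ""
        }
    '
}
```

For the log above, assuming it was saved to a file named `memcheck.log`, `memcheck_sites < memcheck.log` would report the invalid read at `stim.cpp:410` in `coreneuron::_nrn_cur__IClamp_370_gpu`.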
The corresponding failing line (410) in `stim.cpp`:
```
409:      #pragma acc atomic update
410:      _nt->nrn_fast_imem->nrn_sav_rhs[_nd_idx] += _rhs;
411:      #pragma acc atomic update
412:      _nt->nrn_fast_imem->nrn_sav_d[_nd_idx] -= _g;
```

**System (please complete the following information)**
 - OS: RedHat
 - Compiler: NVHPC 21.2
 - Version: master branch
 - Backend: GPU


NEURON integrated tests failing with CUDA Unified Memory enabled #594