modified: src/force/neighbor.cu by YixinDeng · Pull Request #1577 · brucefan1983/GPUMD

YixinDeng · 2026-06-27T15:43:30Z

Summary
This PR adds local size_t casts to large NEP array size calculations and to flattened neighbor-list/descriptor offsets, both of which can exceed the 32-bit integer range in large-system GPU runs.
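To give a feel for the scale at which this matters, here is a minimal sketch that estimates the largest atom count for which a product of the form N * per_atom still fits in a signed 32-bit integer. The helper name max_atoms_32bit and the per-atom counts are illustrative assumptions, not identifiers or values taken from GPUMD.

```cpp
#include <cstdint>

// Hypothetical helper (not part of GPUMD): largest atom count N for which
// N * per_atom still fits in a signed 32-bit integer. per_atom must be positive.
constexpr std::int32_t max_atoms_32bit(std::int32_t per_atom)
{
  return INT32_MAX / per_atom;
}

// Assumed, illustrative per-atom entry counts (for example, neighbor slots per atom).
static_assert(max_atoms_32bit(100) == 21474836, "about 21.5 million atoms at 100 entries per atom");
static_assert(max_atoms_32bit(1000) == 2147483, "about 2.1 million atoms at 1000 entries per atom");
```

Under these assumed per-atom counts, a system of a few million atoms is already enough for a 32-bit product to go out of range.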

Modification

- Cast large GPU_Vector::resize() products to size_t in NEP::NEP():
  - nep_data.NL_radial
  - nep_data.Fp
  - nep_data.sum_fxyz
- Cast per-GPU GPU_Vector::resize() products to size_t in NEP_MULTIGPU::allocate_memory():
  - nep_data[gpu].NL_radial
  - nep_data[gpu].Fp
  - nep_data[gpu].sum_fxyz
- Cast flattened NEP large-box offsets to size_t for:
  - global neighbor-list reads before building NEP radial lists
  - NL_radial writes and reads
  - Fp writes and reads
  - sum_fxyz writes and reads
- Cast the global neighbor-list allocation size in Neighbor::initialize().
- Cast flattened global/local neighbor-list offsets in non-ILP global neighbor-list kernels.
- Cast flattened read/write offsets in the ordinary neighbor-list sort kernel in neighbor.cuh.
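The casting pattern described in the items above can be sketched in plain C++ as follows. This is only an illustration under stated assumptions: the helper names element_count and flat_offset are hypothetical, and the layout atom + num_atoms * slot is assumed for the example rather than quoted from the diff.

```cpp
#include <cstddef>
#include <cstdint>

static_assert(sizeof(std::size_t) >= 8, "this sketch assumes a 64-bit size_t");

// Hypothetical helper: element count for a per-atom array.
// One operand is widened BEFORE the multiplication, so the product is
// computed in size_t rather than in int.
constexpr std::size_t element_count(int num_atoms, int per_atom)
{
  return static_cast<std::size_t>(num_atoms) * per_atom;
}

// Hypothetical helper: flattened offset for an assumed layout in which
// entry `slot` of atom `atom` is stored at atom + num_atoms * slot.
constexpr std::size_t flat_offset(int atom, int slot, int num_atoms)
{
  return static_cast<std::size_t>(atom) + static_cast<std::size_t>(num_atoms) * slot;
}

// Illustrative values: 3,000,000 atoms with 1000 entries per atom.
static_assert(element_count(3000000, 1000) == 3000000000ULL, "size is computed without wrapping");
static_assert(element_count(3000000, 1000) > static_cast<std::size_t>(INT32_MAX),
              "the same product would not fit in a 32-bit int");
static_assert(flat_offset(5, 999, 3000000) == 2997000005ULL, "offset is computed without wrapping");
```

Note that the cast has to be applied to an operand, not to the finished product: an expression such as static_cast<size_t>(a * b) still performs the multiplication in int, so the overflow happens before the conversion.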

Validation

The x-axis is the number of atoms, shown on a logarithmic scale. The speed panel reports throughput in units of 10^7 atom step s^-1. The memory panel reports peak GPU memory used in GiB. Blue circles represent the old version; orange squares represent the new version. Dashed vertical lines in the memory panel mark the first failed point for each version. The A100 speed panel includes an inset that zooms in on the high-atom-count region of the new version.

Compared with the old version, the new version reaches about 3.62x more atoms on A100 and about 1.47x more atoms on 2V100. The corresponding peak GPU memory usage increases from 22.50 to 77.69 GiB on A100, and from 39.54 to 57.22 GiB on 2V100.

YixinDeng added 2 commits June 27, 2026 23:18

modified: src/force/neighbor.cu

78c1160

modified: src/force/neighbor.cuh modified: src/force/nep.cu modified: src/force/nep_small_box.cuh

modified: src/force/nep.cu

9b8e49a

modified: src/force/nep_small_box.cuh

YixinDeng marked this pull request as ready for review June 27, 2026 15:52

YixinDeng and others added 2 commits June 28, 2026 14:02

modified: src/force/neighbor.cu

95b4b36

modified: src/force/neighbor.cuh modified: src/force/nep.cu modified: src/force/nep_multigpu.cu modified: src/force/nep_small_box.cuh

Merge branch 'brucefan1983:master' into fix-array-overflow

f4d3b2e

brucefan1983 merged commit c4ab0e3 into brucefan1983:master Jun 28, 2026
2 checks passed

YixinDeng deleted the fix-array-overflow branch June 29, 2026 12:29

brucefan1983 merged 4 commits into brucefan1983:master from YixinDeng:fix-array-overflow
