
modified: src/force/neighbor.cu#1577

Merged: brucefan1983 merged 4 commits into brucefan1983:master from YixinDeng:fix-array-overflow on Jun 28, 2026


YixinDeng commented Jun 27, 2026


Summary
This PR adds local size_t casts to large NEP array size calculations and flattened neighbor-list/descriptor offsets that can exceed 32-bit integer range in large-system GPU runs.
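To illustrate the failure mode, here is a minimal host-side C++ sketch, not code from this PR. The helper names `flat_count` and `fits_in_int32` and the example sizes are hypothetical, and it assumes a 64-bit build where `size_t` is 8 bytes. Casting one operand before the multiplication makes the whole product evaluate in `size_t`, whereas a plain `int * int` product overflows (undefined behavior) once the result exceeds 2^31 - 1.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Assumption: 64-bit build, so size_t can hold products beyond the int range.
static_assert(sizeof(std::size_t) >= 8, "this sketch assumes a 64-bit size_t");

// Hypothetical helper: element count of a flattened (atoms x neighbors) array.
// Casting the first operand promotes the whole multiplication to size_t.
inline std::size_t flat_count(int num_atoms, int max_neighbors)
{
  return static_cast<std::size_t>(num_atoms) * max_neighbors;
}

// True if the count could still be stored in a signed 32-bit int.
inline bool fits_in_int32(std::size_t count)
{
  return count <= static_cast<std::size_t>(std::numeric_limits<std::int32_t>::max());
}
```

For example, 10,000,000 atoms with 500 neighbor slots each gives 5,000,000,000 elements, which no longer fits in a 32-bit `int`, while 1,000,000 atoms with the same slot count (500,000,000 elements) still does.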

Modification

  • Cast large GPU_Vector::resize() products to size_t in NEP::NEP():

    • nep_data.NL_radial
    • nep_data.Fp
    • nep_data.sum_fxyz
  • Cast per-GPU GPU_Vector::resize() products to size_t in NEP_MULTIGPU::allocate_memory():

    • nep_data[gpu].NL_radial
    • nep_data[gpu].Fp
    • nep_data[gpu].sum_fxyz
  • Cast flattened NEP large-box offsets to size_t for:

    • global neighbor-list reads before building NEP radial lists
    • NL_radial writes and reads
    • Fp writes and reads
    • sum_fxyz writes and reads
  • Cast the global neighbor-list allocation size in Neighbor::initialize().

  • Cast flattened global/local neighbor-list offsets in non-ILP global neighbor-list kernels.

  • Cast flattened read/write offsets in the ordinary neighbor-list sort kernel in neighbor.cuh.
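The allocation and offset casts listed above follow one pattern. As a hedged host-side C++ sketch (the PR applies the casts inside CUDA kernels and `GPU_Vector::resize()` calls; the helpers `flat_offset` and `make_demo_list` and the slot-major layout `slot * num_atoms + atom` are assumptions for illustration, not taken from the diff):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: offset of neighbor slot `slot` of atom `atom` in a
// flattened list stored slot-major (all atoms' slot 0, then slot 1, ...).
// The cast makes the multiplication and the addition evaluate in size_t.
inline std::size_t flat_offset(int atom, int slot, int num_atoms)
{
  assert(atom >= 0 && atom < num_atoms && slot >= 0);
  return static_cast<std::size_t>(slot) * num_atoms + atom;
}

// Fill a small list so that each entry records its own (atom, slot) pair.
inline std::vector<int> make_demo_list(int num_atoms, int max_neighbors)
{
  // The allocation size uses the same cast as the offsets.
  std::vector<int> list(static_cast<std::size_t>(num_atoms) * max_neighbors);
  for (int slot = 0; slot < max_neighbors; ++slot) {
    for (int atom = 0; atom < num_atoms; ++atom) {
      list[flat_offset(atom, slot, num_atoms)] = atom * 100 + slot;
    }
  }
  return list;
}
```

Keeping the cast local to each size or offset expression leaves the loop counters and kernel arguments as `int`, so only the expressions that can actually exceed the 32-bit range change type.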

Validation
[Figure: A100_nve_new_old_speed_memory]
[Figure: 2V100_nve_new_old_speed_memory]

The x-axis is the number of atoms, shown on a logarithmic scale. The speed panel reports throughput in units of 10^7 atom step s^-1. The memory panel reports peak GPU memory used in GiB. Blue circles represent the old version; orange squares represent the new version. Dashed vertical lines in the memory panel mark the first failed point for each version. The A100 speed panel includes an inset that zooms in on the high-atom-count region of the new version.

Compared with the old version, the new version reaches about 3.62x more atoms on A100 and about 1.47x more atoms on 2V100. The corresponding peak GPU memory usage increases from 22.50 to 77.69 GiB on A100, and from 39.54 to 57.22 GiB on 2V100.
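As a quick consistency check on these figures, the peak-memory ratios can be computed directly from the numbers quoted above. This is only a sketch, and `growth_ratio` is a hypothetical helper:

```cpp
// Hypothetical helper: ratio of a new measurement to the old one.
inline double growth_ratio(double old_value, double new_value)
{
  return new_value / old_value;
}

// Peak GPU memory in GiB, as quoted in the text above.
inline double a100_memory_ratio() { return growth_ratio(22.50, 77.69); }
inline double v100x2_memory_ratio() { return growth_ratio(39.54, 57.22); }
```

This gives about 3.45x on A100 and about 1.45x on 2V100, close to the reported 3.62x and 1.47x atom-count figures, which would be consistent with peak memory growing roughly in proportion to system size.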

	modified:   src/force/neighbor.cuh
	modified:   src/force/nep.cu
	modified:   src/force/nep_small_box.cuh
YixinDeng marked this pull request as ready for review June 27, 2026 15:52
YixinDeng and others added 2 commits June 28, 2026 14:02
	modified:   src/force/neighbor.cuh
	modified:   src/force/nep.cu
	modified:   src/force/nep_multigpu.cu
	modified:   src/force/nep_small_box.cuh
brucefan1983 merged commit c4ab0e3 into brucefan1983:master Jun 28, 2026
2 checks passed
YixinDeng deleted the fix-array-overflow branch June 29, 2026 12:29