Add -DGGML_HIP_NO_VMM=OFF to fix ROCm OOM/segfault on APUs and RDNA 3.5+ #94
Open
ElSnacko wants to merge 2 commits into
Without VMM, hipMalloc never releases pages back to the GPU memory pool. On APUs with unified memory (e.g. gfx1103/780M) and RDNA 3.5/4 dGPUs with large VRAM, the allocator permanently holds onto scratch memory from flash-attn and KV cache, eventually OOMing and segfaulting (exit 139). With VMM enabled, llama.cpp uses cuMemCreate/cuMemMap for on-demand allocation and cuMemRelease to return pages. Devices that don't support VMM fall back to the legacy pool automatically at runtime, so this is safe for all GPU targets. Refs: lemonade-sdk#87, lemonade-sdk#79, lemonade-sdk#86, lemonade-sdk#52
HIP VMM is controlled by GGML_HIP_NO_VMM (default ON = disabled). The previous GGML_USE_VMM flag does not exist in llama.cpp's cmake.
Contributor
@slojosic-amd Any thoughts here?
Problem
All current builds report `VMM: no` because llama.cpp defaults `GGML_HIP_NO_VMM` to `ON` in `ggml/CMakeLists.txt`.
Without VMM, `hipMalloc` never releases GPU memory pages. On unified-memory APUs (gfx1103/780M) and RDNA 3.5/4 dGPUs, the allocator permanently holds flash-attn scratch and KV cache memory, eventually OOMing and segfaulting (exit 139).
This is the root cause of #87, #79, #86, and #52.
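One way to confirm which state a given build is in is to look for the `VMM:` token in a captured startup log. The following is a minimal sketch, not part of this PR: the helper name `vmm_status` and the log file path are hypothetical, and it assumes only that the device line contains the text `VMM: yes` or `VMM: no` as reported above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of llama.cpp or this PR).
# Prints "yes", "no", or "unknown" based on the first "VMM: <value>" token
# found in a captured startup log.
vmm_status() {
    local log_file="$1"
    local value
    # -o prints only the matching token; -m1 stops after the first matching line.
    value=$(grep -o -m1 'VMM: [a-z]*' "$log_file" 2>/dev/null | head -n1 | cut -d' ' -f2)
    case "$value" in
        yes|no) echo "$value" ;;
        *)      echo "unknown" ;;
    esac
}

# Usage (assumed log path):
#   vmm_status ./llama-server.log
```

A result of `no` on hardware that is expected to support VMM would suggest the build was configured with the default `GGML_HIP_NO_VMM=ON`.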
Fix
Add `-DGGML_HIP_NO_VMM=OFF` to both the Windows and Ubuntu cmake configure steps. This enables the VMM-backed pool allocator (`ggml_cuda_pool_vmm`), which uses `hipMemCreate`/`hipMemMap` for on-demand allocation and `hipMemRelease` to return pages.
Note: The correct cmake flag is `GGML_HIP_NO_VMM`, not `GGML_USE_VMM` (which doesn't exist) or `GGML_CUDA_NO_VMM` (which only applies to CUDA builds). HIP VMM is controlled separately in `ggml/src/ggml-hip/CMakeLists.txt` and `ggml/src/ggml-cuda/common.cuh`.
Safety
Devices that don't support VMM fall back to the legacy `hipMalloc` pool automatically at runtime via `hipDeviceAttributeVirtualMemoryManagementSupported`. No behavior change for unsupported hardware.
Change
Two one-line additions to `.github/workflows/build-llamacpp-rocm.yml`: `-DGGML_HIP_NO_VMM=OFF ^` in the Windows configure step and `-DGGML_HIP_NO_VMM=OFF \` in the Ubuntu configure step.
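To check that the flag actually reached the configure step, one option is to inspect the generated CMake cache. This is a sketch under assumptions: the helper name `hip_vmm_enabled_in_cache` and the `build/` directory are hypothetical, the commented configure line omits whatever other options the workflow passes, and the check assumes the value is stored literally as `OFF`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the workflow in this PR).
# Succeeds if a CMake cache file records GGML_HIP_NO_VMM as OFF.
# Cache entries have the form NAME:TYPE=VALUE; the type is matched loosely.
hip_vmm_enabled_in_cache() {
    local cache_file="$1"
    grep -Eq '^GGML_HIP_NO_VMM:[A-Z]+=OFF$' "$cache_file"
}

# Assumed local equivalent of the Ubuntu configure step (other flags omitted):
#   cmake -S . -B build -DGGML_HIP=ON -DGGML_HIP_NO_VMM=OFF
#   hip_vmm_enabled_in_cache build/CMakeCache.txt && echo "HIP VMM enabled at configure time"
```

This only verifies the configure-time setting; whether VMM is used at runtime still depends on the device reporting support.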