
Add -DGGML_HIP_NO_VMM=OFF to fix ROCm OOM/segfault on APUs and RDNA 3.5+#94

Open
ElSnacko wants to merge 2 commits into lemonade-sdk:main from ElSnacko:fix/add-vmm-flag


ElSnacko commented May 3, 2026


Problem

All current builds report VMM: no because llama.cpp defaults GGML_HIP_NO_VMM to ON in ggml/CMakeLists.txt:

option(GGML_HIP_NO_VMM "ggml: do not try to use HIP VMM" ON)

Without VMM, hipMalloc never releases GPU memory pages. On unified-memory APUs (gfx1103/780M) and RDNA 3.5/4 dGPUs, the allocator permanently holds flash-attn scratch and KV cache memory until the process runs out of memory and segfaults (exit 139).

This is the root cause of #87, #79, #86, and #52.
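
To check which pool a given build uses, one option is to look for the VMM token in a captured startup log. The sketch below assumes the log contains `VMM: yes` or `VMM: no`, as reported above. `vmm_status` and the log file are hypothetical names for illustration, not part of llama.cpp.

```shell
# Sketch: report whether a captured llama.cpp startup log shows VMM enabled.
# vmm_status is a hypothetical helper; the "VMM: yes|no" token is assumed
# to appear in the device line of the log.
vmm_status() {
    log_file="$1"
    if grep -q 'VMM: yes' "$log_file"; then
        echo "enabled"
    elif grep -q 'VMM: no' "$log_file"; then
        echo "disabled"
    else
        # No VMM token found, e.g. a non-GPU build or a truncated log.
        echo "unknown"
    fi
}
```

Usage: redirect the server or CLI output to a file, then run `vmm_status that-file`.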

Fix

Add -DGGML_HIP_NO_VMM=OFF to both the Windows and Ubuntu cmake configure steps. This enables the VMM-backed pool allocator (ggml_cuda_pool_vmm), which uses hipMemCreate/hipMemMap for on-demand allocation and hipMemRelease to return pages.
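
A minimal sketch of confirming that the option took effect after a local configure. It reads the value back from CMakeCache.txt, whose entries have the form NAME:TYPE=VALUE. `cmake_cache_value` is a hypothetical helper, and the configure command in the comment is illustrative only: the build directory and the other flags are assumptions, not the workflow's actual flag list.

```shell
# Hypothetical helper: print the cached value of a CMake option
# from a build directory's CMakeCache.txt.
cmake_cache_value() {
    build_dir="$1"
    option="$2"
    # Strip the "NAME:TYPE=" prefix and print only the value.
    sed -n "s/^${option}:[A-Z]*=//p" "${build_dir}/CMakeCache.txt"
}

# Example (not run here; paths and extra flags are placeholders):
#   cmake -S . -B build -DGGML_HIP=ON -DGGML_HIP_NO_VMM=OFF
#   cmake_cache_value build GGML_HIP_NO_VMM
```

If the helper prints ON, or nothing at all, the flag did not reach the configure step.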

Note: The correct cmake flag is GGML_HIP_NO_VMM, not GGML_USE_VMM (which doesn't exist) or GGML_CUDA_NO_VMM (which only applies to CUDA builds). HIP VMM is controlled separately in ggml/src/ggml-hip/CMakeLists.txt and ggml/src/ggml-cuda/common.cuh.

Safety

Devices that don't support VMM fall back to the legacy hipMalloc pool automatically at runtime via hipDeviceAttributeVirtualMemoryManagementSupported. No behavior change for unsupported hardware.

Change

Two one-line additions to .github/workflows/build-llamacpp-rocm.yml:

  • Windows cmake block: -DGGML_HIP_NO_VMM=OFF ^
  • Ubuntu cmake block: -DGGML_HIP_NO_VMM=OFF \
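
As a quick sanity check on the diff, the two additions can be counted in the workflow file. This is only a sketch: `count_vmm_flag` is a hypothetical helper, and it assumes the flag appears once per line and nowhere else in the file.

```shell
# Hypothetical helper: count the lines in a file that pass the flag.
# -e keeps grep from treating the leading dash as an option.
count_vmm_flag() {
    grep -c -e '-DGGML_HIP_NO_VMM=OFF' "$1"
}

# Example (not run here):
#   count_vmm_flag .github/workflows/build-llamacpp-rocm.yml
```

Under those assumptions, the count should be 2 once both the Windows and Ubuntu blocks carry the flag.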

ElSnacko added 2 commits May 3, 2026 00:33
Without VMM, hipMalloc never releases pages back to the GPU memory
pool. On APUs with unified memory (e.g. gfx1103/780M) and RDNA 3.5/4
dGPUs with large VRAM, the allocator permanently holds onto scratch
memory from flash-attn and KV cache, eventually OOMing and segfaulting
(exit 139).

With VMM enabled, llama.cpp uses cuMemCreate/cuMemMap for on-demand
allocation and cuMemRelease to return pages. Devices that don't support
VMM fall back to the legacy pool automatically at runtime, so this is
safe for all GPU targets.

Refs: lemonade-sdk#87, lemonade-sdk#79, lemonade-sdk#86, lemonade-sdk#52

HIP VMM is controlled by GGML_HIP_NO_VMM (default ON = disabled).
The previous GGML_USE_VMM flag does not exist in llama.cpp's cmake.
ElSnacko changed the title from "Add -DGGML_USE_VMM=ON to fix ROCm OOM/segfault on APUs and RDNA 3.5+" to "Add -DGGML_HIP_NO_VMM=OFF to fix ROCm OOM/segfault on APUs and RDNA 3.5+" on May 3, 2026
@danielholanda (Contributor)

@slojosic-amd Any thoughts here?
