add README for AMD ROCm by Apophis3158 · Pull Request #32 · Comfy-Org/comfy-aimdo

Apophis3158 · 2026-04-12T23:08:37Z

DEPRECATED

Motivation

I noticed that 8 out of 16 forks of this repository attempt to add AMD ROCm support. After analyzing their changes, I confirmed that porting aimdo to ROCm primarily requires mapping CUDA driver APIs to their HIP equivalents. However, none of the existing forks support both Linux and Windows simultaneously.

This PR consolidates those community efforts and provides a unified, cross-platform implementation with full CI support for both Linux and Windows ROCm builds.

Changes

Add ROCm/HIP backend via CUDA-to-HIP macro mappings in src/plat_hip.h
Add CI workflow for building aimdo-rocm.so(dll)
Rename aimdo.so(dll) to aimdo-cuda.so(dll) for distinction
Implement automatic backend detection based on torch.version.hip
Update README with Experimental ROCm support matrix and Windows setup recommendations
Add local build scripts build-linux-docker-rocm and build-win-rocm.ps1 for devs
Enable INFO logging for examples/example.py
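The backend detection item above can be illustrated with a minimal sketch. It assumes only what the list states: the library names `aimdo-rocm.so(dll)` and `aimdo-cuda.so(dll)`, and selection based on `torch.version.hip` (a version string on ROCm builds of PyTorch, `None` on CUDA builds). The function names `backend_library_name` and `detect_hip_version` are hypothetical and are not taken from this PR's code.

```python
import sys
from typing import Optional


def backend_library_name(hip_version: Optional[str], platform: str = sys.platform) -> str:
    """Return the shared-library file name for the detected backend.

    hip_version mirrors torch.version.hip: a version string on ROCm builds
    of PyTorch and None on CUDA builds.
    """
    backend = "rocm" if hip_version else "cuda"
    # Assumption: Windows uses .dll, every other platform uses .so.
    suffix = "dll" if platform.startswith("win") else "so"
    return f"aimdo-{backend}.{suffix}"


def detect_hip_version() -> Optional[str]:
    """Read torch.version.hip, returning None if PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return None
    return getattr(torch.version, "hip", None)
```

Calling `backend_library_name(detect_hip_version())` would then give the file name to load on the current machine. The actual loader in this PR may resolve paths and handle errors differently.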

Impact on ComfyUI

None, I think. This change is fully backward compatible:

On AMD platforms, comfy_aimdo.control.init() will load the ROCm backend library, but aimdo will not be actively used unless explicitly enabled via --enable-dynamic-vram
Existing NVIDIA CUDA functionality remains unchanged
No changes to ComfyUI core or user-facing APIs

Additional requirements

ROCm builds on Windows require clang.exe from the SDK, so I followed your approach and made this release: https://github.com/Apophis3158/comfy-aimdo/releases/tag/v0.0.0
Alternatively, you can copy the necessary components from the ROCm SDK wheel rocm_sdk_core-7.2.1-py3-none-win_amd64.whl yourself:

win-rocm-raw/
├── include/
│   ├── amd_comgr/
│   ├── CL/
│   └── hip/
└── lib/
    ├── amdhip64.lib
    └── llvm/
        ├── bin/
        │   ├── clang.exe
        │   └── lld-link.exe
        └── lib/
            └── clang/
                └── 22/
                    └── lib/
                        └── windows/
                            └── clang_rt.builtins-x86_64.lib
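
Before building, the layout above can be checked with a short script. This is only a sketch: the relative paths are read off the tree, the nesting of `clang/22/lib/windows/` under `lib/llvm/lib/` is an assumption about the intended layout, and `missing_sdk_components` is a hypothetical helper that is not part of this PR.

```python
from pathlib import Path
from typing import List

# Paths taken from the tree above, relative to win-rocm-raw/.
# The nesting of llvm/lib/clang/... is an assumption about the intended layout.
REQUIRED_DIRS = ["include/amd_comgr", "include/CL", "include/hip"]
REQUIRED_FILES = [
    "lib/amdhip64.lib",
    "lib/llvm/bin/clang.exe",
    "lib/llvm/bin/lld-link.exe",
    "lib/llvm/lib/clang/22/lib/windows/clang_rt.builtins-x86_64.lib",
]


def missing_sdk_components(root: Path) -> List[str]:
    """Return the required relative paths that are absent under root."""
    missing = [d for d in REQUIRED_DIRS if not (root / d).is_dir()]
    missing += [f for f in REQUIRED_FILES if not (root / f).is_file()]
    return missing
```

For example, `missing_sdk_components(Path("win-rocm-raw"))` returns an empty list when every listed component is in place.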

Experimental AMD ROCm support and CI have since been achieved in a linkless way, but some items from the original PR are still open for discussion.

rattus128 · 2026-04-16T04:55:47Z

Thanks for this. I've just merged a PR to master that is going to conflict. I am only now getting around to the AMD project personally. Rather than send you back to square one with those merge conflicts, though, feel free to leave this for a few days, as I will still analyze the approach relative to your merge base. I have a few ideas on how to make this easier, especially from a builds point of view. I'm going to spend the next couple of days catching up on the history and approach to see where we are at. As of this writing, this is planned as Aimdo's next official feature.

asagi4 · 2026-04-16T06:25:58Z

I gave this a quick test on Linux and it seems to work fine (I'm not sure what made the detour unnecessary on Linux but apparently there is no infinite loop anymore).

It'll need the mmap workaround from my PR to avoid the memory leak issue in the runtime. It was fixed upstream, but the fix isn't in any release yet as far as I am aware.

Apophis3158 · 2026-04-16T07:42:30Z

@rattus128 I'm so glad to hear that. Thank you for your amazing work, aimdo has really helped me solve a lot of problems.

Apophis3158 · 2026-04-16T07:49:08Z

@asagi4 Detour is a Windows-only hooking tool, I think, so it should be unrelated to Linux. And the Linux hooks were only added a few hours ago.

It's disappointing that the memory leak fix hasn't been released, just as it's disappointing that AMD is ignoring some other PyTorch issues.

Apophis3158 · 2026-04-16T08:42:04Z


    def __del__(self):
-        if control.lib is not None and hasattr(self, '_ptr') and self._ptr:
+        if lib is not None and hasattr(self, '_ptr') and self._ptr:


Running example.py, I got:

Iteration 5 ...
Exception ignored in: <function ModelVBAR.__del__ at 0x000001E631BC2D40>
Traceback (most recent call last):
  File "H:\ROCm\.venv\Lib\site-packages\comfy_aimdo\model_vbar.py", line 122, in __del__
AttributeError: 'NoneType' object has no attribute 'lib'
Exception ignored in: <function ModelVBAR.__del__ at 0x000001E631BC2D40>
Traceback (most recent call last):
  File "H:\ROCm\.venv\Lib\site-packages\comfy_aimdo\model_vbar.py", line 122, in __del__
AttributeError: 'NoneType' object has no attribute 'lib'

So I changed control.lib to lib, but control.lib should also be None, right?
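
The message `'NoneType' object has no attribute 'lib'` indicates that the name `control` itself was `None` when `__del__` ran, not `control.lib`. CPython can reset module globals to `None` during interpreter shutdown, so a finalizer that looks up a module-level name at that point may fail this way. One defensive pattern is to keep a reference on the instance. The sketch below uses hypothetical names (`FakeLib`, `Handle`, `free`) that are not part of comfy_aimdo, and it is not necessarily the fix adopted here.

```python
class FakeLib:
    """Stand-in for the loaded native library (hypothetical)."""

    def __init__(self):
        self.freed = []

    def free(self, ptr):
        self.freed.append(ptr)


class Handle:
    """Owns a native pointer and releases it on destruction."""

    def __init__(self, lib, ptr):
        # Keep our own reference so __del__ never has to look up a module
        # global, which may already be None during interpreter shutdown.
        self._lib = lib
        self._ptr = ptr

    def __del__(self):
        # getattr with defaults also covers a partially constructed object.
        lib = getattr(self, "_lib", None)
        ptr = getattr(self, "_ptr", None)
        if lib is not None and ptr:
            lib.free(ptr)
            self._ptr = None
```

Because the instance holds the library reference, the finalizer no longer depends on the order in which module globals are torn down.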

rattus128 · 2026-04-16T16:51:11Z

> @asagi4 Detour is Windows only hooking tool I think, should be unrelated to Linux. And Linux hooks were just added in version hours ago.
>
> So disappointing about memory leak issue fix not released, as disappointed that AMD ignoring some other pytorch issues.

Catch me up: are we leaking memory here in Aimdo, or is this on the PyTorch AMD side? I don't have an Aimdo memory leak on the radar and would be happy to fix it if that's proven.

asagi4 · 2026-04-16T20:06:58Z

> Catch me up, are we leaking memory here in Aimdo or is this pytorch AMD side? I dont have an aimdo mem leak on the radar and would be happy to fix if thats proven.

The HIP runtime has (had) a bug that prevents memory from being freed properly if there's still any virtual memory range mapped that ever had the memory mapped in it.

It's fixed now, but until it's part of an actual release, you can easily work around it with a manual mmap call. See the latest commit in my PR for the workaround, and ROCm/ROCm#6021.

rattus128 · 2026-04-21T21:55:03Z

I merged, resolved and did some further dev on the @asagi4 PR as #35. You have some useful README stuff here that we need to figure out. The 7.2.1 from the portable README doesn't work for me yet. I'm going to give it a bit to see if we get an official update from AMD, but for the moment what you have written in the README kinda stands as the best advice.

Apophis3158 · 2026-04-22T03:40:10Z

@rattus128 Thank you for your amazing work!

ROCm 7.2.2 was released last week, but without a Windows version XD, and there will probably be a long wait until the next release.

Apophis3158 · 2026-04-22T04:01:54Z

I updated the code. Let's discuss VRAM_CHUNK_SIZE:

It was reduced from 16MB to 2MB in 9f2d2fa because of an OOM on Linux early on, but since @asagi4 tested this PR's build (which has always used 16MB) in #32 (comment), I don't think VRAM_CHUNK_SIZE is the root cause of the OOM.

Apophis3158 · 2026-04-22T04:04:58Z

 dist/
 *.egg-info/
 .venv/
+comfy_aimdo/__pycache__/


For pip editable installation.

asagi4 · 2026-04-24T07:04:05Z

I think the OOM with the chunk size might've been caused by the same VRAM leak bug. Using a chunk size of 16MB seems to work now on AMD too, though I can't tell if it helps or hurts performance.

Apophis3158 · 2026-04-24T08:31:45Z

> I think the OOM with the chunk size might've been caused by the same VRAM leak bug. Using a chunk size of 16MB seems to work now on AMD too, though I can't tell if it helps or hurts performance.

Thanks for confirming! It looks like the OOM was indeed caused by the HIP VRAM leak rather than the chunk size itself. With the unmap workaround in place addressing the root cause, 16MB should be fine on AMD. (With ROCm 7.2.2 on Linux, even 128MB was attempted.)

Regarding performance: 16MB should actually be faster than 2MB. Each chunk in vrambuf_grow goes through the full cuMemCreate → cuMemMap → cuMemSetAccess chain, so a 2MB chunk size means ~8x more driver calls. For ComfyUI workloads that allocate several GB at a time, the internal fragmentation from 16MB alignment is negligible (at most 14MB wasted, a tiny fraction relative to model size), while the reduced driver call overhead is a clear win.
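
The arithmetic behind this estimate can be sketched as follows. The sketch assumes the cost model described above, in which every chunk costs one pass through the create, map and set-access chain. The request size is a hypothetical example (a multiple of 2 MiB chosen to hit the worst-case padding), and the helper names are illustrative, not taken from aimdo.

```python
MIB = 1024 * 1024


def chunks_needed(alloc_bytes: int, chunk_bytes: int) -> int:
    """Number of chunks needed to cover alloc_bytes (ceiling division)."""
    return -(-alloc_bytes // chunk_bytes)


def padding_bytes(alloc_bytes: int, chunk_bytes: int) -> int:
    """Bytes allocated beyond the request due to rounding up to whole chunks."""
    return chunks_needed(alloc_bytes, chunk_bytes) * chunk_bytes - alloc_bytes


# Hypothetical request: 4 GiB plus 2 MiB.
alloc = 4 * 1024 * MIB + 2 * MIB
for chunk in (2 * MIB, 16 * MIB):
    n = chunks_needed(alloc, chunk)
    pad = padding_bytes(alloc, chunk) // MIB
    print(f"{chunk // MIB:>2} MiB chunks: {n} chunks, {pad} MiB padding")
```

For this request, the 2 MiB setting needs 2049 chunks with no padding, while the 16 MiB setting needs 257 chunks with 14 MiB of padding, which is roughly one eighth of the per-chunk driver work.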

I have opened a separate PR, #40, for the chunk size on Windows; feel free to share your opinions there.

Apophis3158 changed the title ~~feat: Add experimental AMD ROCm support (Linux & Windows)~~ feat: Add experimental AMD ROCm CI (Linux & Windows) Apr 16, 2026

Apophis3158 force-pushed the dev/rocm branch from 5202008 to e4c3e3a Compare April 16, 2026 08:33

Apophis3158 force-pushed the dev/rocm branch from 51bcdda to a9dcec5 Compare April 22, 2026 03:13

Apophis3158 force-pushed the dev/rocm branch from a9dcec5 to f450305 Compare April 24, 2026 04:21

update README for AMD

19eec06

Apophis3158 force-pushed the dev/rocm branch from f450305 to 19eec06 Compare May 14, 2026 22:13

Apophis3158 changed the title ~~feat: Add experimental AMD ROCm CI (Linux & Windows)~~ add README for AMD ROCm May 14, 2026
