Skip to content

add README for AMD ROCm#32

Open
Apophis3158 wants to merge 1 commit into
Comfy-Org:masterfrom
Apophis3158:dev/rocm
Open

add README for AMD ROCm#32
Apophis3158 wants to merge 1 commit into
Comfy-Org:masterfrom
Apophis3158:dev/rocm

Conversation

@Apophis3158
Copy link
Copy Markdown
Contributor

@Apophis3158 Apophis3158 commented Apr 12, 2026

DEPRECATED

Motivation

I noticed that 8 out of 16 forks of this repository attempt to add AMD ROCm support. After analyzing their changes, I confirmed that porting aimdo to ROCm primarily requires mapping CUDA driver APIs to their HIP equivalents. However, none of the existing forks support both Linux and Windows simultaneously.

This PR consolidates those community efforts and provides a unified, cross-platform implementation with full CI support for both Linux and Windows ROCm builds.

Changes

  • Add ROCm/HIP backend via CUDA-to-HIP macro mappings in src/plat_hip.h
  • Add CI workflow for building aimdo-rocm.so(dll)
  • Rename aimdo.so(dll) to aimdo-cuda.so(dll) for distinction
  • Implement automatic backend detection based on torch.version.hip
  • Update README with Experimental ROCm support matrix and Windows setup recommendations
  • Add local build scripts build-linux-docker-rocm and build-win-rocm.ps1 for devs
  • Enable INFO logging for examples/example.py

Impact on ComfyUI

None, I think. This change is fully backward compatible:

  • On AMD platforms, comfy_aimdo.control.init() will load the ROCm backend library, but aimdo will not be actively used unless explicitly enabled via --enable-dynamic-vram
  • Existing NVIDIA CUDA functionality remains unchanged
  • No changes to ComfyUI core or user-facing APIs

Additional requirements

ROCm build on Windows require clang.exe in SDK so I followed your approach made this https://github.com/Apophis3158/comfy-aimdo/releases/tag/v0.0.0
You can copy the necessary components from the ROCm SDK rocm_sdk_core-7.2.1-py3-none-win_amd64.whl yourself:

win-rocm-raw/
├── include/
│   ├── amd_comgr/
│   ├── CL/
│   └── hip/
└── lib/
    ├── amdhip64.lib
    └── llvm/
        ├── bin/
        │   ├── clang.exe
        │   └── lld-link.exe
        └── lib/
        └── clang/
             └── 22/
                  └── lib/
                       └── windows/
                            └── clang_rt.builtins-x86_64.lib

Experimental AMD ROCm support and CI has been achieved by a linkless way, but there are still some things left in the original PR to discuss.

Comment thread .github/workflows/build-wheels.yml Outdated
@Apophis3158

This comment was marked as outdated.

Comment thread .github/workflows/build-wheels.yml Outdated
@Apophis3158 Apophis3158 changed the title feat: Add experimental AMD ROCm support (Linux & Windows) feat: Add experimental AMD ROCm CI (Linux & Windows) Apr 16, 2026
@rattus128
Copy link
Copy Markdown
Collaborator

Thanks for this. Ive just merged a PR to master that is going to conflict. I am now just getting around the AMD project personally. Rather than send you back to square one though with those merge conflicts, feel free to leave this a few days as I will still analyze the approach relative to your merge base. I have a few ideas on how to make this easier esp from a builds point of view. Im going to next couple of days catch up on the history and approach and see where we are at. This is Aimdos next official feature by plans as of this writing.

@asagi4
Copy link
Copy Markdown
Contributor

asagi4 commented Apr 16, 2026

I gave this a quick test on Linux and it seems to work fine (I'm not sure what made the detour unnecessary on Linux but apparently there is no infinite loop anymore).

It'll need the mmap workaround from my PR to avoid the memory leak issue in the runtime. It was fixed upstream but the fix isn't in any release yet as far as I am aware

@Apophis3158
Copy link
Copy Markdown
Contributor Author

@rattus128 I'm so glad to hear that. Thank you for your amazing work, aimdo has really helped me solve a lot of problems.

@Apophis3158
Copy link
Copy Markdown
Contributor Author

@asagi4 Detour is Windows only hooking tool I think, should be unrelated to Linux. And Linux hooks were just added in version hours ago.

So disappointing about memory leak issue fix not released, as disappointed that AMD ignoring some other pytorch issues.

@Apophis3158

This comment was marked as outdated.

Comment thread comfy_aimdo/model_vbar.py Outdated

def __del__(self):
if control.lib is not None and hasattr(self, '_ptr') and self._ptr:
if lib is not None and hasattr(self, '_ptr') and self._ptr:
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running example.py and got

Iteration 5
...
Exception ignored in: <function ModelVBAR.__del__ at 0x000001E631BC2D40>
Traceback (most recent call last):
  File "H:\ROCm\.venv\Lib\site-packages\comfy_aimdo\model_vbar.py", line 122, in __del__
AttributeError: 'NoneType' object has no attribute 'lib'
Exception ignored in: <function ModelVBAR.__del__ at 0x000001E631BC2D40>
Traceback (most recent call last):
  File "H:\ROCm\.venv\Lib\site-packages\comfy_aimdo\model_vbar.py", line 122, in __del__
AttributeError: 'NoneType' object has no attribute 'lib'

so I changed control.lib to lib, but control.lib should also be None, right?

Comment thread src/cuda-hooks-shared.h Outdated
@rattus128
Copy link
Copy Markdown
Collaborator

@asagi4 Detour is Windows only hooking tool I think, should be unrelated to Linux. And Linux hooks were just added in version hours ago.

So disappointing about memory leak issue fix not released, as disappointed that AMD ignoring some other pytorch issues.

Catch me up, are we leaking memory here in Aimdo or is this pytorch AMD side? I dont have an aimdo mem leak on the radar and would be happy to fix if thats proven.

@asagi4
Copy link
Copy Markdown
Contributor

asagi4 commented Apr 16, 2026

Catch me up, are we leaking memory here in Aimdo or is this pytorch AMD side? I dont have an aimdo mem leak on the radar and would be happy to fix if thats proven.

the HIP runtime has (had) a bug that prevents memory from being freed properly if there's still any virtual memory range mapped that ever had the memory mapped it it.

It's fixed now, but until it's part of an actual release, you can easily work around it with a manual mmap call. See the latest commit in my PR for the workaround. and
ROCm/ROCm#6021

@rattus128
Copy link
Copy Markdown
Collaborator

I merged, resolved and did some further dev on the @asagi4 PR as #35 . You have some useful README stuff here that we need to figure out. The 7.2.1 from portable README doesn't work for me yet. Im going to give it a bit to see if we get an official update from AMD, but for the moment what you have written in README kinda stands as the best advice.

@Apophis3158
Copy link
Copy Markdown
Contributor Author

@rattus128 Thank you for your amazing work!

ROCm 7.2.2 was released last week but not including Windows version XD, and there must be a long wait till next release.

@Apophis3158
Copy link
Copy Markdown
Contributor Author

I updated code and let's discuss about VRAM_CHUNK_SIZE:

It's reduced from 16MB to 2MB in 9f2d2fa because of Linux OOM in early time, but as @asagi4 tested this PR's build (which has always been using 16MB) #32 (comment), I don't think VRAM_CHUNK_SIZE is the root cause of the OOM.

Comment thread .gitignore Outdated
dist/
*.egg-info/
.venv/
comfy_aimdo/__pycache__/
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For pip editable installation.

@asagi4
Copy link
Copy Markdown
Contributor

asagi4 commented Apr 24, 2026

I think the OOM with the chunk size might've been caused by the same VRAM leak bug. Using a chunk size of 16MB seems to work now on AMD too, though I can't tell if it helps or hurts performance.

@Apophis3158
Copy link
Copy Markdown
Contributor Author

I think the OOM with the chunk size might've been caused by the same VRAM leak bug. Using a chunk size of 16MB seems to work now on AMD too, though I can't tell if it helps or hurts performance.

Thanks for confirming! It looks like the OOM was indeed caused by the HIP VRAM leak rather than the chunk size itself. With the unmap workaround in place addressing the root cause, 16MB should be fine on AMD. (With Linux ROCm 7.2.2 there was even 128MB attempting)

Regarding performance — 16MB should actually be faster than 2MB. Each chunk in vrambuf_grow goes through the full cuMemCreatecuMemMapcuMemSetAccess chain, so a 2MB chunk size means ~8x more driver calls. For ComfyUI workloads that allocate several GB at a time, the internal fragmentation from 16MB alignment is negligible (at most 14MB wasted, a tiny fraction relative to model size), while the reduced syscall overhead is a clear win.

As I had made a separate PR for Windows' chunk size #40, feel free to share your opinions.

@Apophis3158 Apophis3158 changed the title feat: Add experimental AMD ROCm CI (Linux & Windows) add README for AMD ROCm May 14, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants