Can not load shared library: stable-diffusion_hipblas.dll #29

Open
gloowa opened this issue Jan 7, 2025 · 9 comments

@gloowa

gloowa commented Jan 7, 2025

Hi.

First of all: very promising project. Not having to deal with npm to use SD will be a godsend.

I do seem to have an issue with generating anything tho. When queueing a job, I immediately get "Generation error: External process stopped". The diffuser.log contains the message "[EXTPROCESS] Can not load shared library: D:\AI\stablediffusion\sdgui\stable-diffusion_hipblas.dll" but no further details.

The dll is present at that path. Tried running as admin, no change. I have Radeon RX6800. Running Win10.

Not sure what might be the issue.

[edit]
I've also tried running the GUI with -avx512. It starts generation and gets through the "model hash" step, but then also fails with "Generation error: External process stopped". In that case, however, the diffuser.log is empty.

@fszontagh
Owner

Hi there!
Can you please attach the whole log file?

@fszontagh fszontagh self-assigned this Jan 7, 2025
@gloowa
Author

gloowa commented Jan 7, 2025

In-app console:

[2025-01-07 12:49:36]: StableDiffusionGUI 0.2.3 ad11faf started
[2025-01-07 12:49:36]: Loaded PRESETS: 0
[2025-01-07 12:49:36]: Loaded PROMPT_TEMPLATES: 0
[2025-01-07 12:49:36]: Loaded CHECKPOINT: 5
[2025-01-07 12:49:36]: Loaded LORA: 16
[2025-01-07 12:49:36]: Loaded VAE: 0
[2025-01-07 12:49:36]: Loaded TAESD: 0
[2025-01-07 12:49:36]: Loaded CONTROLNET: 0
[2025-01-07 12:49:36]: Loaded ESRGAN: 0
[2025-01-07 12:49:36]: Loaded EMBEDDING: 0
[2025-01-07 12:49:36]: Starting external process: D:\AI\stablediffusion\sdgui\stablediffusiongui_diffuser.exe D:\AI\stablediffusion\sdgui\stable-diffusion_hipblas.dll
[2025-01-07 12:49:52]: Generation error: External process stopped 
[2025-01-07 12:49:57]: Generation error: External process stopped 
[2025-01-07 12:50:02]: Generation error: External process stopped 
[2025-01-07 12:50:07]: Generation error: External process stopped 
[2025-01-07 12:50:12]: Generation error: External process stopped 

stablediffusiongui_diffuser.log:

[EXTPROCESS] Can not load shared library: D:\AI\stablediffusion\sdgui\stable-diffusion_hipblas.dll
[EXTPROCESS] Can not load shared library: D:\AI\stablediffusion\sdgui\stable-diffusion_hipblas.dll
[EXTPROCESS] Can not load shared library: D:\AI\stablediffusion\sdgui\stable-diffusion_hipblas.dll
[EXTPROCESS] Can not load shared library: D:\AI\stablediffusion\sdgui\stable-diffusion_hipblas.dll
[EXTPROCESS] Can not load shared library: D:\AI\stablediffusion\sdgui\stable-diffusion_hipblas.dll
[EXTPROCESS] Can not load shared library: D:\AI\stablediffusion\sdgui\stable-diffusion_hipblas.dll
[EXTPROCESS] Can not load shared library: D:\AI\stablediffusion\sdgui\stable-diffusion_hipblas.dll
[EXTPROCESS] Can not load shared library: D:\AI\stablediffusion\sdgui\stable-diffusion_hipblas.dll
[EXTPROCESS] Can not load shared library: D:\AI\stablediffusion\sdgui\stable-diffusion_hipblas.dll
[EXTPROCESS] Can not load shared library: D:\AI\stablediffusion\sdgui\stable-diffusion_hipblas.dll
[EXTPROCESS] Can not load shared library: D:\AI\stablediffusion\sdgui\stable-diffusion_hipblas.dll
[EXTPROCESS] Can not load shared library: D:\AI\stablediffusion\sdgui\stable-diffusion_hipblas.dll
[EXTPROCESS] Can not load shared library: D:\AI\stablediffusion\sdgui\stable-diffusion_hipblas.dll

[edit]
This is from running one generation. Not sure why it logs the error so many times... 1 per thread?

@fszontagh
Owner

fszontagh commented Jan 7, 2025

> This is from running one generation. Not sure why it logs the error so many times... 1 per thread?

No, it is just trying to restart the external process periodically (every 5 seconds). When an error happens, the GUI does not distinguish between errors; it just tries to restart the process again. UPDATE: this is not relevant here. If you stop all queued jobs, does the process still keep restarting again and again?

As I see it, the error is logged in multiple places, so that can cause duplicates.
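
The restart behaviour described above (any exit leads to another start a few seconds later, with no distinction between error kinds) can be sketched as a small supervisor loop. This is only an illustration in Python, not the GUI's actual code: `supervise`, `restart_delay` and `max_restarts` are hypothetical names, and the failing child process is a stand-in for the diffuser.

```python
import subprocess
import sys
import time


def supervise(cmd, restart_delay=5.0, max_restarts=3):
    """Run cmd and start it again after restart_delay seconds whenever it exits.

    Returns the exit codes that were observed. max_restarts bounds the loop so
    the sketch terminates; a real supervisor might loop until told to stop.
    """
    exit_codes = []
    for attempt in range(max_restarts):
        proc = subprocess.run(cmd)
        exit_codes.append(proc.returncode)
        # No distinction between error kinds: every exit leads to a restart,
        # which is why the same error line can show up many times in a log.
        print(f"attempt {attempt + 1}: process stopped with code {proc.returncode}")
        if attempt + 1 < max_restarts:
            time.sleep(restart_delay)
    return exit_codes


if __name__ == "__main__":
    # Stand-in for the diffuser: a child process that fails immediately.
    failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    supervise(failing, restart_delay=0.1)
```

With a library that can never be loaded, a loop like this produces one identical error per attempt, which would match the repeated lines in the logs above.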

Was this path, 'D:\AI\stablediffusion\sdgui', set in the installer itself when you installed, or did you move the installed folder to this new location?

Please try to reinstall and use the default path from the installer. I don't think this is the problem, but it's worth a try.

Please check the Task Manager for any running stablediffusion* processes. If there are any, please stop them all.

Then please start the GUI with the following parameter:
-disable-external-process-handling

This will disable the external process handling, so the process will not be started automatically. You need to start it manually with the following command:

D:\AI\stablediffusion\sdgui\stablediffusiongui_diffuser.exe D:\AI\stablediffusion\sdgui\stable-diffusion_hipblas.dll

Escaping may be required here.

Once it has started, please check the output (if there is any) and the stablediffusiongui_diffuser.log. There is also an app.log in the "Data folder".
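
On the escaping point: quoting only matters when a path contains spaces, as the default install location under C:\Program Files does. A minimal Python sketch of building the command is shown below. `build_diffuser_command` is a hypothetical helper, the two install directories are the ones that appear in this thread, and `subprocess.list2cmdline` is used only to display the quoting that `subprocess` applies on Windows.

```python
import ntpath
import subprocess


def build_diffuser_command(install_dir, backend_dll="stable-diffusion_hipblas.dll"):
    """Return the argument list for launching the diffuser by hand."""
    # ntpath joins with Windows rules on any OS, so the sketch is portable.
    return [
        ntpath.join(install_dir, "stablediffusiongui_diffuser.exe"),
        ntpath.join(install_dir, backend_dll),
    ]


if __name__ == "__main__":
    for install_dir in (
        r"D:\AI\stablediffusion\sdgui",
        r"C:\Program Files\StableDiffusionGUI 0.2.3",
    ):
        args = build_diffuser_command(install_dir)
        # Arguments containing spaces come out wrapped in double quotes.
        print(subprocess.list2cmdline(args))
        # To actually launch the process, pass the list itself:
        # subprocess.run(args)
```

Passing the arguments as a list avoids hand-written quoting altogether; when typing the command into cmd directly, wrapping each path in double quotes should have the same effect.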

@gloowa
Author

gloowa commented Jan 7, 2025

The path was set in installer, I keep most software out of C drive.

There was no process in task manager.

I've tried running with the -disable-external-process-handling flag. It queued the job, but trying to run the actual diffuser was unsuccessful.

cmd Output:

D:\AI\stablediffusion\sdgui>stablediffusiongui_diffuser.exe D:\AI\stablediffusion\sdgui\stable-diffusion_hipblas.dll
[EXTPROCESS] starting with shared memory size: 16777216
Using tmp path: C:\Users\gloow\AppData\Local\Temp
Failed to load shared library: Failed to load library: D:\AI\stablediffusion\sdgui\stable-diffusion_hipblas.dll
[EXTPROCESS] Can not load shared library

The app.log in the appdata (I missed that location) mostly repeats what I already pasted, plus a few lines at the end:

[2025-01-07 12:49:36]: StableDiffusionGUI 0.2.3 ad11faf started
[2025-01-07 12:49:36]: Loaded PRESETS: 0
[2025-01-07 12:49:36]: Loaded PROMPT_TEMPLATES: 0
[2025-01-07 12:49:36]: Loaded CHECKPOINT: 5
[2025-01-07 12:49:36]: Loaded LORA: 16
[2025-01-07 12:49:36]: Loaded VAE: 0
[2025-01-07 12:49:36]: Loaded TAESD: 0
[2025-01-07 12:49:36]: Loaded CONTROLNET: 0
[2025-01-07 12:49:36]: Loaded ESRGAN: 0
[2025-01-07 12:49:36]: Loaded EMBEDDING: 0
[2025-01-07 12:49:36]: Starting external process: D:\AI\stablediffusion\sdgui\stablediffusiongui_diffuser.exe D:\AI\stablediffusion\sdgui\stable-diffusion_hipblas.dll
[2025-01-07 12:49:52]: Generation error: External process stopped 
[2025-01-07 12:49:57]: Generation error: External process stopped 
[2025-01-07 12:50:02]: Generation error: External process stopped 
[2025-01-07 12:50:07]: Generation error: External process stopped 
[2025-01-07 12:50:12]: Generation error: External process stopped 
[2025-01-07 12:50:17]: Generation error: External process stopped 
[2025-01-07 12:50:22]: Generation error: External process stopped 
[2025-01-07 12:50:27]: Generation error: External process stopped 
[2025-01-07 12:50:32]: Generation error: External process stopped 
[2025-01-07 12:50:37]: Generation error: External process stopped 
[2025-01-07 12:50:42]: Generation error: External process stopped 
[2025-01-07 12:50:47]: Generation error: External process stopped 
[2025-01-07 12:50:52]: Generation error: External process stopped 
[2025-01-07 12:50:57]: Generation error: External process stopped 
[2025-01-07 12:51:02]: Generation error: External process stopped 
[2025-01-07 12:51:07]: Generation error: External process stopped 
[2025-01-07 12:51:12]: Generation error: External process stopped 
[2025-01-07 12:51:12]: [EXTPROCESS] starting with shared memory size: 16777216
 
[2025-01-07 12:51:12]: Using tmp path: C:\Users\<my_user>\AppData\Local\Temp
 
[2025-01-07 12:51:12]: Failed to load shared library: Failed to load library: D:\AI\stablediffusion\sdgui\stable-diffusion_hipblas.dll
 
[2025-01-07 12:51:12]: [EXTPROCESS] Can not load shared library

I will try to reinstall into default location shortly and let you know if that changes anything.

[update]
Uninstalled the existing installation, deleted the appdata directory to clean any leftovers. Installed again in default location, same result unfortunately.

[2025-01-07 13:51:47]: StableDiffusionGUI 0.2.3 ad11faf started
[2025-01-07 13:51:47]: Loaded PRESETS: 0
[2025-01-07 13:51:47]: Loaded PROMPT_TEMPLATES: 0
[2025-01-07 13:51:47]: Loaded CHECKPOINT: 5
[2025-01-07 13:51:47]: Loaded LORA: 16
[2025-01-07 13:51:47]: Loaded VAE: 0
[2025-01-07 13:51:47]: Loaded TAESD: 0
[2025-01-07 13:51:47]: Loaded CONTROLNET: 0
[2025-01-07 13:51:47]: Loaded ESRGAN: 0
[2025-01-07 13:51:47]: Loaded EMBEDDING: 0
[2025-01-07 13:51:47]: Starting external process: C:\Program Files\StableDiffusionGUI 0.2.3\stablediffusiongui_diffuser.exe C:\Program Files\StableDiffusionGUI 0.2.3\stable-diffusion_hipblas.dll
[2025-01-07 13:51:58]: [EXTPROCESS] starting with shared memory size: 16777216
 
[2025-01-07 13:51:58]: Using tmp path: C:\Users\<my_user>\AppData\Local\Temp
 
[2025-01-07 13:52:08]: Generation error: External process stopped 
[2025-01-07 13:52:13]: Generation error: External process stopped 
[2025-01-07 13:52:18]: Generation error: External process stopped 
[2025-01-07 13:52:23]: Generation error: External process stopped 
[2025-01-07 13:52:28]: Generation error: External process stopped 
[2025-01-07 13:52:33]: Generation error: External process stopped 
[2025-01-07 13:52:38]: Generation error: External process stopped 
[2025-01-07 13:52:43]: Generation error: External process stopped 
[2025-01-07 13:52:48]: Generation error: External process stopped 
[2025-01-07 13:52:53]: Generation error: External process stopped 
[2025-01-07 13:52:58]: Generation error: External process stopped 
[2025-01-07 13:53:03]: Generation error: External process stopped 
[2025-01-07 13:53:08]: Generation error: External process stopped 
[2025-01-07 13:53:13]: Generation error: External process stopped 
[2025-01-07 13:53:18]: Generation error: External process stopped 
[2025-01-07 13:53:23]: Generation error: External process stopped 
[2025-01-07 13:53:28]: StableDiffusionGUI 0.2.3 ad11faf exited

[update2]
I can get things to run using avx2 (I thought my CPU had AVX-512 but it does not, that's why running with -avx512 crashed).

So maybe the problem is not that the _hipblas.dll does not load but that it crashes on init somehow? But my GPU should be able to run with HIP and ROCm. Koboldcpp runs fine with it using rocm build, offloading the computation to GPU.

@fszontagh
Owner

That's weird. The GPU is not the problem.

The original error is coming from here, which means the lib cannot be loaded because something it needs is missing (maybe some runtime lib is missing on Win10 that is available on Win11, but I have no proof that it runs on Win11 either :( ).
Sadly I can't test hipblas because I have no resources for it.

So, today I want to release a new version in which I have replaced hipblas with Vulkan binaries. Maybe the next release will work for you.

@gloowa
Author

gloowa commented Jan 7, 2025

> maybe some runtime lib missing on win10 what is available on 11, but i have no proof if it can run on win11 too :( ) Sadly i can't test hipblas because i have no resources for it.

I think I've found something. I've investigated, using the ldd command from Git Bash and the output is enlightening:

$ ldd stable-diffusion_hipblas.dll
        ntdll.dll => /c/WINDOWS/SYSTEM32/ntdll.dll (0x7ffed3cf0000)
        KERNEL32.DLL => /c/WINDOWS/System32/KERNEL32.DLL (0x7ffed1ee0000)
        KERNELBASE.dll => /c/WINDOWS/System32/KERNELBASE.dll (0x7ffed1390000)
        msvcrt.dll => /c/WINDOWS/System32/msvcrt.dll (0x7ffed3ab0000)
        stable-diffusion_hipblas.dll => /d/AI/stablediffusion/sdgu/stable-diffusion_hipblas.dll (0x7ffe22680000)
        ucrtbase.dll => /c/Windows/System32/ucrtbase.dll (0x7ffed1690000)
        libomp140.x86_64.dll => /c/Windows/System32/libomp140.x86_64.dll (0x7ffea7cc0000)
        msvcp140_codecvt_ids.dll => /c/Windows/System32/msvcp140_codecvt_ids.dll (0x7ffec3880000)
        amdhip64.dll => /c/Windows/System32/amdhip64.dll (0x7ffe3c9b0000)
        msvcp140.dll => /c/Windows/System32/msvcp140.dll (0x7ffea8200000)
        psapi.dll => /c/Windows/System32/psapi.dll (0x7ffed3330000)
        setupapi.dll => /c/Windows/System32/setupapi.dll (0x7ffed2960000)
        vcruntime140.dll => /c/Windows/System32/vcruntime140.dll (0x7ffebbdd0000)
        cfgmgr32.dll => /c/Windows/System32/cfgmgr32.dll (0x7ffed1890000)
        rpcrt4.dll => /c/Windows/System32/rpcrt4.dll (0x7ffed2310000)
        bcrypt.dll => /c/Windows/System32/bcrypt.dll (0x7ffed1860000)
        user32.dll => /c/Windows/System32/user32.dll (0x7ffed30e0000)
        vcruntime140_1.dll => /c/Windows/System32/vcruntime140_1.dll (0x7ffecb3a0000)
        win32u.dll => /c/Windows/System32/win32u.dll (0x7ffed1830000)
        gdi32.dll => /c/Windows/System32/gdi32.dll (0x7ffed2730000)
        gdi32full.dll => /c/Windows/System32/gdi32full.dll (0x7ffed1bf0000)
        msvcp_win.dll => /c/Windows/System32/msvcp_win.dll (0x7ffed1790000)
        advapi32.dll => /c/Windows/System32/advapi32.dll (0x7ffed2440000)
        sechost.dll => /c/Windows/System32/sechost.dll (0x7ffed2690000)
        ws2_32.dll => /c/Windows/System32/ws2_32.dll (0x7ffed27c0000)
        rocblas.dll => not found
        hipblas.dll => not found
        (...)

So rocblas.dll and hipblas.dll are missing. But I installed HIP SDK from AMD, do I need something more? Not sure what provides those.
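
For anyone repeating this check, the unresolved entries can be pulled out of the ldd output automatically. The snippet below is a minimal Python sketch. `missing_dependencies` is a hypothetical helper, and it assumes the `name => target` line format shown in the output above.

```python
def missing_dependencies(ldd_output):
    """Return the library names that ldd reported as 'not found', in order."""
    missing = []
    for line in ldd_output.splitlines():
        # Each ldd line looks like "name => /path (0xaddr)" or "name => not found".
        name, arrow, target = line.strip().partition(" => ")
        if arrow and target.strip() == "not found":
            missing.append(name)
    return missing


if __name__ == "__main__":
    sample = """\
        amdhip64.dll => /c/Windows/System32/amdhip64.dll (0x7ffe3c9b0000)
        ws2_32.dll => /c/Windows/System32/ws2_32.dll (0x7ffed27c0000)
        rocblas.dll => not found
        hipblas.dll => not found
    """
    print(missing_dependencies(sample))  # ['rocblas.dll', 'hipblas.dll']
```

A non-empty result means the loader cannot resolve at least one dependency, which is enough on its own to make loading the main DLL fail even though that file exists.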

[update]
So I figured it out. The binaries from AMD were not on the PATH. The AMD installer created a HIP_PATH environment variable, but nothing was added to PATH. I've added %HIP_PATH%\bin to PATH and now stablediffusiongui_diffuser.exe stable-diffusion_hipblas.dll actually starts, picks up a queued job, loads the model and tries to generate, however it still fails... with a CUDA error?

D:\AI\stablediffusion\sdgu>stablediffusiongui_diffuser.exe stable-diffusion_hipblas.dll
[EXTPROCESS] starting with shared memory size: 16777216
Using tmp path: C:\Users\gloow\AppData\Local\Temp
[EXTPROCESS] New message: 1736285377
[EXTPROCESS] Processing item: 1736285377
Loading sd model: D:\AI\models\stablediffusion\sd\dreamshaper_8.safetensors
Model load required for new item
[EXTPROCESS] Loading model: D:\AI\models\stablediffusion\sd\dreamshaper_8.safetensors
stable-diffusion.cpp:163  - Using CUDA backend
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX 6800, compute capability 10.3, VMM: no
stable-diffusion.cpp:195  - loading model from 'D:\AI\models\stablediffusion\sd\dreamshaper_8.safetensors'
model.cpp:887  - load D:\AI\models\stablediffusion\sd\dreamshaper_8.safetensors using safetensors format
model.cpp:957  - init from 'D:\AI\models\stablediffusion\sd\dreamshaper_8.safetensors'
stable-diffusion.cpp:242  - Version: SD 1.x
stable-diffusion.cpp:273  - Weight type:                 f16
stable-diffusion.cpp:274  - Conditioner weight type:     f16
stable-diffusion.cpp:275  - Diffusion model weight type: f16
stable-diffusion.cpp:276  - VAE weight type:             f16
stable-diffusion.cpp:278  - ggml tensor size = 400 bytes
clip.hpp:171  - vocab size: 49408
clip.hpp:182  -  trigger word img already in vocab
ggml_extend.hpp:1055 - clip params backend buffer size =  235.06 MB(VRAM) (196 tensors)
ggml_extend.hpp:1055 - unet params backend buffer size =  1640.25 MB(VRAM) (686 tensors)
ggml_extend.hpp:1055 - vae params backend buffer size =  94.47 MB(VRAM) (140 tensors)
stable-diffusion.cpp:414  - loading weights
model.cpp:1653 - loading tensors from D:\AI\models\stablediffusion\sd\dreamshaper_8.safetensors
stable-diffusion.cpp:513  - total params memory size = 1969.78MB (VRAM 1969.78MB, RAM 0.00MB): clip 235.06MB(VRAM), unet 1640.25MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
stable-diffusion.cpp:517  - loading model from 'D:\AI\models\stablediffusion\sd\dreamshaper_8.safetensors' completed, taking 2.05s
stable-diffusion.cpp:544  - running in eps-prediction mode
stable-diffusion.cpp:588  - finished loaded file
[EXTPROCESS] Starting item: 1736285377 type: txt2img
[EXTPROCESS] Running txt2img
stable-diffusion.cpp:1460 - txt2img 512x512
stable-diffusion.cpp:1192 - prompt after extract and remove lora: "a red car"
stable-diffusion.cpp:671  - Attempting to apply 0 LoRAs
stable-diffusion.cpp:1197 - apply_loras completed, taking 0.00s
conditioner.hpp:330  - parse 'a red car' to [['a red car', 1], ]
clip.hpp:311  - token length: 77
ggml_extend.hpp:1006 - clip compute buffer size: 1.40 MB(VRAM)
CUDA error: CUBLAS_STATUS_INVALID_VALUE
  current device: 0, in function ggml_cuda_op_mul_mat_cublas at D:/a/sd.cpp.gui.wx/sd.cpp.gui.wx/build/stable_diffusion_cpp_hipblas-prefix/src/stable_diffusion_cpp_hipblas/ggml/src/ggml-cuda.cu:1256
  hipblasSetStream(ctx.cublas_handle(id), stream)
D:/a/sd.cpp.gui.wx/sd.cpp.gui.wx/build/stable_diffusion_cpp_hipblas-prefix/src/stable_diffusion_cpp_hipblas/ggml/src/ggml-cuda.cu:102: CUDA error

I've googled and CUBLAS_STATUS_INVALID_VALUE shows up when there is a version mismatch apparently, but I am way out of my depth here. Why is it even CUDA, shouldn't it use ROCm?

In any case, thank you for your help so far. I really like what you are building here. Any ideas how to solve the latest issue? Did I install the wrong HIP SDK version?

@fszontagh
Owner

Nice catch!

As I wrote earlier, I can't test the ROCm and hipBLAS builds, but what I do know is the ROCm version, which is 5.5.0. At least the _hipblas.dll is built with this version.

@gloowa
Author

gloowa commented Jan 8, 2025

That was it. Installed HIP SDK 5.5.1, updated path, and now it works like a charm.

Maybe a Wiki or readme entry for the HIP SDK version requirement would be nice to add.

In any case, thanks for your support, now it works wonderfully! <3
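
As a starting point for such a readme entry, the two findings from this thread (%HIP_PATH%\bin must be on PATH, and the HIP SDK should be a 5.5.x release) could be turned into a small preflight check. The following is a hedged Python sketch, not part of the project. `check_hip_setup` is a hypothetical helper, and reading the version from the last folder name of HIP_PATH (for example a folder named 5.5) is an assumption about the SDK's install layout that may not hold everywhere.

```python
import os
import re


def check_hip_setup(env, required_dlls=("rocblas.dll", "hipblas.dll"),
                    expected_version="5.5"):
    """Return a list of human-readable problems; an empty list means OK."""
    hip_path = env.get("HIP_PATH")
    if not hip_path:
        return ["HIP_PATH is not set (is the HIP SDK installed?)"]

    def norm(path):
        return os.path.normcase(os.path.normpath(path))

    problems = []
    hip_bin = os.path.join(hip_path, "bin")
    path_dirs = [norm(p) for p in env.get("PATH", "").split(os.pathsep) if p]
    if norm(hip_bin) not in path_dirs:
        problems.append(f"{hip_bin} is not on PATH")

    for dll in required_dlls:
        if not os.path.isfile(os.path.join(hip_bin, dll)):
            problems.append(f"{dll} not found in {hip_bin}")

    # ASSUMPTION: the last folder name of HIP_PATH is the SDK version, e.g. "5.5".
    folder = os.path.basename(os.path.normpath(hip_path))
    match = re.fullmatch(r"(\d+\.\d+)(\.\d+)?", folder)
    if match and match.group(1) != expected_version:
        problems.append(
            f"HIP SDK looks like {match.group(1)}, expected {expected_version}.x")
    return problems


if __name__ == "__main__":
    for problem in check_hip_setup(os.environ) or ["HIP setup looks fine"]:
        print(problem)
```

If the folder name does not look like a version number, the version check is silently skipped, so an empty result only confirms the PATH and DLL parts.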

@fszontagh
Owner

> That was it. Installed HIP SDK 5.5.1, updated path, and now it works like a charm.
>
> Maybe a Wiki or readme entry for the HIP SDK version requirement would be nice to add.
>
> In any case, thanks for your support, now it works wonderfully! <3

The new release no longer includes ROCm / HIP.
Please check out the Vulkan backend too, it may be faster.
You can install the new version alongside the old one.
