ROCm Setup failed despite ROCm being available #1508

Closed
postcanonical opened this issue Feb 11, 2025 · 3 comments

Comments

@postcanonical

System Info

Arch Linux with rocm-core 6.2.4-2; bitsandbytes installed from https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_multi-backend-refactor/bitsandbytes-0.45.3.dev271-py3-none-manylinux_2_24_x86_64.whl.
I get this error:

amdgpu.ids: No such file or directory
Could not load bitsandbytes native library: /opt/stabilitymatrix/Data/Packages/ComfyUI/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_rocm62.so: undefined symbol: _Z36__device_stub__kOptimizer32bit1StateI12hip_bfloat16Li2EEvPT_S2_PfS3_ffffffiffbi
Traceback (most recent call last):
  File "/opt/stabilitymatrix/Data/Packages/ComfyUI/venv/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 107, in <module>
    lib = get_native_library()
  File "/opt/stabilitymatrix/Data/Packages/ComfyUI/venv/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 86, in get_native_library
    dll = ct.cdll.LoadLibrary(str(binary_path))
  File "/opt/stabilitymatrix/Data/Assets/Python310/lib/python3.10/ctypes/__init__.py", line 452, in LoadLibrary
    return self._dlltype(name)
  File "/opt/stabilitymatrix/Data/Assets/Python310/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /opt/stabilitymatrix/Data/Packages/ComfyUI/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_rocm62.so: undefined symbol: _Z36__device_stub__kOptimizer32bit1StateI12hip_bfloat16Li2EEvPT_S2_PfS3_ffffffiffbi

ROCm Setup failed despite ROCm being available. Please run the following command to get more information:

python -m bitsandbytes

Inspect the output of the command and see if you can locate ROCm libraries. You might need to add them
to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
and open an issue at: https://github.com/bitsandbytes-foundation/bitsandbytes/issues

Traceback (most recent call last):
  File "/opt/stabilitymatrix/Data/Assets/Python310/lib/python3.10/runpy.py", line 187, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/opt/stabilitymatrix/Data/Assets/Python310/lib/python3.10/runpy.py", line 146, in _get_module_details
    return _get_module_details(pkg_main_name, error)
  File "/opt/stabilitymatrix/Data/Assets/Python310/lib/python3.10/runpy.py", line 110, in _get_module_details
    __import__(pkg_name)
  File "/opt/stabilitymatrix/Data/Packages/ComfyUI/venv/lib/python3.10/site-packages/bitsandbytes/__init__.py", line 21, in <module>
    from .backends.cpu import CPUBackend
  File "/opt/stabilitymatrix/Data/Packages/ComfyUI/venv/lib/python3.10/site-packages/bitsandbytes/backends/cpu.py", line 8, in <module>
    from .cpu_xpu_common import (
  File "/opt/stabilitymatrix/Data/Packages/ComfyUI/venv/lib/python3.10/site-packages/bitsandbytes/backends/cpu_xpu_common.py", line 73, in <module>
    def double_quant_impl(A, col_stats=None, row_stats=None, out_col=None, out_row=None, threshold=0.0):
  File "/opt/stabilitymatrix/Data/Packages/ComfyUI/venv/lib/python3.10/site-packages/bitsandbytes/backends/cpu_xpu_common.py", line 68, in _maybe_torch_compile
    return torch.compile(func, dynamic=True, options=options)
  File "/opt/stabilitymatrix/Data/Packages/ComfyUI/venv/lib/python3.10/site-packages/torch/__init__.py", line 2565, in compile
    return torch._dynamo.optimize(
  File "/opt/stabilitymatrix/Data/Packages/ComfyUI/venv/lib/python3.10/site-packages/torch/__init__.py", line 2679, in __getattr__
    return importlib.import_module(f".{name}", __name__)
  File "/opt/stabilitymatrix/Data/Assets/Python310/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/opt/stabilitymatrix/Data/Packages/ComfyUI/venv/lib/python3.10/site-packages/torch/_dynamo/__init__.py", line 3, in <module>
    from . import convert_frame, eval_frame, resume_execution
  File "/opt/stabilitymatrix/Data/Packages/ComfyUI/venv/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 33, in <module>
    from torch._dynamo.symbolic_convert import TensorifyState
  File "/opt/stabilitymatrix/Data/Packages/ComfyUI/venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 27, in <module>
    from torch._dynamo.exc import TensorifyScalarRestartAnalysis
  File "/opt/stabilitymatrix/Data/Packages/ComfyUI/venv/lib/python3.10/site-packages/torch/_dynamo/exc.py", line 11, in <module>
    from .utils import counters
  File "/opt/stabilitymatrix/Data/Packages/ComfyUI/venv/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 1752, in <module>
    if has_triton_package():
  File "/opt/stabilitymatrix/Data/Packages/ComfyUI/venv/lib/python3.10/site-packages/torch/utils/_triton.py", line 9, in has_triton_package
    from triton.compiler.compiler import triton_key
  File "/opt/stabilitymatrix/Data/Packages/ComfyUI/venv/lib/python3.10/site-packages/triton/__init__.py", line 8, in <module>
    from .runtime import (
  File "/opt/stabilitymatrix/Data/Packages/ComfyUI/venv/lib/python3.10/site-packages/triton/runtime/__init__.py", line 1, in <module>
    from .autotuner import (Autotuner, Config, Heuristics, autotune, heuristics)
  File "/opt/stabilitymatrix/Data/Packages/ComfyUI/venv/lib/python3.10/site-packages/triton/runtime/autotuner.py", line 9, in <module>
    from .jit import KernelInterface
  File "/opt/stabilitymatrix/Data/Packages/ComfyUI/venv/lib/python3.10/site-packages/triton/runtime/jit.py", line 12, in <module>
    from ..runtime.driver import driver
  File "/opt/stabilitymatrix/Data/Packages/ComfyUI/venv/lib/python3.10/site-packages/triton/runtime/driver.py", line 1, in <module>
    from ..backends import backends
  File "/opt/stabilitymatrix/Data/Packages/ComfyUI/venv/lib/python3.10/site-packages/triton/backends/__init__.py", line 50, in <module>
    backends = _discover_backends()
  File "/opt/stabilitymatrix/Data/Packages/ComfyUI/venv/lib/python3.10/site-packages/triton/backends/__init__.py", line 44, in _discover_backends
    driver = _load_module(name, os.path.join(root, name, 'driver.py'))
  File "/opt/stabilitymatrix/Data/Packages/ComfyUI/venv/lib/python3.10/site-packages/triton/backends/__init__.py", line 12, in _load_module
    spec.loader.exec_module(module)
  File "/opt/stabilitymatrix/Data/Packages/ComfyUI/venv/lib/python3.10/site-packages/triton/backends/amd/driver.py", line 7, in <module>
    from triton.runtime.build import _build
  File "/opt/stabilitymatrix/Data/Packages/ComfyUI/venv/lib/python3.10/site-packages/triton/runtime/build.py", line 8, in <module>
    import setuptools
  File "/opt/stabilitymatrix/Data/Packages/ComfyUI/venv/lib/python3.10/site-packages/setuptools/__init__.py", line 22, in <module>
    import _distutils_hack.override  # noqa: F401
  File "/opt/stabilitymatrix/Data/Packages/ComfyUI/venv/lib/python3.10/site-packages/_distutils_hack/override.py", line 1, in <module>
    __import__('_distutils_hack').do_override()
  File "/opt/stabilitymatrix/Data/Packages/ComfyUI/venv/lib/python3.10/site-packages/_distutils_hack/__init__.py", line 89, in do_override
    ensure_local_distutils()
  File "/opt/stabilitymatrix/Data/Packages/ComfyUI/venv/lib/python3.10/site-packages/_distutils_hack/__init__.py", line 76, in ensure_local_distutils
    assert '_distutils' in core.__file__, core.__file__
AssertionError: /opt/stabilitymatrix/Data/Assets/Python310/lib/python3.10/distutils/core.py
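
The message above suggests checking whether the ROCm libraries are visible on LD_LIBRARY_PATH. A minimal sketch for reproducing the loader failure outside ComfyUI, assuming only the Python standard library: it loads the .so with ctypes (the same mechanism cextension.py uses in the traceback) and lists the LD_LIBRARY_PATH entries. The helper names try_load and ld_library_path_entries are hypothetical and not part of bitsandbytes.

```python
import ctypes
import os


def try_load(path):
    """Try to load a shared library with ctypes.

    Returns (True, None) on success, or (False, error_text) when the dynamic
    loader fails, e.g. on a missing file or an unresolved symbol.
    """
    try:
        ctypes.CDLL(path)
    except OSError as exc:
        return False, str(exc)
    return True, None


def ld_library_path_entries():
    """Return the non-empty directories listed in LD_LIBRARY_PATH ([] if unset)."""
    value = os.environ.get("LD_LIBRARY_PATH", "")
    return [entry for entry in value.split(os.pathsep) if entry]
```

For example, try_load("/opt/stabilitymatrix/Data/Packages/ComfyUI/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_rocm62.so") would presumably return the same "undefined symbol" text seen in the log. That kind of error points at the contents of the library itself rather than at a directory missing from LD_LIBRARY_PATH.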

Reproduction

Install bitsandbytes from https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_multi-backend-refactor/bitsandbytes-0.45.3.dev271-py3-none-manylinux_2_24_x86_64.whl on a ROCm device (a 7900 XTX in my case).

The error above appears on ComfyUI startup.

Expected behavior

no errors

@daviddelaharpegolden

Well, the key line is: libbitsandbytes_rocm62.so: undefined symbol: _Z36__device_stub__kOptimizer32bit1StateI12hip_bfloat16Li2EEvPT_S2_PfS3_ffffffiffbi

Running c++filt _Z36__device_stub__kOptimizer32bit1StateI12hip_bfloat16Li2EEvPT_S2_PfS3_ffffffiffbi gives:

void __device_stub__kOptimizer32bit1State<hip_bfloat16, 2>(hip_bfloat16*, hip_bfloat16*, float*, float*, float, float, float, float, float, float, int, float, float, bool, int)
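
The same extraction and demangling step can be scripted. A small sketch, assuming glibc's "<library>: undefined symbol: <name>" error format and that binutils' c++filt may or may not be on PATH; parse_undefined_symbol and demangle are hypothetical helper names, not part of bitsandbytes.

```python
import re
import shutil
import subprocess

# Assumed glibc-style loader message: "<path>.so[.N]: undefined symbol: <mangled name>".
# re.search tolerates a prefix such as "Could not load bitsandbytes native library: ".
_UNDEFINED_SYMBOL = re.compile(r"(?P<lib>\S+\.so[\w.]*): undefined symbol: (?P<symbol>\S+)")


def parse_undefined_symbol(message):
    """Return (library_path, mangled_symbol) from a loader error, or None if no match."""
    match = _UNDEFINED_SYMBOL.search(message.strip())
    if match is None:
        return None
    return match.group("lib"), match.group("symbol")


def demangle(symbol):
    """Demangle a C++ symbol with c++filt; return it unchanged if c++filt is not installed."""
    cxxfilt = shutil.which("c++filt")
    if cxxfilt is None:
        return symbol
    result = subprocess.run([cxxfilt, symbol], capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Feeding the OSError text from the traceback to parse_undefined_symbol and passing the second element to demangle should reproduce the c++filt output above when c++filt is available.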

So it does look like csrc/ops.hip, csrc/kernels.hip, etc. need some small updates to match the current code.

Presumably they were hipified from the .cu files and then perhaps manually tweaked a bit. I guess the AMD folks would be best placed to do that in their fork,

https://github.com/ROCm/bitsandbytes/tree/rocm_enabled_multi_backend?tab=readme-ov-file#bitsandbytes

and then PR the changes back upstream here, but anyway.
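
One way to test the hunch that the .hip sources have drifted from the .cu sources is to diff the template instantiations each file declares. The sketch below is hypothetical: the regex pattern, the rename map (for instance a CUDA bfloat16 type name mapped to hip_bfloat16, the spelling seen in the demangled symbol), and the helper names are assumptions to adjust to how the real files spell their instantiations.

```python
import re


def instantiations(source, pattern):
    """Collect every match of `pattern` in `source`, with whitespace normalised."""
    return {" ".join(m.group(0).split()) for m in re.finditer(pattern, source)}


def missing_in_port(cuda_source, hip_source, pattern, renames=None):
    """Return matches present in the CUDA source but absent from the HIP port.

    `renames` maps CUDA spellings to HIP spellings and is applied to the CUDA
    text before comparing. It is a plain substring replace, so choose keys
    that cannot appear inside other identifiers.
    """
    renames = renames or {}
    translated = cuda_source
    for cuda_name, hip_name in renames.items():
        translated = translated.replace(cuda_name, hip_name)
    return sorted(instantiations(translated, pattern) - instantiations(hip_source, pattern))
```

Run against the contents of the .cu file and its .hip counterpart (presumably csrc/kernels.cu and csrc/kernels.hip), a non-empty result would list instantiations the HIP side never defines, which is the kind of gap that produces an undefined __device_stub__ symbol at load time.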

@matthewdouglas
Member

Hi,
This wheel isn't ready yet and does indeed need some further work. For now, please use one of the older wheels:

https://github.com/bitsandbytes-foundation/bitsandbytes/releases/tag/continuous-release_multi-backend-refactor

@postcanonical
Author

This was for alpha testing; closing if not needed.


3 participants