IFU master 2024 03 28 #17
Conversation
* Fix `max_memory` example on README - The new `max_memory` syntax expects a dictionary - This change also accounts for multiple devices * Fix model name in `from_pretrained` on README
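For context, a hedged sketch of the dictionary-style `max_memory` usage that commit refers to (the model name and memory limits here are illustrative, not taken from the README):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-1b7",          # illustrative model name
    device_map="auto",
    load_in_8bit=True,
    # `max_memory` takes a dict mapping each device (GPU index or "cpu") to a limit,
    # which also covers multi-GPU setups.
    max_memory={0: "10GiB", 1: "10GiB", "cpu": "30GiB"},
)
```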
* fix library loading Signed-off-by: Won-Kyu Park <[email protected]> * fixed library loading * use os.pathsep * use glob(), search CUDA_PATH * call find_file_recursive() without ext --------- Signed-off-by: Won-Kyu Park <[email protected]> Co-authored-by: James Wyatt <[email protected]>
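As a rough illustration of the search strategy that commit describes (a hypothetical helper; the real `find_file_recursive()` in bitsandbytes may behave differently):

```python
import glob
import os

def find_cuda_libraries(pattern: str = "*cudart*") -> list:
    """Hypothetical sketch: split path-style env vars on os.pathsep and glob each entry."""
    matches = []
    for var in ("CUDA_PATH", "LD_LIBRARY_PATH"):
        for entry in os.environ.get(var, "").split(os.pathsep):
            if entry:
                matches.extend(glob.glob(os.path.join(entry, "**", pattern), recursive=True))
    return matches
```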
* Fix erroneous type aliasing * Fix `Optional` typings (see PEP 484) * Add Mypy ignores * Fix Mypy complaints for method tables * Fix type for get_ptr * Fix various Mypy errors * Fix missed call to is_triton_available
* Adjust Ruff configuration
* do not autofix always
* be less strict around tests and benchmarks
* adjust ignores for now
* Ruff: autofix I and F401
* Apply ruff autofixes
* Fix RUF013 complaint
* Fix mutable default in replace_linear
* Don't use bare except
* Wrap bitsandbytes.__main__ entrypoint in function; fix "sensible" typo
* Fix ruff B008 (function call in arguments)
* Add ruff noqas as suitable
* Fix RUF005 (splat instead of concatenating)
* Fix B018 (useless expression)
* Add pre-commit configuration + GitHub Actions lint workflow
* Fix unused `e` in bitsandbytes/__main__.py
* fix merge conflict resolution error
* run pre-commit hook

---------

Co-authored-by: Titus <[email protected]>
based on @Jamezo97 and @acpopescu work manually cherry-picked from PR bitsandbytes-foundation#788 and PR bitsandbytes-foundation#229 and cleanup by wkpark Signed-off-by: Won-Kyu Park <[email protected]>
…n#1000) `out_order` is the global parametrization list, not the test fixture argument
* test_nvidia_transform: fix variable reference `out_order` is the global parametrization list, not the test fixture argument * Make `parametrize` use more idiomatic * Use a more deterministic helper for `dim*` determination * Convert NO_CUBLASLT errors into skips too * Mark slow and benchmark tests as such (allows `-k "not benchmark"`)
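A minimal pytest sketch of the distinction that fix is about (names are illustrative, not the actual test code):

```python
import pytest

# Module-level list of parametrization values (illustrative).
out_orders = ["row", "col", "col32"]

@pytest.mark.parametrize("out_order", out_orders)
def test_transform(out_order):
    # Inside the test body, use the per-test argument `out_order`,
    # not the module-level list it was parametrized from.
    assert out_order in out_orders
```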
…esbelkada-patch-2 Create upload_pr_documentation.yml
…n-fix minimal patch to fix Windows compilation issues
* fix project name and add lib prefix for win32 (2024/01/31) * set LIBRARY_OUTPUT_DIRECTORY property Co-authored-by: Won-Kyu Park <[email protected]>
* build matrix for ubuntu + python 3.10, 3.11 + cuda 11.8 + 12.1 (windows is disabled for now) * add environment-bnb.yml for building * more fixes suggested by @akx (2024/01/30) * use python -m build --wheel suggested by @akx Co-authored-by: Aarni Koskela <[email protected]>
* add a comment suggested by @akx (2024/01/30) Co-authored-by: Aarni Koskela <[email protected]>
Cmake + workflows
* Add CUDA 12.4 download to utility script, docs * (ci) Add CUDA 12.4.0 build to workflow * Apply ruff format to install_cuda.py
Updates the requirements on [einops](https://github.com/arogozhnikov/einops), [wheel](https://github.com/pypa/wheel), [lion-pytorch](https://github.com/lucidrains/lion-pytorch) and [scipy](https://github.com/scipy/scipy) to permit the latest version.

Updates `einops` from 0.6.0 to 0.7.0
- [Release notes](https://github.com/arogozhnikov/einops/releases)
- [Commits](arogozhnikov/einops@v0.6.0...v0.7.0)

Updates `wheel` to 0.43.0
- [Release notes](https://github.com/pypa/wheel/releases)
- [Changelog](https://github.com/pypa/wheel/blob/main/docs/news.rst)
- [Commits](pypa/wheel@0.40.0...0.43.0)

Updates `lion-pytorch` from 0.0.6 to 0.1.2
- [Release notes](https://github.com/lucidrains/lion-pytorch/releases)
- [Commits](lucidrains/lion-pytorch@0.0.6...0.1.2)

Updates `scipy` from 1.11.4 to 1.12.0
- [Release notes](https://github.com/scipy/scipy/releases)
- [Commits](scipy/scipy@v1.11.4...v1.12.0)

---
updated-dependencies:
- dependency-name: einops
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: minor-patch
- dependency-name: wheel
  dependency-type: direct:development
  dependency-group: minor-patch
- dependency-name: lion-pytorch
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: minor-patch
- dependency-name: scipy
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: minor-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
@pnunna93 Looks like some linter issues and some README conflicts are present. Could you check those?
LGTM.
CMakeLists.txt (Outdated)
if(NO_CUBLASLT)
    target_compile_definitions(bitsandbytes PUBLIC NO_HIPBLASLT)
else()
    find_package(hipblaslt)
    target_link_libraries(bitsandbytes PUBLIC roc::hipblaslt)
endif()
It would be preferable if we just always linked hipblaslt, if possible. I see we already have supports_igemmlt() always returning True when torch.version.hip is set.
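Roughly, the behavior being referred to (a simplified sketch, not the exact bitsandbytes code):

```python
import torch

def supports_igemmlt() -> bool:
    # Simplified sketch: on ROCm builds, torch.version.hip is a version string rather
    # than None, and the check currently returns True for every HIP device; on CUDA it
    # is gated by compute capability (Turing, i.e. 7.5, and newer).
    if torch.version.hip:
        return True
    return torch.cuda.get_device_capability() >= (7, 5)
```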
Sure, I can change that. I added that part to support the NO_CUBLASLT option, but haven't implemented it yet; I will take it out to avoid confusion.
RDNA 2 / (3) and CDNA 1 don't support hipblasLt though; only very recent devices do. CUDA has the option as well, even though support is far broader there, so it should absolutely stay. The Python part will need adjustment regarding which shared library to load; we could probably take a look at how CUDA does it, but the simplest solution would be to just check something like

gfx_arch = torch.cuda.get_device_capability()
if gfx_arch[0] == 9 and gfx_arch[1] in ["0a", "4x"]:  # no idea if the second part of the tuple will be exactly that, can't test; for me gfx1030 is (10, 3)

and only enable hipblasLt / igemm for that. A slightly fuller sketch follows below.
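A minimal sketch of that gating, assuming the ROCm build of PyTorch exposes the gfx architecture name on the device properties (the attribute and the exact list of supported architectures below are assumptions that would need to be verified):

```python
import torch

# Assumed (not verified here) to be the architectures with hipblasLt support: CDNA 2 / CDNA 3.
HIPBLASLT_ARCHS = ("gfx90a", "gfx940", "gfx941", "gfx942")

def rocm_supports_blaslt(device: int = 0) -> bool:
    if torch.version.hip is None:
        return False
    # Assumption: ROCm PyTorch reports something like "gfx90a:sramecc+:xnack-" here.
    arch = torch.cuda.get_device_properties(device).gcnArchName
    return arch.split(":")[0] in HIPBLASLT_ARCHS
```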
Edit: Just to make it clear: I understand you won't be able to test/validate all targets, but the changes needed to at least make it possible to run bitsandbytes on any ROCm-supported GPU are minimal and worth it, imo.
Is this related? ROCm/hipBLASLt#516
Does this absolutely need to be implemented with hipBLASLt or could another library like rocBLAS be used?
If not, can the build/distribution still be simplified to always link this library, and then decide whether to call it at runtime based on device support? See bitsandbytes-foundation#1103
For the CUDA version, it's just looking at torch.cuda.get_device_capability() and only supporting igemmlt when >= (7, 5), so it's a very similar idea.
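A rough sketch of that "always link, decide at runtime" idea at the Python library-loading level (the filenames below are illustrative, not the real binary names):

```python
import torch

def select_native_library() -> str:
    # Hypothetical simplification: instead of shipping a separate "nocublaslt" build,
    # always load the one library for the platform and make the *blasLt code path a
    # runtime decision based on the capability checks discussed above.
    if torch.version.hip:
        return "libbitsandbytes_rocm.so"  # illustrative filename
    return "libbitsandbytes_cuda.so"      # illustrative filename
```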
> Is this related? ROCm/hipBLASLt#516

I can use the latest torch just fine on gfx1030, so I guess even if we linked it, it would work. But I have no idea how it works internally.

> Does this absolutely need to be implemented with hipBLASLt or could another library like rocBLAS be used?

What do you mean? blasLt is for matrix cores; yes, the functions could be emulated with rocBLAS, but bitsandbytes works just fine without it. I can use the optimizers and the 4-bit stuff, and I'm pretty sure there would be no speed benefit to emulating blasLt (if there were, we could just improve the current fallback code).

> If not, can the build/distribution still be simplified to always link this library, and then decide whether to call it at runtime based on device support? See TimDettmers#1103

I'm biased here. hipblasLt is a very recent library; it's the reason I named my fork 5.6, since it required the headers (this repo moved the defines, though, so you wouldn't even need the headers, which would be nice for pre-6.0 support). Also, e.g. Arch still ships hipblaslt separately; I don't know about other distros.

If the unified build gets merged for CUDA, we can still change it for ROCm later. Imo ROCm should always mirror CUDA unless that's not possible.
That's fine. I think the part that mainly got me was supports_igemmlt() always returning True. I mentioned rocBLAS because my understanding was that it would use the matrix cores / MFMA instructions where supported, or at the very least on gfx908.

In general, my understanding on the CUDA side is that the int8 matmul without tensor cores is going to be deprecated in the future too, but we'll want to fail in a friendly way, so some kind of runtime test before calling igemmlt would be ideal. IMO a fallback is not super critical to have.
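One possible shape of that "fail in a friendly way" runtime guard, building on the supports_igemmlt() sketch shown earlier (function names here are hypothetical):

```python
def int8_matmul(A, B, igemmlt_impl, fallback_impl=None):
    # Hypothetical guard: probe device support up front and either fall back or raise
    # a clear error, rather than failing deep inside the *blasLt call.
    if supports_igemmlt():
        return igemmlt_impl(A, B)
    if fallback_impl is not None:
        return fallback_impl(A, B)
    raise RuntimeError(
        "int8 matmul via igemmlt is not supported on this GPU "
        "(requires tensor-core / matrix-core capable hardware)."
    )
```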
@pnunna93 A few minor changes are needed, but everything else looks good. Could you please take a look at these?
LGTM. Merging it.
This PR pulls in the latest upstream changes and is updated to enable the CMake build on ROCm. This is the conflicts_diff.txt file.
It's tested on ROCm 6.0 and 6.2. UT logs are below:
bnb_0.44.0.dev0_rocm6.0.log
bnb_0.44.0.dev0_rocm6.2.log