
IFU master 2024 03 28 #17

Merged: 125 commits merged into rocm_enabled on Apr 19, 2024

Conversation

@pnunna93 (Collaborator) commented Apr 4, 2024

This PR pulls in the latest upstream changes and is updated to enable the CMake build on ROCm. The conflict resolutions are listed in the attached conflicts_diff.txt file.

It has been tested on ROCm 6.0 and 6.2. Unit test (UT) logs are attached below:
bnb_0.44.0.dev0_rocm6.0.log
bnb_0.44.0.dev0_rocm6.2.log

MilesCranmer and others added 30 commits January 25, 2024 12:33
* Fix `max_memory` example on README

- The new `max_memory` syntax expects a dictionary
- This change also accounts for multiple devices

* Fix model name in `from_pretrained` on README
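For context, a hedged sketch of the dictionary form the fixed `max_memory` example refers to (the model name and memory limits here are illustrative, not the README's actual values):

from transformers import AutoModelForCausalLM

# max_memory maps each device to a capacity budget, so multiple GPUs
# (and CPU offload) each get an explicit entry.
max_memory = {0: "24GiB", 1: "24GiB", "cpu": "64GiB"}
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-3b",  # illustrative model name
    device_map="auto",
    max_memory=max_memory,
    load_in_8bit=True,
)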
* fix library loading

Signed-off-by: Won-Kyu Park <[email protected]>

* fixed library loading

 * use os.pathsep

* use glob(), search CUDA_PATH

* call find_file_recursive() without ext

---------

Signed-off-by: Won-Kyu Park <[email protected]>
Co-authored-by: James Wyatt <[email protected]>
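A rough sketch of what the library-loading fix above describes (the function name comes from the commit message; the signature and search logic are assumptions):

import glob
import os

def find_file_recursive(folder: str, filename: str) -> list:
    # Search `folder` recursively; with no extension given, any suffix matches.
    return glob.glob(os.path.join(folder, "**", filename + "*"), recursive=True)

# CUDA_PATH can hold several roots separated by os.pathsep.
for root in os.environ.get("CUDA_PATH", "").split(os.pathsep):
    if root:
        candidates = find_file_recursive(root, "libcudart")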
* Fix erroneous type aliasing

* Fix `Optional` typings (see PEP 484)

* Add Mypy ignores

* Fix Mypy complaints for method tables

* Fix type for get_ptr

* Fix various Mypy errors

* Fix missed call to is_triton_available
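To illustrate the `Optional` fix named above, a before/after sketch under PEP 484 (the exact signatures in the diff may differ):

from typing import Optional

# Before: a None default did not make the annotation Optional implicitly.
#   def get_ptr(tensor: torch.Tensor = None): ...
# After: the Optional is spelled out explicitly.
def get_ptr(tensor: Optional["torch.Tensor"] = None):
    ...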
* Adjust Ruff configuration

* do not autofix always
* be less strict around tests and benchmarks
* adjust ignores for now

* Ruff: autofix I and F401

* Apply ruff autofixes

* Fix RUF013 complaint

* Fix mutable default in replace_linear

* Don't use bare except

* Wrap bitsandbytes.__main__ entrypoint in function; fix "sensible" typo

* Fix ruff B008 (function call in arguments)

* Add ruff noqas as suitable

* Fix RUF005 (splat instead of concatenating)

* Fix B018 (useless expression)

* Add pre-commit configuration + GitHub Actions lint workflow

* Fix unused `e` in bitsandbytes/__main__.py

* fix merge conflict resolution error

* run pre-commit hook

---------

Co-authored-by: Titus <[email protected]>
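Two of the lint fixes above, sketched for illustration (replace_linear is named in the commits, but these bodies are stand-ins, not the actual diff):

from typing import Optional

# Mutable default: construct inside the function instead of in the signature.
def replace_linear(skip_modules: Optional[list] = None):
    skip_modules = [] if skip_modules is None else skip_modules
    return skip_modules

# RUF005: unpack into a new list rather than concatenating.
def combine(head: list, tail: list) -> list:
    return [*head, *tail]  # instead of head + tail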
based on @Jamezo97 and @acpopescu work

manually cherry-picked from PR bitsandbytes-foundation#788 and PR bitsandbytes-foundation#229 and cleanup by wkpark

Signed-off-by: Won-Kyu Park <[email protected]>
…n#1000)

* test_nvidia_transform: fix variable reference

`out_order` is the global parametrization list, not the test fixture argument

* Make `parametrize` use more idiomatic

* Use a more deterministic helper for `dim*` determination

* Convert NO_CUBLASLT errors into skips too

* Mark slow and benchmark tests as such (allows `-k "not benchmark"`)
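For example, marking a test this way lets pytest's keyword filter deselect it (illustrative test; the marker name matches the commit message):

import pytest

@pytest.mark.benchmark
def test_matmul_benchmark():
    ...  # long-running timing test

Running `pytest -k "not benchmark"` then skips it, as the commit notes.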
…esbelkada-patch-2

Create upload_pr_documentation.yml
…n-fix

minimal patch to fix Windows compilation issues
 * fix project name and add lib prefix for win32 (2024/01/31)
 * set LIBRARY_OUTPUT_DIRECTORY property

Co-authored-by: Won-Kyu Park <[email protected]>
 * build matrix for ubuntu + python 3.10, 3.11 + cuda 11.8 + 12.1 (windows is disabled for now)
 * add environment-bnb.yml for building
 * more fixes suggested by @akx (2024/01/30)
 * use python -m build --wheel suggested by @akx

Co-authored-by: Aarni Koskela <[email protected]>
 * add a comment suggested by @akx (2024/01/30)

Co-authored-by: Aarni Koskela <[email protected]>
matthewdouglas and others added 3 commits March 26, 2024 22:42
* Add CUDA 12.4 download to utility script, docs

* (ci) Add CUDA 12.4.0 build to workflow

* Apply ruff format to install_cuda.py
Updates the requirements on [einops](https://github.com/arogozhnikov/einops), [wheel](https://github.com/pypa/wheel), [lion-pytorch](https://github.com/lucidrains/lion-pytorch) and [scipy](https://github.com/scipy/scipy) to permit the latest version.

Updates `einops` from 0.6.0 to 0.7.0
- [Release notes](https://github.com/arogozhnikov/einops/releases)
- [Commits](arogozhnikov/einops@v0.6.0...v0.7.0)

Updates `wheel` to 0.43.0
- [Release notes](https://github.com/pypa/wheel/releases)
- [Changelog](https://github.com/pypa/wheel/blob/main/docs/news.rst)
- [Commits](pypa/wheel@0.40.0...0.43.0)

Updates `lion-pytorch` from 0.0.6 to 0.1.2
- [Release notes](https://github.com/lucidrains/lion-pytorch/releases)
- [Commits](lucidrains/lion-pytorch@0.0.6...0.1.2)

Updates `scipy` from 1.11.4 to 1.12.0
- [Release notes](https://github.com/scipy/scipy/releases)
- [Commits](scipy/scipy@v1.11.4...v1.12.0)

---
updated-dependencies:
- dependency-name: einops
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: minor-patch
- dependency-name: wheel
  dependency-type: direct:development
  dependency-group: minor-patch
- dependency-name: lion-pytorch
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: minor-patch
- dependency-name: scipy
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: minor-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
@lcskrishna commented:

@pnunna93 Looks like there are some linter issues and some README conflicts present. Could you check those?

@lcskrishna left a comment:

LGTM.

Two earlier review threads on CMakeLists.txt were resolved. Comment on CMakeLists.txt, lines 266 to 271:
if(NO_CUBLASLT)
    target_compile_definitions(bitsandbytes PUBLIC NO_HIPBLASLT)
else()
    find_package(hipblaslt)
    target_link_libraries(bitsandbytes PUBLIC roc::hipblaslt)
endif()

It would be preferable to just always link hipblaslt if possible.
I see we already have supports_igemmlt() always return True when torch.version.hip is set.

@pnunna93 (Collaborator, Author) replied:

Sure, I can change that. I added that part to support the NO_CUBLASLT option, but it hasn't been implemented yet. I will take it out to avoid confusion.

@arlo-phoenix commented on Apr 10, 2024:

RDNA 2 (and 3) and CDNA 1 don't support hipblasLt, though; only very recent devices do. CUDA has this option as well, even though support is far broader there, so it should absolutely stay. The Python side will need adjustment to decide which shared library to load; we could probably look at how CUDA does it, but the simplest solution would be to just check:

gfx_arch = torch.cuda.get_device_capability()  # for me, gfx1030 reports (10, 3)
# No idea if the second tuple element is exactly what's needed; I can't test this.
if gfx_arch[0] == 9 and gfx_arch[1] in ("0a", "4x"):
    ...

and only enable hipblasLt / igemmlt for that.

Edit: Just to make this clear: I understand you won't be able to test/validate all targets, but the changes needed to at least make it possible to run bitsandbytes on any ROCm-supported GPU are minimal and worth it, imo.


Is this related? ROCm/hipBLASLt#516

Does this absolutely need to be implemented with hipBLASLt, or could another library like rocBLAS be used?

If not, can the build/distribution still be simplified to always link this library, and then decide whether to call it at runtime based on device support? See bitsandbytes-foundation#1103.

For the CUDA version, it just looks at torch.cuda.get_device_capability() and only supports igemmlt when the result is >= (7, 5), so it's a very similar idea.
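A minimal sketch of that CUDA-side check (illustrative only; the helper name mirrors the supports_igemmlt() mentioned above, but this is not the library's actual implementation):

import torch

def supports_igemmlt(device_index: int = 0) -> bool:
    # igemmlt needs int8 tensor cores, available from Turing (compute capability 7.5) on.
    return torch.cuda.get_device_capability(device_index) >= (7, 5)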

@arlo-phoenix replied:

> Is this related? ROCm/hipBLASLt#516

I can use the latest torch just fine on gfx1030, so I guess even if we linked it, it would work. But I have no idea how it works internally.

> Does this absolutely need to be implemented with hipBLASLt, or could another library like rocBLAS be used?

What do you mean? blasLt is for the matrix cores. Yes, its functions could be emulated with rocBLAS, but bitsandbytes works just fine without it. I can use the optimizers and the 4-bit features, and I'm pretty sure there would be no speed benefit in emulating blasLt (if there were, we could just improve the current fallback code).

> If not, can the build/distribution still be simplified to always link this library, and then decide whether to call it at runtime based on device support? See TimDettmers#1103

I'm biased here. hipblasLt is a very recent library; it's the reason I named my fork 5.6, since it required the headers (this repo moved the defines, though, so you wouldn't even need the headers, which would be nice for pre-6.0 support). Also, Arch, for example, still ships hipblaslt separately; I don't know about other distros.

If the unified build gets merged for CUDA, we can still change it for ROCm later. Imo, ROCm should always mirror CUDA unless it's not possible.


That's fine. I think the part that mainly got me was supports_igemmlt() always returning True. I mentioned rocBLAS because my understanding was that it would use the matrix cores / MFMA instructions where supported, or at the very least on gfx908.

In general, my understanding on the CUDA side is that int8 matmul without tensor cores is going to be deprecated in the future too, but we'll want to fail in a friendly way, so some kind of runtime test before calling igemmlt would be ideal. IMO, a fallback is not super critical to have.
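A minimal sketch of such a runtime test (an assumption-laden illustration, not actual bitsandbytes code; the gfx90a/gfx94x list reflects the hipblasLt targets discussed above and may be incomplete):

import torch

def check_igemmlt_or_raise(device: torch.device) -> None:
    # Fail with a friendly error before dispatching to the Lt int8 GEMM path.
    if torch.version.hip:
        arch = torch.cuda.get_device_properties(device).gcnArchName
        if not arch.startswith(("gfx90a", "gfx94")):  # assumed hipblasLt targets
            raise RuntimeError(f"int8 matmul (igemmlt) is unsupported on {arch}")
    elif torch.cuda.get_device_capability(device) < (7, 5):
        raise RuntimeError("int8 matmul (igemmlt) needs compute capability >= 7.5")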

@lcskrishna left a comment:

@pnunna93 A few minor changes are needed, but everything else looks good. Could you please take a look at these?

Both review threads on CMakeLists.txt were resolved.
@lcskrishna left a comment:

LGTM. Merging it.

@lcskrishna merged commit c037a30 into rocm_enabled on Apr 19, 2024
14 of 18 checks passed