
Remove separate NO_CUBLASLT build. #1103

Conversation

matthewdouglas (Member)

This PR removes the NO_CUBLASLT build option. It also removes the runtime check that selects and loads the separate nocublaslt variants of the library.

Reasoning:

  • Having separate library builds adds complexity and extra build time
  • Since CUDA 11, libcublas itself already takes a dependency on libcublasLt
  • We have runtime checks against compute capability to avoid calling library functions that would be unsupported

So far I've only tested this on an RTX 3060. I do have access to a machine with a GTX 1660, so I'll try to test on that too.


Comment on lines +364 to +384
// TODO: Check overhead. Maybe not worth it; just check in Python lib once,
// and avoid calling lib functions w/o support for them.
// TODO: Address GTX 1660, any other 7.5 devices maybe not supported.
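// Returns true when the active device's compute capability is >= 7.5 (Turing),
// i.e. devices with int8 tensor core support.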
inline bool igemmlt_supported() {
    int device;
    int ccMajor;

    CUDA_CHECK_RETURN(cudaGetDevice(&device));
    CUDA_CHECK_RETURN(cudaDeviceGetAttribute(&ccMajor, cudaDevAttrComputeCapabilityMajor, device));

    if (ccMajor >= 8)
        return true;

    if (ccMajor < 7)
        return false;

    int ccMinor;
    CUDA_CHECK_RETURN(cudaDeviceGetAttribute(&ccMinor, cudaDevAttrComputeCapabilityMinor, device));

    return ccMinor >= 5;
}
matthewdouglas (Member, Author)

Using this as more of a sanity check right now. I'd expect that we wouldn't be calling transform, spmm_coo, or igemmlt on devices that don't support it, but I haven't verified this. In particular, spmm_coo is the one I'm least sure about.
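For reference, a minimal call-site sketch of how such a check could be used (the wrapper name and error handling below are hypothetical, not part of this diff):

#include <cstdio>

// Hypothetical guard, for illustration only: refuse to dispatch the cublasLt-based
// int8 path on devices without int8 tensor core support.
int igemmlt_guarded(/* ... actual igemmlt arguments ... */) {
    if (!igemmlt_supported()) {
        fprintf(stderr, "igemmlt requires a device with compute capability >= 7.5\n");
        return 1; // signal "unsupported" to the caller
    }
    // ... call into the existing cublasLt-based igemmlt implementation ...
    return 0;
}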

@akx (Contributor) commented Mar 5, 2024

If you want, I have a GTX 1070 (under WSL2, works surprisingly well) I can test on:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.36                 Driver Version: 546.33       CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1070        On  | 00000000:2D:00.0  On |                  N/A |
|  0%   62C    P0              37W / 185W |   1204MiB /  8192MiB |      1%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

@Titus-von-Koeller (Collaborator)

Hey @matthewdouglas @akx,

Tim mentioned that it's probably not safe to remove this.

What's your opinion on this? How can we be certain either way? Right now, I'm not sure how best to proceed.

@akx (Contributor) commented Apr 9, 2024

Did Tim say why it's "probably not safe"? Do we know of an actual situation where cublaslt isn't available? Is such a situation something we want to support?

@matthewdouglas (Member, Author) commented Apr 9, 2024

> Did Tim say why it's "probably not safe"? Do we know of an actual situation where cublaslt isn't available? Is such a situation something we want to support?

I'm curious too, but I think there may also just be a naming mix-up here, since cublasLt has shipped with the CUDA Toolkit since v10.1. It may have been placed in some unusual locations early on, but by the time of toolkit 11.0 that's no longer an issue and we should always be able to link to it. PyTorch binaries ship with it. And if I'm not mistaken, libcublas.so itself depends on libcublasLt.so these days.

The main differentiator here is support for int8 tensor cores (hence the check for compute capability >= 7.5). So we would have to make sure not to call F.igemmlt() on devices that lack them. But linking in the cublasLt code IMO shouldn't be a problem as long as we're not trying to run the unsupported matmul ops, and there's already a path in MatMul8bitLt for that. We can also guard some device code with __CUDA_ARCH__, as in the sketch below.
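For illustration, here is a minimal sketch of that kind of __CUDA_ARCH__ guard (the kernel name and signature are hypothetical, not taken from this PR): the tensor-core-only path is compiled only for sm_75 and newer, so older architectures never contain the unsupported device code even though the library still links against cublasLt.

#include <cstdint>

__global__ void int8_matmul_kernel(const int8_t *A, const int8_t *B, int32_t *C, int n) {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 750
    // Turing (7.5) and newer: the int8 tensor core implementation would go here.
#else
    // Pre-Turing devices: compiled to a no-op; host code should dispatch to a fallback instead.
#endif
}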

Some places where cublasLt is used:

  • int igemmlt<int, int, int>(cublasLtHandle_t ltHandle, int m, int n, int k, ...)
  • cublasLtOrder_t get_order<int>()
  • void transform<T, SRC, TARGET, transpose, DTYPE>(cublasLtHandle_t, T *A, T *out, int dim1, int dim2)

Separately, I recall reading somewhere that there is also an intent to deprecate the int8 matmul path that does not use tensor cores (F.igemm, MatMul8bit, and also F.vectorwise_quant, F.vectorwise_mm_dequant).

@Titus-von-Koeller (Collaborator)

I'll try to get in touch with Tim to get more details and relay the new information you provided. Unfortunately, he didn't give any reasoning at the time.

He's quite unavailable atm, so it might take a few days.

Thanks @matthewdouglas for this thorough and knowledgeable analysis; this was once again very helpful!

@matthewdouglas (Member, Author)

Superseded by #1401.
