further docs updates

bitsandbytes-foundation · Feb 2, 2024 · 683a72b
1 parent 301ee80
Showing 12 changed files with 135 additions and 152 deletions.
diff --git a/compile_from_source.md b/compile_from_source.md
diff --git a/docs/source/_toctree.yml b/docs/source/_toctree.yml
@@ -16,10 +16,18 @@
     title: Optimizers
   - local: integrations
     title: Integrations
+  - local: algorithms
+    title: Algorithms
 - title: Support & Learning
   sections:
   - local: resources
-    title: Papers, related resources & how to cite
+    title: Papers, resources & how to cite
+  - local: errors
+    title: Errors & Solutions
+  - local: nonpytorchcuda
+    title: Non-PyTorch CUDA
+  - local: compiling
+    title: Compilation from Source (extended)
   - local: faqs
     title: FAQs (Frequently Asked Questions)
 - title: Contributors Guidelines

diff --git a/docs/source/algorithms.mdx b/docs/source/algorithms.mdx
@@ -0,0 +1,12 @@
+# Other algorithms
+_WIP: Still incomplete... Community contributions would be greatly welcome!_
+
+This is an overview of the algorithms in `bitsandbytes` that we think would also be useful as standalone entities.
+
+## Using Int8 Matrix Multiplication
+
+For straight Int8 matrix multiplication with mixed precision decomposition you can use ``bnb.matmul(...)``. To enable mixed precision decomposition, use the threshold parameter:
+
+```py
+bnb.matmul(..., threshold=6.0)
+```
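To make the idea behind the `threshold` parameter concrete, here is a minimal pure-Python sketch of mixed precision decomposition. It is an illustration only and not the `bitsandbytes` kernel: the function name `matmul_mixed_int8`, the row-wise/column-wise absmax scaling, and the `>=` comparison against the threshold are all assumptions made for this example.

```python
def matmul_mixed_int8(A, B, threshold=6.0):
    """Illustrative mixed precision decomposition (hypothetical helper, not bnb's API)."""
    n_rows, n_inner, n_cols = len(A), len(B), len(B[0])

    # Inner-dimension indices where any value of A reaches the threshold (assumed rule).
    outliers = {k for k in range(n_inner) if any(abs(row[k]) >= threshold for row in A)}
    regular = [k for k in range(n_inner) if k not in outliers]

    # Absmax scales: one per row of A and one per column of B; fall back to 1.0 for all-zero slices.
    a_scale = [max((abs(row[k]) for k in regular), default=0.0) / 127 or 1.0 for row in A]
    b_scale = [max((abs(B[k][j]) for k in regular), default=0.0) / 127 or 1.0 for j in range(n_cols)]

    # Quantize the regular part to integers in [-127, 127].
    a_q = [[round(A[i][k] / a_scale[i]) for k in regular] for i in range(n_rows)]
    b_q = [[round(B[k][j] / b_scale[j]) for j in range(n_cols)] for k in regular]

    out = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            # Integer accumulation over the regular dimensions, then dequantize.
            acc = sum(a_q[i][t] * b_q[t][j] for t in range(len(regular)))
            low = acc * a_scale[i] * b_scale[j]
            # Outlier dimensions are multiplied in floating point.
            high = sum(A[i][k] * B[k][j] for k in outliers)
            out[i][j] = low + high
    return out
```

With a small threshold, more dimensions take the floating-point path, which is more accurate but leaves less work for the int8 part; a large threshold does the opposite.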
diff --git a/docs/source/compiling.mdx b/docs/source/compiling.mdx
@@ -0,0 +1,41 @@
+# Compiling from Source[[compiling]]
+
+To compile from source, the CUDA Toolkit is required. Ensure `nvcc` is installed; if not, follow these steps to install it along with the CUDA Toolkit:
+
+```bash
+wget https://raw.githubusercontent.com/TimDettmers/bitsandbytes/main/install_cuda.sh
+# Use the following syntax: bash install_cuda.sh CUDA_VERSION INSTALL_PREFIX EXPORT_TO_BASH
+#   CUDA_VERSION options: 110 to 118 and 120 to 122
+#   EXPORT_TO_BASH: 0 for False, 1 for True
+
+# Example for installing CUDA 11.7 at ~/local/cuda-11.7 and exporting the path to .bashrc:
+bash install_cuda.sh 117 ~/local 1
+```
+
+For a single compile run with a specific CUDA version, set `CUDA_HOME` to point to your CUDA installation directory. For instance, to compile using CUDA 11.7 located at `~/local/cuda-11.7`, use:
+
+```bash
+CUDA_HOME=~/local/cuda-11.7 CUDA_VERSION=117 make cuda11x
+```
+
+## General Compilation Steps
+
+1. Use `CUDA_VERSION=XXX make [target]` to compile, where `[target]` includes options like `cuda92`, `cuda10x`, `cuda11x`, and others.
+2. Install with `python setup.py install`.
+
+Ensure `nvcc` is available in your system. If using Anaconda, determine your CUDA version with PyTorch using `conda list | grep cudatoolkit` and match it by downloading the corresponding version from the [CUDA Toolkit Archive](https://developer.nvidia.com/cuda-toolkit-archive).
+
+To install CUDA locally without administrative rights:
+
+```bash
+wget https://raw.githubusercontent.com/TimDettmers/bitsandbytes/main/install_cuda.sh
+# Follow the same syntax and example as mentioned earlier
+```
+
+The compilation process relies on the `CUDA_HOME` environment variable to locate CUDA. If `CUDA_HOME` is unset, it will attempt to infer the location from `nvcc`. If `nvcc` is not in your path, you may need to add it or set `CUDA_HOME` manually. For example, if `python -m bitsandbytes` indicates your CUDA path as `/usr/local/cuda-11.7`, you can set `CUDA_HOME` to this path.
+
+If compilation issues arise, please report them.
+
+## Compilation for Kepler Architecture
+
+From version 0.39.1, bitsandbytes no longer includes Kepler binaries in pip installations, requiring manual compilation. Follow the general steps and use `cuda11x_nomatmul_kepler` for Kepler-targeted compilation.
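The lookup order for the CUDA location described in the general compilation steps (use `CUDA_HOME` if set, otherwise infer the root from `nvcc`) can be sketched in Python. This is only a mirror of the described behaviour for debugging purposes, not the build system's actual code; the function name `find_cuda_home` and the assumption that `nvcc` lives in `<root>/bin/` are ours.

```python
import os
import shutil


def find_cuda_home(environ=None, which=shutil.which):
    """Return a CUDA root directory, or None if it cannot be determined."""
    environ = os.environ if environ is None else environ

    # 1. An explicitly set CUDA_HOME takes precedence.
    cuda_home = environ.get("CUDA_HOME")
    if cuda_home:
        return cuda_home

    # 2. Otherwise infer it from nvcc, assuming the layout <root>/bin/nvcc.
    nvcc = which("nvcc")
    if nvcc:
        return os.path.dirname(os.path.dirname(os.path.abspath(nvcc)))

    # 3. Nothing found: CUDA_HOME has to be set manually.
    return None


if __name__ == "__main__":
    print("CUDA root:", find_cuda_home())
```

If this prints `None`, either add the directory containing `nvcc` to your `PATH` or set `CUDA_HOME` by hand before running `make`.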
diff --git a/docs/source/contributing.mdx b/docs/source/contributing.mdx
@@ -1,13 +1,17 @@
 # Contributors guidelines
 ... still under construction ... (feel free to propose materials, `bitsandbytes` is a community project)
 
-# Setup pre-commit hooks
+## Setup pre-commit hooks
 - Install pre-commit hooks with `pip install pre-commit`.
 - Run `pre-commit autoupdate` once to configure the hooks.
 - Re-run `pre-commit autoupdate` every time a new hook got added.
 
 Now all the pre-commit hooks will be automatically run when you try to commit and if they introduce some changes, you need to re-add the changed files before being able to commit and push.
 
+## Doc-string syntax
+
+TODO: Add description + reference of HF docstring best practices.
+
 ## Documentation
 - [guideline for documentation syntax](https://github.com/huggingface/doc-builder#readme)
 - images shall be uploaded via PR in the `bitsandbytes/` directory [here](https://huggingface.co/datasets/huggingface/documentation-images)
diff --git a/errors_and_solutions.md → docs/source/errors.mdx b/errors_and_solutions.md → docs/source/errors.mdx
@@ -1,21 +1,25 @@
-# No kernel image available
+# Errors & Solutions
 
-This problem arises with the cuda version loaded by bitsandbytes is not supported by your GPU, or if you pytorch CUDA version mismatches. To solve this problem you need to debug ``$LD_LIBRARY_PATH``, ``$CUDA_HOME``, ``$PATH``. You can print these via ``echo $PATH``. You should look for multiple paths to different CUDA versions. This can include versions in your anaconda path, for example ``$HOME/anaconda3/lib``. You can check those versions via ``ls -l $HOME/anaconda3/lib/*cuda*`` or equivalent paths. Look at the CUDA versions of files in these paths. Does it match with ``nvidia-smi``?
+## No kernel image available
 
-If you are feeling lucky, you can also try to compile the library from source. This can be still problematic if your PATH variables have multiple cuda versions. As such, it is recommended to figure out path conflicts before you proceed with compilation.
+This problem arises when the CUDA version loaded by bitsandbytes is not supported by your GPU, or if your PyTorch CUDA version does not match.
+
+To solve this problem you need to debug ``$LD_LIBRARY_PATH``, ``$CUDA_HOME``, ``$PATH``. You can print these via ``echo $PATH``. You should look for multiple paths to different CUDA versions. This can include versions in your anaconda path, for example ``$HOME/anaconda3/lib``. You can check those versions via ``ls -l $HOME/anaconda3/lib/*cuda*`` or equivalent paths. Look at the CUDA versions of files in these paths. Does it match with ``nvidia-smi``?
 
+If you are feeling lucky, you can also try to compile the library from source. This can still be problematic if your PATH variables contain multiple CUDA versions. As such, it is recommended to resolve path conflicts before you proceed with compilation.
 
 __If you encounter any other error not listed here please create an issue. This will help resolve your problem and will help out others in the future.
 
 
-# fatbinwrap
+## fatbinwrap
+
+This error occurs if there is a mismatch between CUDA versions in the C++ library and the CUDA part. Make sure you have the right CUDA in your `$PATH` and `$LD_LIBRARY_PATH` variables. In the conda base environment you can find the library under:
 
-This error occurs if there is a mismatch between CUDA versions in the C++ library and the CUDA part. Make sure you have right CUDA in your $PATH and $LD_LIBRARY_PATH variable. In the conda base environment you can find the library under:
 ```bash
 ls $CONDA_PREFIX/lib/*cudart*
 ```
 Make sure this path is appended to the `LD_LIBRARY_PATH` so bnb can find the CUDA runtime environment library (cudart).
 
-If this does not fix the issue, please try [compilation from source](compile_from_source.md) next.
+If this does not fix the issue, please try compilation from source next.
 
 If this does not work, please open an issue and paste the printed environment if you call `make` and the associated error when running bnb.
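The path debugging suggested for both errors above (looking for several CUDA runtimes on `$PATH` and `$LD_LIBRARY_PATH`) can be automated with a short script. This is a rough sketch under our own assumptions: the helper name `find_cudart_libs` is hypothetical, and matching on `*cudart*` simply follows the `ls` command shown earlier.

```python
import glob
import os


def find_cudart_libs(*path_vars):
    """Map each directory on the given search paths to the cudart files it holds."""
    found = {}
    for value in path_vars:
        # Skip empty entries, e.g. from a leading or trailing separator.
        for directory in filter(None, value.split(os.pathsep)):
            matches = sorted(glob.glob(os.path.join(directory, "*cudart*")))
            if matches:
                found[directory] = [os.path.basename(m) for m in matches]
    return found


if __name__ == "__main__":
    hits = find_cudart_libs(
        os.environ.get("LD_LIBRARY_PATH", ""),
        os.environ.get("PATH", ""),
    )
    for directory, names in hits.items():
        print(directory, names)
    if len(hits) > 1:
        print("More than one directory provides cudart; check for version conflicts.")
```

Compare the versions in the reported file names with what `nvidia-smi` shows before attempting to compile from source.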
diff --git a/docs/source/installation.mdx b/docs/source/installation.mdx
@@ -5,6 +5,12 @@ Note currently `bitsandbytes` is only supported on CUDA GPU hardwares, support f
 <hfoptions id="OS system">
 <hfoption id="Linux">
 
+## Hardware requirements:
+ - LLM.int8(): NVIDIA Turing (RTX 20xx; T4) or Ampere GPU (RTX 30xx; A4-A100); (a GPU from 2018 or newer).
+ - 8-bit optimizers and quantization: NVIDIA Kepler GPU or newer (>=GTX 78X).
+
+Supported CUDA versions: 10.2 - 12.0  #TODO: check currently supported versions
+
 ## Linux
 
 ### From Pypi
@@ -23,6 +29,8 @@ python setup.py install
 
 with `XXX` being your CUDA version; for <12.0 call `make cuda11x`. Note that support for non-CUDA GPUs (e.g. AMD, Intel) is also coming soon.
 
+For a more detailed guide, head to the [dedicated page on the topic](#compiling).
+
 </hfoption>
 <hfoption id="Windows">
 

diff --git a/docs/source/integrations.mdx b/docs/source/integrations.mdx
@@ -9,3 +9,11 @@
 # Trainer for the optimizers
 
 ... TODO: to be filled out ...
+
+Here we point to the relevant doc sections in transformers / PEFT / Trainer and very briefly explain how these are integrated:
+e.g. for transformers, state that you can load any model in 8-bit / 4-bit precision; for PEFT, you can use QLoRA out of the box with `LoraConfig` + a 4-bit base model; for Trainer, all bnb optimizers are supported by passing the correct string in `TrainingArguments`: https://github.com/huggingface/transformers/blob/abbffc4525566a48a9733639797c812301218b83/src/transformers/training_args.py#L134
+
+A few references:
+
+- transformers: https://huggingface.co/docs/transformers/quantization#bitsandbytes
+- PEFT: https://huggingface.co/docs/peft/developer_guides/quantization
diff --git a/docs/source/introduction.mdx b/docs/source/introduction.mdx
@@ -7,46 +7,11 @@ The `bitsandbytes` library is a lightweight Python wrapper around CUDA custom fu
 There are ongoing efforts to support further hardware backends, i.e. Intel CPU + GPU, AMD GPU, Apple Silicon. Windows support is on its way as well.
 The library includes quantization primitives for 8-bit & 4-bit operations, through `bitsandbytes.nn.Linear8bitLt` and `bitsandbytes.nn.Linear4bit` and 8bit optimizers through `bitsandbytes.optim` module.
 
-**Using 8-bit optimizers**:
-
-1. Comment out optimizer: ``#torch.optim.Adam(....)``
-2. Add 8-bit optimizer of your choice ``bnb.optim.Adam8bit(....)`` (arguments stay the same)
-3. Replace embedding layer if necessary: ``torch.nn.Embedding(..) -> bnb.nn.Embedding(..)``
-
-
-**Using 8-bit Inference**:
-1. Comment out torch.nn.Linear: ``#linear = torch.nn.Linear(...)``
-2. Add bnb 8-bit linear light module: ``linear = bnb.nn.Linear8bitLt(...)`` (base arguments stay the same)
-3. There are two modes:
-   - Mixed 8-bit training with 16-bit main weights. Pass the argument ``has_fp16_weights=True`` (default)
-   - Int8 inference. Pass the argument ``has_fp16_weights=False``
-4. To use the full LLM.int8() method, use the ``threshold=k`` argument. We recommend ``k=6.0``.
-```python
-# LLM.int8()
-linear = bnb.nn.Linear8bitLt(dim1, dim2, bias=True, has_fp16_weights=False, threshold=6.0)
-# inputs need to be fp16
-out = linear(x.to(torch.float16))
-```
-
-
-## Features
-
-- 8-bit Matrix multiplication with mixed precision decomposition
-- LLM.int8() inference
-- 8-bit Optimizers: Adam, AdamW, RMSProp, LARS, LAMB, Lion (saves 75% memory)
-- Stable Embedding Layer: Improved stability through better initialization, and normalization
-- 8-bit quantization: Quantile, Linear, and Dynamic quantization
-- Fast quantile estimation: Up to 100x faster than other algorithms
-
 ## Requirements & Installation
 
 Requirements: anaconda, cudatoolkit, pytorch
 
-Hardware requirements:
- - LLM.int8(): NVIDIA Turing (RTX 20xx; T4) or Ampere GPU (RTX 30xx; A4-A100); (a GPU from 2018 or newer).
- - 8-bit optimizers and quantization: NVIDIA Kepler GPU or newer (>=GTX 78X).
 
-Supported CUDA versions: 10.2 - 12.0
 
 The bitsandbytes library is currently only supported on Linux distributions. Windows is not supported at the moment.
 
@@ -55,36 +20,10 @@ The requirements can best be fulfilled by installing pytorch via anaconda. You c
 
 ## Using bitsandbytes
 
-### Using Int8 Matrix Multiplication
-
-For straight Int8 matrix multiplication with mixed precision decomposition you can use ``bnb.matmul(...)``. To enable mixed precision decomposition, use the threshold parameter:
-```python
-bnb.matmul(..., threshold=6.0)
-```
+###
 
 For instructions how to use LLM.int8() inference layers in your own code, see the TL;DR above or for extended instruction see [this blog post](https://huggingface.co/blog/hf-bitsandbytes-integration).
 
-### Using the 8-bit Optimizers
-
-With bitsandbytes 8-bit optimizers can be used by changing a single line of code in your codebase. For NLP models we recommend also to use the StableEmbedding layers (see below) which improves results and helps with stable 8-bit optimization.  To get started with 8-bit optimizers, it is sufficient to replace your old optimizer with the 8-bit optimizer in the following way:
-```python
-import bitsandbytes as bnb
-
-# adam = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.995)) # comment out old optimizer
-adam = bnb.optim.Adam8bit(model.parameters(), lr=0.001, betas=(0.9, 0.995)) # add bnb optimizer
-adam = bnb.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.995), optim_bits=8) # equivalent
-
-
-torch.nn.Embedding(...) ->  bnb.nn.StableEmbedding(...) # recommended for NLP models
-```
-
-Note that by default all parameter tensors with less than 4096 elements are kept at 32-bit even if you initialize those parameters with 8-bit optimizers. This is done since such small tensors do not save much memory and often contain highly variable parameters (biases) or parameters that require high precision (batch norm, layer norm). You can change this behavior like so:
-```python
-# parameter tensors with less than 16384 values are optimized in 32-bit
-# it is recommended to use multiplies of 4096
-adam = bnb.optim.Adam8bit(model.parameters(), min_8bit_size=16384)
-```
-
 ### Change Bits and other Hyperparameters for Individual Parameters
 
 If you want to optimize some unstable parameters with 32-bit Adam and others with 8-bit Adam, you can use the `GlobalOptimManager`. With this, we can also configure specific hyperparameters for particular layers, such as embedding layers. To do that, we need two things: (1) register the parameter while they are still on the CPU, (2) override the config with the new desired hyperparameters (anytime, anywhere). See our [guide](howto_config_override.md) for more details
@@ -97,29 +36,8 @@ To use the Stable Embedding Layer, override the respective `build_embedding(...)
 
 For upcoming features and changes and full history see [Patch Notes](CHANGELOG.md).
 
-## Errors
-
-1. RuntimeError: CUDA error: no kernel image is available for execution on the device. [Solution](errors_and_solutions.md#No-kernel-image-available)
-2. __fatbinwrap_.. [Solution](errors_and_solutions.md#fatbinwrap_)
-
-## Compile from source
-To compile from source, you need an installation of CUDA. If `nvcc` is not installed, you can install the CUDA Toolkit with nvcc through the following commands.
-
-```bash
-wget https://raw.githubusercontent.com/TimDettmers/bitsandbytes/main/install_cuda.sh
-# Syntax cuda_install CUDA_VERSION INSTALL_PREFIX EXPORT_TO_BASH
-#   CUDA_VERSION in {110, 111, 112, 113, 114, 115, 116, 117, 118, 120, 121, 122}
-#   EXPORT_TO_BASH in {0, 1} with 0=False and 1=True
-
-# For example, the following installs CUDA 11.7 to ~/local/cuda-11.7 and exports the path to your .bashrc
-bash install_cuda.sh 117 ~/local 1
-```
-
-To use a specific CUDA version just for a single compile run, you can set the variable `CUDA_HOME`, for example the following command compiles `libbitsandbytes_cuda117.so` using compiler flags for cuda11x with the cuda version at `~/local/cuda-11.7`:
 
-``CUDA_HOME=~/local/cuda-11.7 CUDA_VERSION=117 make cuda11x``
 
-For more detailed instruction, please follow the [compile_from_source.md](compile_from_source.md) instructions.
 
 ## License
 

diff --git a/docs/source/moduletree.mdx b/docs/source/moduletree.mdx
diff --git a/how_to_use_nonpytorch_cuda.md → docs/source/nonpytorchcuda.mdx b/how_to_use_nonpytorch_cuda.md → docs/source/nonpytorchcuda.mdx
@@ -1,6 +1,6 @@
-## How to use a CUDA version that is different from PyTorch
+# How to use a CUDA version that is different from PyTorch
 
-Some features of bitsandbytes may need a newer CUDA version than regularly supported by PyTorch binaries from conda / pip. In that case you can use the following instructions to load a precompiled bitsandbytes binary that works for you.
+Some features of `bitsandbytes` may need a newer CUDA version than regularly supported by PyTorch binaries from conda / pip. In that case you can use the following instructions to load a precompiled `bitsandbytes` binary that works for you.
 
 ## Installing or determining the CUDA installation
 
@@ -12,7 +12,7 @@ Determine the path of the CUDA version that you want to use. Common paths paths
 
 where XX.X is the CUDA version number.
 
-You can also install CUDA version that you need locally with a script provided by bitsandbytes as follows:
+You can also install the CUDA version that you need locally with a script provided by `bitsandbytes` as follows:
 
 ```bash
 wget https://raw.githubusercontent.com/TimDettmers/bitsandbytes/main/install_cuda.sh
@@ -25,7 +25,7 @@ wget https://raw.githubusercontent.com/TimDettmers/bitsandbytes/main/install_cud
 bash install_cuda.sh 117 ~/local 1
 ```
 
-## Setting the environmental variables BNB_CUDA_VERSION, and LD_LIBRARY_PATH
+## Setting the environment variables `BNB_CUDA_VERSION` and `LD_LIBRARY_PATH`
 
 To manually override the PyTorch installed CUDA version you need to set to variable, like so: