Commit

some changes
younesbelkada committed Feb 2, 2024
1 parent 725d29a commit 58566e2
Showing 11 changed files with 38 additions and 36 deletions.
2 changes: 0 additions & 2 deletions docs/source/_toctree.yml
@@ -16,8 +16,6 @@
title: Optimizers
- local: integrations
title: Integrations
- local: qlora
title: QLoRA
- title: Support & Learning
sections:
- local: resources
2 changes: 1 addition & 1 deletion docs/source/faqs.mdx
@@ -4,4 +4,4 @@ Please submit your questions in [this Github Discussion thread](https://github.c

We'll pick the most generally applicable ones and post the QAs here or integrate them into the general documentation (also feel free to submit doc PRs, please).

# ... under construction ...
# ... under construction ...
8 changes: 4 additions & 4 deletions docs/source/installation.mdx
@@ -4,7 +4,6 @@ Note currently `bitsandbytes` is only supported on CUDA GPU hardwares, support f

<hfoptions id="OS system">
<hfoption id="Linux">
<hfoption id="MacOS">

## Linux

@@ -22,7 +21,7 @@ CUDA_VERSION=XXX make cuda12x
python setup.py install
```

with `XXX` being your CUDA version, for <12.0 call `make cuda 11x`
with `XXX` being your CUDA version; for versions below 12.0, call `make cuda11x`. Note that support for non-CUDA GPUs (e.g. AMD, Intel) is also coming soon.

</hfoption>
<hfoption id="Windows">
@@ -41,11 +40,12 @@ python -m build --wheel
Big thanks to [wkpark](https://github.com/wkpark), [Jamezo97](https://github.com/Jamezo97), [rickardp](https://github.com/rickardp), [akx](https://github.com/akx) for their amazing contributions to make bitsandbytes compatible with Windows.

</hfoption>
<hfoption id="Windows">
<hfoption id="MacOS">

## MacOS

Mac support is still a work in progress.
Mac support is still a work in progress. Please check the latest bitsandbytes issues to stay up to date on the progress of MacOS integration.

</hfoption>

</hfoptions>
3 changes: 3 additions & 0 deletions docs/source/integrations.mdx
@@ -1,8 +1,11 @@
# Transformers

... TODO: to be filled out ...

# PEFT

... TODO: to be filled out ...

# Trainer for the optimizers

... TODO: to be filled out ...
15 changes: 3 additions & 12 deletions docs/source/introduction.mdx
@@ -5,20 +5,10 @@ TODO: Many parts of this doc will still be redistributed among the new doc struc
The `bitsandbytes` library is a lightweight Python wrapper around CUDA custom functions, in particular 8-bit optimizers, matrix multiplication (LLM.int8()), and quantization functions.

There are ongoing efforts to support further hardware backends, i.e. Intel CPU + GPU, AMD GPU, Apple Silicon. Windows support is on its way as well.
The library includes quantization primitives for 8-bit & 4-bit operations through `bitsandbytes.nn.Linear8bitLt` and `bitsandbytes.nn.Linear4bit`, and 8-bit optimizers through the `bitsandbytes.optim` module.
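As a quick illustration (a minimal sketch with made-up layer sizes, following the usual pattern of quantizing when the module is moved to the GPU; `bnb.nn.Linear4bit` can be swapped in the same way):

```py
import torch
import bitsandbytes as bnb

# Toy dimensions, purely illustrative.
fp16_linear = torch.nn.Linear(64, 64, bias=False)

# 8-bit linear layer (LLM.int8()); weights are quantized when the module is moved to the GPU.
int8_linear = bnb.nn.Linear8bitLt(64, 64, bias=False, has_fp16_weights=False)
int8_linear.load_state_dict(fp16_linear.state_dict())
int8_linear = int8_linear.cuda()  # quantization happens here

x = torch.randn(8, 64, dtype=torch.float16, device="cuda")
out = int8_linear(x)
```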

**Using 8-bit optimizers**:

```python
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(
'decapoda-research/llama-7b-hf',
device_map='auto',
load_in_8bit=True,
max_memory=f'{int(torch.cuda.mem_get_info()[0]/1024**3)-2}GB')
```

A more detailed example, can be found in [examples/int8_inference_huggingface.py](examples/int8_inference_huggingface.py).

**Using 8-bit optimizers** (see the sketch below):
1. Comment out your existing optimizer: ``#torch.optim.Adam(....)``
2. Add the 8-bit optimizer of your choice: ``bnb.optim.Adam8bit(....)`` (arguments stay the same)
3. Replace the embedding layer if necessary: ``torch.nn.Embedding(..) -> bnb.nn.Embedding(..)``
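A minimal sketch of these three steps on a toy model (the model, data, and hyperparameters below are placeholders, not from the original example):

```py
import torch
import bitsandbytes as bnb

# Toy classifier over token ids; sizes are arbitrary.
model = torch.nn.Sequential(
    bnb.nn.Embedding(1000, 64),   # step 3: bnb embedding instead of torch.nn.Embedding
    torch.nn.Flatten(),
    torch.nn.Linear(64 * 16, 2),
).cuda()

# steps 1 + 2: comment out the torch optimizer and add the 8-bit one (same arguments)
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-3)

x = torch.randint(0, 1000, (8, 16), device="cuda")
y = torch.randint(0, 2, (8,), device="cuda")

loss = torch.nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```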
@@ -40,6 +30,7 @@ out = linear(x.to(torch.float16))


## Features

- 8-bit Matrix multiplication with mixed precision decomposition
- LLM.int8() inference
- 8-bit Optimizers: Adam, AdamW, RMSProp, LARS, LAMB, Lion (saves 75% memory)
4 changes: 2 additions & 2 deletions docs/source/moduletree.mdx
@@ -1,5 +1,5 @@
# Module tree overview

- **bitsandbytes.functional**: Contains quantization functions and stateless 8-bit optimizer update functions.
- **bitsandbytes.functional**: Contains quantization functions (4-bit & 8-bit) and stateless 8-bit optimizer update functions.
- **bitsandbytes.nn.modules**: Contains stable embedding layer with automatic 32-bit optimizer overrides (important for NLP stability)
- **bitsandbytes.optim**: Contains 8-bit optimizers.
- **bitsandbytes.optim**: Contains 8-bit optimizers.
32 changes: 21 additions & 11 deletions docs/source/optimizers.mdx
@@ -1,4 +1,5 @@
# Introduction: 8-bit optimizers

With 8-bit optimizers, larger models can be finetuned within the same GPU memory budget as standard 32-bit optimizer training. 8-bit optimizers are a drop-in replacement for regular optimizers:

- Faster (e.g. 4x faster than regular Adam)
@@ -12,7 +13,7 @@ See here the biggest models
We feature 8-bit Adam/AdamW, SGD momentum, LARS, LAMB, and RMSProp.

It only requires a two-line code change to get started.
```
```py
import bitsandbytes as bnb

# before: adam = torch.optim.Adam(...)
@@ -25,20 +26,30 @@ bnb.nn.StableEmbedding(...)

The arguments passed are the same as for standard Adam. For NLP models we also recommend using the StableEmbedding layer, which improves results and helps with stable 8-bit optimization, as in the sketch below.
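A minimal sketch of the swap (the vocabulary and embedding sizes here are made up):

```py
import torch
import bitsandbytes as bnb

# before: emb = torch.nn.Embedding(num_embeddings=50000, embedding_dim=512)
emb = bnb.nn.StableEmbedding(num_embeddings=50000, embedding_dim=512)

token_ids = torch.randint(0, 50000, (4, 128))
vectors = emb(token_ids)  # same call signature and output shape as torch.nn.Embedding
```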

## Overview of supported 8-bit optimizers

TODO: List here all optimizers in `bitsandbytes/optim/__init__.py`
TODO (future): have automated API docs through doc-builder

## Overview of expected gradients

TODO: add pics here, no idea how to do that
<div style="text-align: center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bitsandbytes/optimizer_comparison.png" width="50%">
</div>

Want to add both pics in https://huggingface.co/datasets/huggingface/documentation-images/tree/main/bitsandbytes
<div style="text-align: center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bitsandbytes/optimizer_largest_model.png" width="50%">
</div>

# Research Background

Stateful optimizers maintain gradient statistics over time, e.g. the exponentially smoothed sum (SGD with momentum) or squared sum (Adam) of past gradient values. This state can be used to accelerate optimization compared to plain stochastic gradient descent but uses memory that might otherwise be allocated to model parameters, thereby limiting the maximum size of models trained in practice. `bitsandbytes` optimizers use 8-bit statistics, while maintaining the performance levels of using 32-bit optimizer states.
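For example, Adam keeps two 32-bit state values per parameter, i.e. about 8 bytes of optimizer state per parameter, so a 1-billion-parameter model carries roughly 8 GB of optimizer state in 32-bit, versus roughly 2 GB with 8-bit states.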

To overcome the resulting computational, quantization and stability challenges, 8-bit optimizers have three components:
1) **Block-wise quantization** divides input tensors into smaller blocks that are independently quantized, therein isolating outliers and distributing the error more equally over all bits. Each block is processed in parallel across cores, yielding faster optimization and high precision quantization.
2) **dynamic quantization**, which quantizes both small and large values with high precision,
3) a **stable embedding layer** improves stability during optimization for models with word embeddings.

1. **Block-wise quantization** divides input tensors into smaller blocks that are quantized independently, isolating outliers and distributing the error more equally over all bits. Each block is processed in parallel across cores, yielding faster optimization and high-precision quantization.
2. **Dynamic quantization** quantizes both small and large values with high precision.
3. A **stable embedding layer** improves stability during optimization for models with word embeddings.

With these components, performing an optimizer update with 8-bit states is straightforward. We dequantize the 8-bit optimizer states to 32-bit, perform the update and then quantize the states back to 8-bit for storage.
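As an illustration of that update path, here is simplified pseudocode (not the actual bitsandbytes CUDA kernels, which fuse these steps and use dynamic rather than plain absmax quantization; a single momentum state is used for brevity):

```py
import torch

def optimizer_step_8bit_sketch(p, grad, state8, scale, lr=1e-3, beta=0.9, block=256):
    """Toy sketch of one 8-bit optimizer update (single momentum state, block-wise absmax).

    Assumes p.numel() is a multiple of `block`; state8 is int8, scale holds one float per block.
    """
    # 1) dequantize the 8-bit state back to 32-bit, block by block
    state32 = (state8.float() / 127.0).view(-1, block) * scale.unsqueeze(1)
    state32 = state32.view(-1)

    # 2) perform the regular 32-bit update
    state32 = beta * state32 + (1 - beta) * grad.view(-1)
    p.data.view(-1).sub_(lr * state32)

    # 3) re-quantize the state to 8-bit for storage, one scale per block
    blocks = state32.view(-1, block)
    scale = blocks.abs().max(dim=1).values.clamp(min=1e-8)
    state8 = torch.round(blocks / scale.unsqueeze(1) * 127.0).to(torch.int8).view(-1)
    return state8, scale
```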

@@ -65,15 +76,14 @@ The Stable Embedding Layer enhances the standard word embedding layer for improv

Some more examples of how you can replace your old optimizer with the 8-bit optimizer:

```
```diff
import bitsandbytes as bnb

# adam = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.995)) # comment out old optimizer
adam = bnb.optim.Adam8bit(model.parameters(), lr=0.001, betas=(0.9, 0.995)) # add bnb optimizer
adam = bnb.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.995), optim_bits=8) # equivalent
- adam = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.995)) # comment out old optimizer
+ adam = bnb.optim.Adam8bit(model.parameters(), lr=0.001, betas=(0.9, 0.995)) # add bnb optimizer

# use 32-bit Adam with 5th percentile clipping
adam = bnb.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.995),
+ adam = bnb.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.995),
optim_bits=32, percentile_clipping=5)
```

1 change: 0 additions & 1 deletion docs/source/qlora.mdx

This file was deleted.

4 changes: 2 additions & 2 deletions docs/source/quantization.mdx
@@ -1,5 +1,5 @@
# Linear8bitLt
# Linear8bitLt (LLM.int8)
... TODO: to be filled out ...

# Linear4bit
# Linear4bit (QLoRA)
... TODO: to be filled out ...
2 changes: 1 addition & 1 deletion docs/source/quickstart.mdx
@@ -8,5 +8,5 @@

The following code illustrates the steps above.

```python
```py
```
1 change: 1 addition & 0 deletions docs/source/resources.mdx
@@ -3,6 +3,7 @@
The academic work below is listed in reverse chronological order.

## [SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression (Jun 2023)](https://arxiv.org/abs/2306.03078)

Authors: Tim Dettmers, Ruslan Svirschevski, Vage Egiazarian, Denis Kuznedelev, Elias Frantar, Saleh Ashkboos, Alexander Borzunov, Torsten Hoefler, Dan Alistarh

- [Twitter summary thread](https://twitter.com/Tim_Dettmers/status/1666076553665744896)
