Add additional sections, first optimizers, MacOS WIP
Titus-von-Koeller committed Feb 1, 2024
1 parent fbc0385 commit 84b5fc0
Showing 6 changed files with 133 additions and 8 deletions.
16 changes: 12 additions & 4 deletions docs/source/_toctree.yml
@@ -1,8 +1,16 @@
-- sections:
-  - local: index
-    title: Bits & Bytes
+- title: Get started
+  sections:
+  - local: introduction
+    title: Introduction
   - local: quickstart
     title: Quickstart
   - local: installation
     title: Installation
-  title: Get started
+- title: Features & Integrations
+  sections:
+  - local: quantization
+    title: Quantization
+  - local: optimizers
+    title: Optimizers
+  - local: integrations
+    title: Integrations
8 changes: 8 additions & 0 deletions docs/source/installation.mdx
@@ -4,6 +4,7 @@ Note currently `bitsandbytes` is only supported on CUDA GPU hardwares, support f

<hfoptions id="OS system">
<hfoption id="Linux">
<hfoption id="MacOS">

## Linux

@@ -39,5 +40,12 @@ python -m build --wheel

Big thanks to [wkpark](https://github.com/wkpark), [Jamezo97](https://github.com/Jamezo97), [rickardp](https://github.com/rickardp), [akx](https://github.com/akx) for their amazing contributions to make bitsandbytes compatible with Windows.

</hfoption>
<hfoption id="Windows">

## MacOS

Mac support is still a work in progress.

</hfoption>
</hfoptions>
5 changes: 5 additions & 0 deletions docs/source/integrations.mdx
@@ -0,0 +1,5 @@
# Transformers

# PEFT

# Trainer for the optimizers
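These headings are placeholders for now. As a rough illustration of the last item, here is a minimal sketch of letting the Transformers `Trainer` pick a bitsandbytes optimizer; the optimizer name `"adamw_bnb_8bit"` and the placeholder model and dataset are assumptions, so check the Transformers documentation for the exact values.

```python
from transformers import Trainer, TrainingArguments

# Assumption: Transformers exposes the bitsandbytes 8-bit AdamW optimizer
# under the name "adamw_bnb_8bit"; `model` and `train_dataset` are placeholders
# defined elsewhere.
args = TrainingArguments(
    output_dir="outputs",
    per_device_train_batch_size=8,
    optim="adamw_bnb_8bit",
)

trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()
```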
8 changes: 4 additions & 4 deletions docs/source/index.mdx → docs/source/introduction.mdx
@@ -1,10 +1,10 @@
-# bitsandbytes
+# `bitsandbytes`

-The bitsandbytes is a lightweight wrapper around CUDA custom functions, in particular 8-bit optimizers, matrix multiplication (LLM.int8()), and quantization functions.
+The `bitsandbytes` library is a lightweight Python wrapper around CUDA custom functions, in particular 8-bit optimizers, matrix multiplication (LLM.int8()), and quantization functions.

There are ongoing efforts to support further hardware backends, i.e. Intel CPU + GPU, AMD GPU, Apple Silicon. Windows support is on its way as well.


-Resources:
+# Resources:
- [8-bit Optimizer Paper](https://arxiv.org/abs/2110.02861) -- [Video](https://www.youtube.com/watch?v=IxrlHAJtqKE) -- [Docs](https://bitsandbytes.readthedocs.io/en/latest/)

- [LLM.int8() Paper](https://arxiv.org/abs/2208.07339) -- [LLM.int8() Software Blog Post](https://huggingface.co/blog/hf-bitsandbytes-integration) -- [LLM.int8() Emergent Features Blog Post](https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features/)
103 changes: 103 additions & 0 deletions docs/source/optimizers.mdx
@@ -0,0 +1,103 @@
Here we provide a short description and usage examples for each optimizer in `bitsandbytes.optim`. We'll start by explaining the core optimizer class `Optimizer8bit`, followed by the specific implementations `Adagrad`, `Adagrad8bit`, and `Adagrad32bit`.

Which optimizer to use depends on the specific requirements of the task at hand, such as memory constraints, computational efficiency, and the need for numerical precision.

# Optimizer base class

## `Optimizer8bit`

The `Optimizer8bit` class serves as a base class for all 8-bit optimizers, providing the common functionality required for quantized optimization. The class is designed to support both 32-bit and 8-bit computation; using 8-bit optimizer states can significantly reduce the memory footprint and increase computation speed.

### `Optimizer8bit` Usage:

```python
import torch
from bitsandbytes.optim import Optimizer8bit

model = YourModel()
params = model.parameters()

# Initialize the optimizer with your model's parameters.
# Note: Optimizer8bit is the shared base class; the concrete subclasses
# (e.g. Adagrad8bit below) implement the actual parameter update step.
optimizer = Optimizer8bit(params, defaults={
    'lr': 0.001,
    'betas': (0.9, 0.999),
    'eps': 1e-08,
    'weight_decay': 0
}, optim_bits=8)  # Use optim_bits=32 for 32-bit optimization

# In your training loop
optimizer.zero_grad()
loss = compute_loss() # Implement your loss computation
loss.backward()
optimizer.step()
```
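To make the relationship between the base class and the concrete optimizers below explicit, here is a minimal sketch; it assumes all four classes are importable from `bitsandbytes.optim`, as in the examples in this file.

```python
from bitsandbytes.optim import Adagrad, Adagrad8bit, Adagrad32bit, Optimizer8bit

# Each concrete Adagrad variant is expected to derive from the Optimizer8bit base class.
for cls in (Adagrad, Adagrad8bit, Adagrad32bit):
    print(f"{cls.__name__} is a subclass of Optimizer8bit: {issubclass(cls, Optimizer8bit)}")
```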

# Adagrad implementations

## `Adagrad`

The `Adagrad` class is an implementation of the Adagrad optimizer, which adapts the learning rate for each parameter based on historical gradient information. This version allows for both 32-bit and 8-bit representations, with specific classes for each.

### `Adagrad` Usage:

```python
import torch
from bitsandbytes.optim import Adagrad

model = YourModel()
params = model.parameters()

# Initialize the optimizer with your model's parameters
optimizer = Adagrad(params, lr=0.01)

# In your training loop
optimizer.zero_grad()
loss = compute_loss() # Implement your loss computation
loss.backward()
optimizer.step()
```
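Since `Adagrad` supports both 32-bit and 8-bit representations, the state precision can also be chosen at construction time. The sketch below assumes the constructor exposes an `optim_bits` argument for this purpose; treat it as an assumption and check the class signature.

```python
from bitsandbytes.optim import Adagrad

model = YourModel()  # placeholder model, as in the example above

# Assumption: optim_bits selects the precision of the optimizer state.
adagrad_32bit_state = Adagrad(model.parameters(), lr=0.01, optim_bits=32)
adagrad_8bit_state = Adagrad(model.parameters(), lr=0.01, optim_bits=8)
```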

## `Adagrad8bit`

The `Adagrad8bit` class is specifically tailored for 8-bit optimization and inherits from `Optimizer1State`. It is designed for models where memory efficiency is crucial: it operates with reduced precision to save memory and increase computation speed.

### `Adagrad8bit` Usage:

```python
import torch
from bitsandbytes.optim import Adagrad8bit

model = YourModel()
params = model.parameters()

# Initialize the optimizer with your model's parameters
optimizer = Adagrad8bit(params, lr=0.01)

# In your training loop
optimizer.zero_grad()
loss = compute_loss() # Implement your loss computation
loss.backward()
optimizer.step()
```
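Continuing the example above, one way to check the memory saving is to inspect the optimizer state after the first `optimizer.step()`. This is a sketch that assumes the state tensors are exposed through the standard PyTorch `optimizer.state` mapping; for large parameters the 8-bit optimizer is expected to keep its state in a reduced-precision dtype.

```python
# Inspect the dtypes of the optimizer state tensors after at least one step.
for state in optimizer.state.values():
    for name, value in state.items():
        if torch.is_tensor(value):
            print(name, tuple(value.shape), value.dtype)
```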

## `Adagrad32bit`

The `Adagrad32bit` class is similar to `Adagrad` but ensures that all computations are carried out with 32-bit precision. This class is preferable when numerical precision is more critical than memory efficiency.

### `Adagrad32bit` Usage:

```python
import torch
from bitsandbytes.optim import Adagrad32bit

model = YourModel()
params = model.parameters()

# Initialize the optimizer with your model's parameters
optimizer = Adagrad32bit(params, lr=0.01)

# In your training loop
optimizer.zero_grad()
loss = compute_loss() # Implement your loss computation
loss.backward()
optimizer.step()
```
1 change: 1 addition & 0 deletions docs/source/quantization.mdx
@@ -0,0 +1 @@
Linear8bitLt & Linear4bit
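This page is currently just a stub naming the two quantized linear modules. As a rough sketch of how they are typically used, assuming both classes live under `bitsandbytes.nn`, that the listed constructor arguments exist, and that the weights are quantized when the module is moved to a CUDA device:

```python
import torch
import bitsandbytes as bnb

# 8-bit linear layer (LLM.int8()); threshold controls outlier handling.
int8_layer = bnb.nn.Linear8bitLt(1024, 1024, has_fp16_weights=False, threshold=6.0).to("cuda")

# 4-bit linear layer; compute_dtype is the dtype used for the matmul.
fp4_layer = bnb.nn.Linear4bit(1024, 1024, compute_dtype=torch.float16).to("cuda")

x = torch.randn(4, 1024, dtype=torch.float16, device="cuda")
print(int8_layer(x).shape, fp4_layer(x).shape)
```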
