Add additional sections, first optimizers, MacOS WIP
Titus-von-Koeller committed Feb 1, 2024
1 parent fbc0385 commit 84b5fc0
Showing 6 changed files with 133 additions and 8 deletions.
16 changes: 12 additions & 4 deletions docs/source/_toctree.yml
@@ -1,8 +1,16 @@
-- sections:
-  - local: index
-    title: Bits & Bytes
+- title: Get started
+  sections:
+  - local: introduction
+    title: Introduction
   - local: quickstart
     title: Quickstart
   - local: installation
     title: Installation
-  title: Get started
+- title: Features & Integrations
+  sections:
+  - local: quantization
+    title: Quantization
+  - local: optimizers
+    title: Optimizers
+  - local: integrations
+    title: Integrations
8 changes: 8 additions & 0 deletions docs/source/installation.mdx
@@ -4,6 +4,7 @@ Note currently `bitsandbytes` is only supported on CUDA GPU hardwares, support f

<hfoptions id="OS system">
<hfoption id="Linux">
<hfoption id="MacOS">

## Linux

@@ -39,5 +40,12 @@ python -m build --wheel

Big thanks to [wkpark](https://github.com/wkpark), [Jamezo97](https://github.com/Jamezo97), [rickardp](https://github.com/rickardp), [akx](https://github.com/akx) for their amazing contributions to make bitsandbytes compatible with Windows.

</hfoption>
<hfoption id="Windows">

## MacOS

Mac support is still a work in progress.

</hfoption>
</hfoptions>
5 changes: 5 additions & 0 deletions docs/source/integrations.mdx
@@ -0,0 +1,5 @@
# Transformers

# PEFT

# Trainer for the optimizers
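These headings are placeholders for now. As a rough illustration of the last item, here is a minimal sketch of letting the Transformers `Trainer` pick a bitsandbytes optimizer; the optimizer name `"adamw_bnb_8bit"` and the placeholder model and dataset are assumptions, so check the Transformers documentation for the exact values.

```python
from transformers import Trainer, TrainingArguments

# Assumption: Transformers exposes the bitsandbytes 8-bit AdamW optimizer
# under the name "adamw_bnb_8bit"; `model` and `train_dataset` are placeholders
# defined elsewhere.
args = TrainingArguments(
    output_dir="outputs",
    per_device_train_batch_size=8,
    optim="adamw_bnb_8bit",
)

trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()
```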
8 changes: 4 additions & 4 deletions docs/source/index.mdx → docs/source/introduction.mdx
@@ -1,10 +1,10 @@
-# bitsandbytes
+# `bitsandbytes`

-The bitsandbytes is a lightweight wrapper around CUDA custom functions, in particular 8-bit optimizers, matrix multiplication (LLM.int8()), and quantization functions.
+The `bitsandbytes` library is a lightweight Python wrapper around CUDA custom functions, in particular 8-bit optimizers, matrix multiplication (LLM.int8()), and quantization functions.

There are ongoing efforts to support further hardware backends, i.e. Intel CPU + GPU, AMD GPU, Apple Silicon. Windows support is on its way as well.


-Resources:
+# Resources:
- [8-bit Optimizer Paper](https://arxiv.org/abs/2110.02861) -- [Video](https://www.youtube.com/watch?v=IxrlHAJtqKE) -- [Docs](https://bitsandbytes.readthedocs.io/en/latest/)

- [LLM.int8() Paper](https://arxiv.org/abs/2208.07339) -- [LLM.int8() Software Blog Post](https://huggingface.co/blog/hf-bitsandbytes-integration) -- [LLM.int8() Emergent Features Blog Post](https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features/)
103 changes: 103 additions & 0 deletions docs/source/optimizers.mdx
@@ -0,0 +1,103 @@
Here we provide a short description and usage examples for each optimizer in `bitsandbytes.optim`. We'll start by explaining the core optimizer class `Optimizer8bit`, followed by the specific implementations `Adagrad`, `Adagrad8bit`, and `Adagrad32bit`.

Which optimizer to use depends on the specific requirements of the task at hand, such as memory constraints, computational efficiency, and the need for numerical precision.

# Optimizer base class

## `Optimizer8bit`

The `Optimizer8bit` class serves as a base class for all 8-bit optimizers, providing the common functionality required for quantized optimization. The class is designed to support both 32-bit and 8-bit computation; using 8-bit optimizer states can significantly reduce the memory footprint and increase computation speed.

### `Optimizer8bit` Usage:

```python
import torch
from bitsandbytes.optim import Optimizer8bit

model = YourModel()
params = model.parameters()

# Initialize the optimizer with your model's parameters.
# Note: Optimizer8bit is the shared base class; the concrete subclasses
# (e.g. Adagrad8bit below) implement the actual parameter update step.
optimizer = Optimizer8bit(params, defaults={
    'lr': 0.001,
    'betas': (0.9, 0.999),
    'eps': 1e-08,
    'weight_decay': 0
}, optim_bits=8)  # Use optim_bits=32 for 32-bit optimization

# In your training loop
optimizer.zero_grad()
loss = compute_loss() # Implement your loss computation
loss.backward()
optimizer.step()
```
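To make the relationship between the base class and the concrete optimizers below explicit, here is a minimal sketch; it assumes all four classes are importable from `bitsandbytes.optim`, as in the examples in this file.

```python
from bitsandbytes.optim import Adagrad, Adagrad8bit, Adagrad32bit, Optimizer8bit

# Each concrete Adagrad variant is expected to derive from the Optimizer8bit base class.
for cls in (Adagrad, Adagrad8bit, Adagrad32bit):
    print(f"{cls.__name__} is a subclass of Optimizer8bit: {issubclass(cls, Optimizer8bit)}")
```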

# Adagrad implementations

## `Adagrad`

The `Adagrad` class is an implementation of the Adagrad optimizer, which adapts the learning rate for each parameter based on historical gradient information. This version allows for both 32-bit and 8-bit representations, with specific classes for each.

### `Adagrad` Usage:

```python
import torch
from bitsandbytes.optim import Adagrad

model = YourModel()
params = model.parameters()

# Initialize the optimizer with your model's parameters
optimizer = Adagrad(params, lr=0.01)

# In your training loop
optimizer.zero_grad()
loss = compute_loss() # Implement your loss computation
loss.backward()
optimizer.step()
```
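Since `Adagrad` supports both 32-bit and 8-bit representations, the state precision can also be chosen at construction time. The sketch below assumes the constructor exposes an `optim_bits` argument for this purpose; treat it as an assumption and check the class signature.

```python
from bitsandbytes.optim import Adagrad

model = YourModel()  # placeholder model, as in the example above

# Assumption: optim_bits selects the precision of the optimizer state.
adagrad_32bit_state = Adagrad(model.parameters(), lr=0.01, optim_bits=32)
adagrad_8bit_state = Adagrad(model.parameters(), lr=0.01, optim_bits=8)
```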

## `Adagrad8bit`

The `Adagrad8bit` class is specifically tailored for 8-bit optimization and inherits from `Optimizer1State`. It is designed for models where memory efficiency is crucial: it operates with reduced precision to save memory and increase computation speed.

### `Adagrad8bit` Usage:

```python
import torch
from bitsandbytes.optim import Adagrad8bit

model = YourModel()
params = model.parameters()

# Initialize the optimizer with your model's parameters
optimizer = Adagrad8bit(params, lr=0.01)

# In your training loop
optimizer.zero_grad()
loss = compute_loss() # Implement your loss computation
loss.backward()
optimizer.step()
```
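Continuing the example above, one way to check the memory saving is to inspect the optimizer state after the first `optimizer.step()`. This is a sketch that assumes the state tensors are exposed through the standard PyTorch `optimizer.state` mapping; for large parameters the 8-bit optimizer is expected to keep its state in a reduced-precision dtype.

```python
# Inspect the dtypes of the optimizer state tensors after at least one step.
for state in optimizer.state.values():
    for name, value in state.items():
        if torch.is_tensor(value):
            print(name, tuple(value.shape), value.dtype)
```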

## `Adagrad32bit`

The `Adagrad32bit` class is similar to `Adagrad` but ensures that all computations are carried out with 32-bit precision. This class is preferable when numerical precision is more critical than memory efficiency.

### `Adagrad32bit` Usage:

```python
import torch
from bitsandbytes.optim import Adagrad32bit

model = YourModel()
params = model.parameters()

# Initialize the optimizer with your model's parameters
optimizer = Adagrad32bit(params, lr=0.01)

# In your training loop
optimizer.zero_grad()
loss = compute_loss() # Implement your loss computation
loss.backward()
optimizer.step()
```
1 change: 1 addition & 0 deletions docs/source/quantization.mdx
@@ -0,0 +1 @@
Linear8bitLt & Linear4bit
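This page is currently just a stub naming the two quantized linear modules. As a rough sketch of how they are typically used, assuming both classes live under `bitsandbytes.nn`, that the listed constructor arguments exist, and that the weights are quantized when the module is moved to a CUDA device:

```python
import torch
import bitsandbytes as bnb

# 8-bit linear layer (LLM.int8()); threshold controls outlier handling.
int8_layer = bnb.nn.Linear8bitLt(1024, 1024, has_fp16_weights=False, threshold=6.0).to("cuda")

# 4-bit linear layer; compute_dtype is the dtype used for the matmul.
fp4_layer = bnb.nn.Linear4bit(1024, 1024, compute_dtype=torch.float16).to("cuda")

x = torch.randn(4, 1024, dtype=torch.float16, device="cuda")
print(int8_layer(x).shape, fp4_layer(x).shape)
```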
