BackGPT & BackChat

This is an experiment built on a fork of smol-gpt that trains a GPT to predict the previous word/token instead of the next one, i.e. to write text backwards.
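The core trick is that a standard next-token objective becomes previous-token prediction if the training sequences are reversed. A minimal sketch of that idea (the function name is illustrative, not the repository's actual preprocessing code):

# Minimal sketch: reversing a tokenized document turns ordinary next-token
# training into previous-token training. Illustrative only; the repository's
# preprocessing may differ in the details.
def make_backward_example(token_ids):
    rev = token_ids[::-1]   # read the document right-to-left
    x = rev[:-1]            # context, in reverse order
    y = rev[1:]             # target: the token that came *before* each context position
    return x, y

# For "the cat sleeps" tokenized as [10, 11, 12], the model trains on
# x = [12, 11] -> y = [11, 10], i.e. it learns to write backwards.
print(make_backward_example([10, 11, 12]))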

BackChat

Training Plan

Phase 1: Pre-training on Fineweb 100BT

We use the Fineweb 100BT sample for pre-training our base model.

  1. Prepare Dataset
# This will:
# 1. Download Fineweb 100BT sample from HuggingFace
# 2. Train tokenizer (vocab size 8888)
# 3. Preprocess and tokenize the data
python preprocess_xcoax.py --vocab-size 8888 --num-chunks 1000
  2. Train Model
# Train on Fineweb 100BT
python train_xcoax.py
  3. Sample from Base Model
python sample_xcoax.py

Phase 2: Instruction Tuning

After pre-training, we fine-tune the model on Open Instruct V1 to create BackChat, an instruction-following model that works in reverse - given a response, it generates the instruction that could have led to that response.

  1. Prepare Instruction Dataset
# Process and tokenize the instruction dataset
python preprocess_instruct.py --vocab-size 8888
  2. Finetune Model
# Finetune the pre-trained model on instruction data
python finetune_xcoax.py --model-path out/xcoax/best_checkpoint.pt

Model Architecture

XCOAX Model (Pre-trained)

  • 8888 token vocabulary
  • 16 attention heads
  • 12-layer transformer
  • 1024 embedding dimension
  • Training hyperparameters (collected into the config sketch after this list):
    • Batch size: 64
    • Gradient accumulation steps: 4
    • Learning rate: 3e-4 with cosine decay
    • Block size: 1024
    • Mixed precision: bfloat16
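
The same settings, gathered into a single config sketch for reference (field names are illustrative and may not match the actual arguments in train_xcoax.py):

# Architecture and training settings from this README as one config dict.
# Field names are illustrative; train_xcoax.py may name them differently.
xcoax_config = dict(
    vocab_size=8888,
    n_layer=12,
    n_head=16,
    n_embd=1024,
    block_size=1024,
    batch_size=64,
    gradient_accumulation_steps=4,
    learning_rate=3e-4,    # with cosine decay
    dtype="bfloat16",      # mixed precision
)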

BackChat Model (Instruction-tuned)

The instruction-tuned model maintains the same architecture as the base model but is fine-tuned on the Open Instruct V1 dataset with a reversed objective (sketched after this list):

  • Given a response, it generates the instruction that could have led to that response
  • Both response and instruction are processed backwards (word by word)
  • Uses special tokens to mark response and instruction sections
  • Dataset includes:
    • 51,759 samples from Alpaca
    • 82,599 samples from Self Instruct
    • 18,194 samples from GPT-4 Instruct
    • And more instruction-following data
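
A rough illustration of how one training example could be assembled (the exact preprocessing in preprocess_instruct.py may differ, especially around punctuation and special-token placement):

# Rough illustration of assembling one BackChat training example.
# The real preprocessing in preprocess_instruct.py may handle details differently.
def reverse_words(text):
    return " ".join(reversed(text.split()))

def build_training_example(instruction, response):
    # Response first, instruction second: the model learns to produce the
    # (reversed) instruction conditioned on the (reversed) response.
    return (
        f"<|im_start|><|response|>{reverse_words(response)}<|im_end|>\n"
        f"<|im_start|><|instruction|>{reverse_words(instruction)}<|im_end|>"
    )

print(build_training_example("What is the cat doing?", "The cat is sleeping."))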

Usage Examples

Base Model (Pre-trained)

# Interactive sampling with adjustable parameters
python sample_xcoax.py

# Parameters:
# - temp=X: Set temperature (default 0.8)
# - top_k=X: Set top-k sampling (default 200)
# - tokens=X: Set max tokens to generate (default 500)
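
For context, temperature and top-k interact roughly like this at each generation step (a generic nanoGPT-style sketch, not necessarily the exact sampler used in sample_xcoax.py):

# Generic sketch of one temperature + top-k sampling step, using the
# defaults listed above. Not necessarily the exact code in sample_xcoax.py.
import torch

def sample_next_token(logits, temp=0.8, top_k=200):
    logits = logits / temp                      # <1 sharpens, >1 flattens the distribution
    v, _ = torch.topk(logits, min(top_k, logits.size(-1)))
    logits[logits < v[-1]] = -float("inf")      # keep only the top-k candidates
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()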

BackChat (Instruction-tuned)

# Interactive sampling - provide a response, get an instruction
python sample_xcoax_instruct.py

Example interaction:
Response: The cat is sleeping.
Generated Instruction: What is the cat doing?

Response: Python is a high-level programming language.
Generated Instruction: Define what Python is.

# Parameters same as base model

How It Works

Backwards Instruction Format

# Original data:
instruction = "Identify the odd one out."
input_text = "Twitter, Instagram, Telegram"
output = "Telegram"

# Training format:
<|im_start|><|response|>Telegram<|im_end|>
<|im_start|><|instruction|>Twitter Instagram, Telegram out odd the Identify<|im_end|>

# During inference:
1. User provides response
2. We reverse it: "sleeping is cat The"
3. Format: <|im_start|><|response|>sleeping is cat The<|im_end|>
4. Model generates reversed instruction
5. We un-reverse the instruction for display
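
Putting those inference steps together, the wrapper around the model looks roughly like this (illustrative; the actual logic lives in sample_xcoax_instruct.py and may differ):

# Sketch of the reverse/format/un-reverse wrapping described above.
# Illustrative only; sample_xcoax_instruct.py may handle details differently.
def reverse_words(text):
    return " ".join(reversed(text.split()))

def build_prompt(response):
    # Steps 2-3: reverse the user's response, wrap it in special tokens,
    # and leave the instruction section open for the model to fill in.
    return (f"<|im_start|><|response|>{reverse_words(response)}<|im_end|>"
            f"<|im_start|><|instruction|>")

def decode_instruction(generated_text):
    # Step 5: cut at the end marker and un-reverse the generated words.
    return reverse_words(generated_text.split("<|im_end|>")[0])

print(build_prompt("The cat is sleeping"))
print(decode_instruction("doing? cat the is What<|im_end|>"))  # -> "What is the cat doing?"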

Training Progress

  • Project setup
  • Base model architecture
  • Fineweb 100BT preprocessing
  • Pre-training on Fineweb 100BT
  • Instruction tuning setup
  • BackChat instruction tuning
  • Model evaluation and benchmarks
