A micro GPT implementation and training pipeline in PyTorch.
```python
from microgpt.model import (
    load_model,
    PretrainedModelConfig,
)

model = await load_model(
    config=PretrainedModelConfig(),
)
generated_text = model.generate_text(
    text="Hi, I'm a language model,",
    max_new_tokens=50,
)
```
The pretrained model can be found in the `pretrained` directory. It was trained in 2 stages:
- Stage 1: Training using large amounts of mostly web based data
- Stage 2: Training using 3 runs of smaller amounts of high quality data and combining/souping the model weights
- 8x H200 SXM GPUs (80GB) on runpod.io
  - Time taken: ~4 hours
  - Hourly cost: $32 per hour
  - Total cost: ~$128
- 1 c8g.4xlarge instance on AWS
  - Time taken: ~16 hours
  - Hourly cost: $0.43184 per hour
  - Total cost: ~$6.91
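The stage-2 combining/souping step boils down to uniformly averaging the weights of the independently trained runs. A minimal sketch of that idea using plain PyTorch state dicts (the helper name and the three `nn.Linear` stand-in "runs" are illustrative, not this project's actual API):

```python
import torch
import torch.nn as nn

def soup_state_dicts(state_dicts):
    # Model souping: uniformly average each parameter across runs.
    # All state dicts must come from the same architecture.
    souped = {}
    for key in state_dicts[0]:
        souped[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    return souped

# Three "runs" of the same architecture with different weights.
runs = [nn.Linear(4, 2) for _ in range(3)]

souped_model = nn.Linear(4, 2)
souped_model.load_state_dict(soup_state_dicts([m.state_dict() for m in runs]))
```

Averaging works here because the runs share an architecture and a common stage-1 starting point, so their weights stay close enough to interpolate.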
- Tokenizer
  - Loading pretrained gpt tokenizers
  - Training custom byte-pair encoding tokenizers
  - Loading custom byte-pair encoding tokenizers from files
- Micro GPT model implementation
  - Loading pretrained gpt models
  - Training custom gpt models with support for DDP
  - Training checkpoints
  - Loading custom gpt models from files
  - Training using text, files, urls or huggingface datasets
  - RoPE implementation
  - Reproducing GPT-2 with a custom tokenizer and model
  - HellaSwag eval
  - 2-stage model training, combining/souping the model weights from 3 runs of smaller amounts of high-quality data in the second stage
  - Supervised finetuning
  - Reinforcement learning
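RoPE encodes position by rotating pairs of query/key features through position-dependent angles, so attention scores depend only on relative offsets. A minimal, self-contained sketch of rotary embeddings (using the half-split pairing convention; independent of this project's actual module):

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    # x: (seq_len, head_dim) with even head_dim.
    # Pairs feature i with feature i + head_dim//2 and rotates each pair
    # by an angle that grows with position and shrinks with frequency index.
    seq_len, head_dim = x.shape
    half = head_dim // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)        # (half,)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs     # (seq_len, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```

In attention this is applied to both queries and keys before their dot product; because each application is a pure rotation, it preserves vector norms and leaves position 0 unchanged.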
- Create a virtual environment:

  ```shell
  uv venv --python 3.12
  source .venv/bin/activate
  ```

- Install dependencies:

  ```shell
  make sync
  ```
- Go through the notebooks to understand how to use the library.
This project is licensed under the MIT License. See the LICENSE file for details.