
microgpt

A micro GPT implementation and training pipeline in PyTorch.

import asyncio

from microgpt.model import (
    load_model,
    PretrainedModelConfig,
)


async def main() -> None:
    # Load the pretrained microgpt model
    model = await load_model(
        config=PretrainedModelConfig(),
    )
    # Generate a continuation of the prompt
    generated_text = model.generate_text(
        text="Hi, I'm a language model,",
        max_new_tokens=50,
    )
    print(generated_text)


asyncio.run(main())

Pretrained model

The pretrained model weights can be found in the pretrained directory. The model was trained in two stages:

  1. Stage 1: Training on a large amount of mostly web-based data
  2. Stage 2: Three training runs on smaller amounts of high-quality data, with the resulting model weights combined ("souped") as sketched below
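
Weight souping here means averaging the parameters of the checkpoints produced by the stage-2 runs. Below is a minimal sketch of that averaging step; the checkpoint filenames and the assumption that each file is a plain state dict of floating-point tensors are illustrative, not the repository's actual layout.

import torch

# Hypothetical stage-2 checkpoint paths; the real filenames in this repo may differ
checkpoint_paths = ["run1.pt", "run2.pt", "run3.pt"]
state_dicts = [torch.load(p, map_location="cpu") for p in checkpoint_paths]

# Average each parameter tensor across the three runs ("model souping")
souped = {
    key: torch.mean(torch.stack([sd[key].float() for sd in state_dicts]), dim=0)
    for key in state_dicts[0]
}

torch.save(souped, "souped.pt")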

Comparison with OpenAI's GPT-2

Loss and eval plots comparing the pretrained model with OpenAI's GPT-2.

Infrastructure used for training

  • 8x H200 SXM GPUs (80GB) on runpod.io
    • Time taken: ~4 hours
    • Hourly cost: $32
    • Total cost: ~$128
  • 1 c8g.4xlarge instance on AWS
    • Time taken: ~16 hours
    • Hourly cost: $0.43184
    • Total cost: ~$6.75

Features

  • Tokenizer
    • Loading pretrained GPT tokenizers
    • Training custom byte-pair encoding (BPE) tokenizers
    • Loading custom BPE tokenizers from files
  • Micro GPT model implementation
    • Loading pretrained GPT models
    • Training custom GPT models with DDP support
    • Training checkpoints
    • Loading custom GPT models from files
  • Training on text, files, URLs, or Hugging Face datasets
  • RoPE implementation (see the sketch after this list)
  • Reproducing GPT-2 with a custom tokenizer and model
  • HellaSwag eval
  • Two-stage model training, with the second stage souping the weights of 3 runs on smaller amounts of high-quality data
  • Supervised finetuning
  • Reinforcement learning
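
RoPE (rotary position embeddings) encodes position by rotating query and key vectors by position-dependent angles instead of adding learned position embeddings. The following is a minimal, self-contained sketch of the technique, not the repository's actual implementation; the tensor layout and the base frequency of 10000 are conventional assumptions.

import torch


def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    # x: (batch, seq_len, n_heads, head_dim) with even head_dim
    seq_len, head_dim = x.shape[1], x.shape[-1]
    # One rotation frequency per pair of channels
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
    # Angle for each (position, frequency) pair: (seq_len, head_dim / 2)
    angles = torch.outer(torch.arange(seq_len, dtype=torch.float32), inv_freq)
    cos = angles.cos()[None, :, None, :]  # broadcast over batch and heads
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]  # split channels into even/odd pairs
    rotated = torch.empty_like(x)
    rotated[..., 0::2] = x1 * cos - x2 * sin
    rotated[..., 1::2] = x1 * sin + x2 * cos
    return rotated


# Example: rotate queries for a batch of 2 sequences of length 16, 4 heads, head_dim 64
q = torch.randn(2, 16, 4, 64)
q_rot = rope(q)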

Usage

  • Install uv

  • Install make

  • Set up a virtual environment:

    uv venv --python 3.12
    source .venv/bin/activate

  • Install dependencies:

    make sync

  • Go through the notebooks to understand how to use the library.

Acknowledgements

License

This project is licensed under the MIT License. See the LICENSE file for details.
